Date of Award

Fall 11-22-2016

Document Type

Dissertation

Degree Name

Doctor of Science in Information Systems

Department

Business and Information Systems

First Advisor

Yong Wang

Second Advisor

Jun Liu

Third Advisor

Mark Hawkes

Fourth Advisor

Surendra Sanikar

Abstract

Online social networks (OSNs) are platforms for connecting and communicating with friends, family, and like-minded people. Users post thoughts, comment on others' posts, and share photos, videos, and other information. The shared information often includes URLs (Uniform Resource Locators), which direct users to web content such as news, articles, and advertisements. URL sharing is very popular on OSNs, but long and complicated URL strings make it inconvenient. Short URLs have therefore become very popular on OSNs because of their simplicity. However, many risks associated with sharing short URLs have been found and reported. Malicious users rely heavily on short URLs in campaigns involving phishing, malware, spam, and scams. It is therefore highly desirable to design and develop an effective short URL classifier to mitigate these threats on OSNs. In this dissertation, we develop a short URL classifier, CONSOL, using features collected from online social networks. Using the Random Forest machine learning algorithm, we achieve an accuracy of 94.5% in identifying malicious short URLs. Unlike most existing techniques, which depend on third-party resources to classify URLs, our classifier does not depend on any third-party service provider during its operation and leverages only features available on OSNs. Our research identifies 16 features that are important for short URL classification. These 16 features fall into three logical categories: content features, context features, and social features. Further analysis reveals that social features contribute significantly to classifying short URLs and that content features are also good indicators of the maliciousness of short URLs. Compared to social features and content features, context features are less important; however, they complement the other features and make the classifier more effective.
Comparisons of CONSOL with the existing solution and with Google Safe Browsing show that the classifier is also promising in real-world use. CONSOL performs slightly better than the existing solution and, unlike existing solutions that rely on third-party information, runs on its own. Our testing also indicates that CONSOL identifies malicious short URLs much faster than Google Safe Browsing. The results are validated and supported by VirusTotal. Our case studies further demonstrate that other online social networks can also adopt CONSOL for short URL classification.
