Date of Award

Fall 12-2017

Document Type


Degree Name

Doctor of Science in Information Systems


Business and Information Systems

First Advisor

Jun Liu

Second Advisor

Ronghua Shan

Third Advisor

Deb Tech

Fourth Advisor

Shuyuan Deng


The massive volume of social media data presents businesses with an immense opportunity to extract useful insights. However, social media messages typically consist of both facts and opinions, posing a challenge to analytics applications that focus on one or the other. Distinguishing facts from opinions in social media may significantly improve both fact-seeking applications that aim to capture breaking news and opinion-seeking applications that aim to evaluate users' sentiment towards an event or entity. Despite the growing need, classifying facts and opinions in social media has received minimal attention.

In this study, we examine the limitations of applying existing subjectivity detection methods, which identify subjective content in textual data. In social media, and specifically in microblogs like Twitter, the content is noisy with respect to spelling and syntax and makes extensive use of emoticons and abbreviations, in addition to the overall issue of data sparsity. Traditional methods that check individual words against a predefined lexicon often do not yield the accuracy required for this task. The primary objective of this study is to address this limitation and provide an alternative method that improves this classification task and opinion mining in general.
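The lexicon-lookup limitation can be illustrated with a minimal sketch. The tiny lexicon, the function name `lexicon_hits`, and the two example messages are hypothetical and chosen only for illustration; they are not the lexicons or data used in this study.

```python
import re

# Hypothetical toy lexicon of subjective words (illustration only).
SUBJECTIVE_LEXICON = {"love", "hate", "great", "terrible", "awful"}


def lexicon_hits(message):
    """Return the tokens of a message that appear in the toy lexicon."""
    # Lowercase the text and keep alphabetic tokens only, so digits,
    # punctuation, and emoticons are discarded.
    tokens = re.findall(r"[a-z']+", message.lower())
    return [token for token in tokens if token in SUBJECTIVE_LEXICON]


# The same complaint, once in standard spelling and once in microblog style.
clean = "I hate the new fees, terrible service"
noisy = "i h8 the new fees, terribl srvc :("

print(lexicon_hits(clean))  # both opinion words are found
print(lexicon_hits(noisy))  # no lexicon word survives the noisy spelling
```

Both messages express the same opinion, yet exact word matching finds subjective terms only in the cleanly spelled one.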

The study proposes using supplemental information from Twitter metadata and empirically demonstrates the resulting improvement in performance. To ensure rigor and relevance, the design science research methodology is adopted for this project. We propose a deep learning algorithm that automatically separates facts from opinions in Twitter messages. Our model combines bag-of-words features with selected manually engineered features from Twitter metadata in a multipart experiment. We leverage an external reference dataset to develop our manually engineered feature variables and evaluate their efficiency against three external baseline tools. The study uses eight different machine learning classifiers to demonstrate the robustness of the manual feature set. Next, we combine these manually engineered features with features extracted from the bag-of-words model in our proposed deep learning model. Our algorithm significantly outperformed multiple popular baselines in the internal evaluation part of the experiment.
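The idea of combining bag-of-words features with metadata features can be sketched as follows. This is a minimal illustration under stated assumptions, not the model used in this study: the metadata fields (`has_url`, `is_verified`, `retweet_count`), the function names, and the example tweets are all hypothetical, and no classifier is included.

```python
from collections import Counter


def build_vocabulary(messages):
    """Map each distinct lowercase token to a column index."""
    tokens = sorted({tok for msg in messages for tok in msg.lower().split()})
    return {tok: index for index, tok in enumerate(tokens)}


def bag_of_words(message, vocabulary):
    """Count how often each vocabulary token occurs in the message."""
    counts = Counter(message.lower().split())
    # Dict iteration follows insertion order, which matches the column indices.
    return [counts.get(tok, 0) for tok in vocabulary]


def metadata_features(tweet):
    """Encode a few metadata fields as numbers (hypothetical feature set)."""
    return [
        1.0 if tweet["has_url"] else 0.0,
        1.0 if tweet["is_verified"] else 0.0,
        float(tweet["retweet_count"]),
    ]


def combined_features(tweet, vocabulary):
    """Concatenate the text features and the metadata features."""
    return bag_of_words(tweet["text"], vocabulary) + metadata_features(tweet)


# Hypothetical example records, not data from the study.
tweets = [
    {"text": "bank outage reported downtown", "has_url": True,
     "is_verified": True, "retweet_count": 12},
    {"text": "worst bank ever", "has_url": False,
     "is_verified": False, "retweet_count": 0},
]

vocabulary = build_vocabulary(t["text"] for t in tweets)
vectors = [combined_features(t, vocabulary) for t in tweets]
for vector in vectors:
    print(vector)
```

Each resulting vector holds the word counts followed by the metadata values, so a downstream classifier sees both kinds of evidence in a single input.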

Next, to demonstrate practical usefulness, we illustrate how distinguishing facts from opinions can be useful in a real-world business application. We applied our proposed algorithm to an external opinion mining application that tracks emerging customer complaints from social media conversations. We conducted our case study with three large financial institutions using Twitter data for a period of 16 weeks. The study observed considerable improvement in that external application after integrating our algorithm and concludes that it indeed benefits subsequent analytics applications.