Date of Award

Summer 2019

Document Type


Degree Name

Doctor of Philosophy (PhD)


Computer Science

First Advisor

Dr. Omar El-Gayar

Second Advisor

Dr. Yong Wang

Third Advisor

Dr. Cheri Noteboom


Social media is considered as a promising and viable source of data for gaining insights into various disease conditions, patients’ attitudes and behaviors, and medications. The daily use of social media provides new opportunities for analyzing several aspects of communication. Social media as a big data source can be used to recognize communication and behavioral themes of problematic use of prescription drugs. Mining and analyzing such media have challenges and limitations with respect to topic deduction and data quality. There is a need for a structured approach to efficiently and effectively analyze social media content related to drug abuse in a manner that can mitigate the challenges surrounding the use of this data source.

Following a design science research methodology, the research aims at developing and evaluating a framework for mining and analyzing social media content related to drug abuse in a manner that will mitigate challenges and limitations related to topic deduction and data quality. The framework consists of four phases: Topic Discovery and Detection; Data Collection; Data Preparation and Quality; and Analysis and Results.

The topic discovery and detection phase consists of a topic expansion stage for the drug abuse related topics that address the research domain and objectives. The topic expansion is based on different terms related to keywords, categories, and characteristics of the topic of interest and the objective of monitoring. To formalize the process and supporting artifacts, we create an ontology for drug abuse that captures the different categories that exist in the topic expansion and the literature. The data collection phase is characterized by the date range, social media platforms, search keywords, and a set of inclusion/exclusion criteria. The data preparation and quality phase is mainly concerned with obtaining high-quality data to mitigate problems with data veracity. In this phase, we pre-process the collected data then we evaluate the quality of the data, with respect to the terms and objectives of the research topic phase, using a data quality evaluation matrix. Finally, in the data analysis phase, the researcher can choose the suitable analysis approach. We used a combination of unsupervised and supervised machine learning approaches, including opinion and content analysis modeling.

We demonstrate and evaluate the applicability of the proposed framework to identify common concerns toward opioid crisis from two perspectives; the addicted users’ perspective and the public’s (non-addicted users) perspective. In both cases, data is collected from twitter using Crimson Hexagon, a social media analytics tool for data collection and analysis. Natural language processing is used for data preparation and pre-processing. Different data visualization techniques such as, word clouds and clustering visualization, are used to form a deeper understanding of the relationships among the identified themes for the selected communities. The results help in understanding concerns of the public and opioid addicts towards the opioid crisis in the United States. Results of this study could help in understanding the problem aspects and provide key input when it comes to defining and implementing innovative solutions/strategies to face the opioid epidemic.

From a theoretical perspective, this study highlights the importance of developing and adapting text mining techniques to social media for drug abuse. This study proposes a social media text mining framework for drug abuse research which lead to a good quality of datasets. Emphasis is placed on developing methods for improving the discovery and identification of topics in social media domains characterized by a plethora of highly diverse terms and a lack of commonly available dictionary/language by the community such as in the opioid and drug abuse case. From a practical perspective, automatically analyzing social media users’ posts using machine learning tools can help in understanding the public themes and topics that exist in the recent discussions of online users of social media networks. This could help in developing proper mitigation strategies. Examples of such strategies can be gaining insights from the discussion topics to make the opioid media campaigns more effective in preventing opioid misuse. Finally, the study helps address some of the U.S. Department of Health and Human Services (HHS) five-point strategy by providing a systematic approach that could support conducting better research on addiction and drug abuse and strengthening public health data reporting and collection using social media data.