Date of Award

Spring 3-2019

Document Type


Degree Name

Doctor of Philosophy (PhD)

First Advisor

Dr. Cherie Noteboom

Second Advisor

Dr. Omar El-Gayar

Third Advisor

Dr. Jun Liu


Cyber systems are ubiquitous in all aspects of society. At the same time, breaches to cyber systems continue to be front-page news (Calfas, 2018; Equifax, 2017) and, despite more than a decade of heightened focus on cybersecurity, the threat continues to evolve and grow, costing globally up to $575 billion annually (Center for Strategic and International Studies, 2014; Gosler & Von Thaer, 2013; Microsoft, 2016; Verizon, 2017). To address possible impacts due to cyber threats, information system (IS) stakeholders must assess the risks they face. Following a risk assessment, the next step is to determine mitigations to counter the threats that pose unacceptably high risks. The literature contains a robust collection of studies on optimizing mitigation selections, but they universally assume that the starting list of appropriate mitigations for specific threats exists from which to down-select. In current practice, producing this starting list is largely a manual process and it is challenging because it requires detailed cybersecurity knowledge from highly decentralized sources, is often deeply technical in nature, and is primarily described in textual form, leading to dependence on human experts to interpret the knowledge for each specific context. At the same time cybersecurity experts remain in short supply relative to the demand, while the delta between supply and demand continues to grow (Center for Cyber Safety and Education, 2017; Kauflin, 2017; Libicki, Senty, & Pollak, 2014). Thus, an approach is needed to help cybersecurity experts (CSE) cut through the volume of available mitigations to select those which are potentially viable to offset specific threats.

This dissertation explores the application of machine learning and text retrieval techniques to automate matching of relevant mitigations to cyber threats, where both are expressed as unstructured or semi-structured English language text. Using the Design Science Research Methodology (Hevner & March, 2004; Peffers, Tuunanen, Rothenberger, & Chatterjee, 2007), we consider a number of possible designs for the matcher, ultimately selecting a supervised machine learning approach that combines two techniques: support vector machine classification and latent semantic analysis. The selected approach demonstrates high recall for mitigation documents in the relevant class, bolstering confidence that potentially viable mitigations will not be overlooked. It also has a strong ability to discern documents in the non-relevant class, allowing approximately 97% of non-relevant mitigations to be excluded automatically, greatly reducing the CSE’s workload over purely manual matching. A false v positive rate of up to 3% prevents totally automated mitigation selection and requires the CSE to reject a few false positives.

This research contributes to theory a method for automatically mapping mitigations to threats when both are expressed as English language text documents. This artifact represents a novel machine learning approach to threat-mitigation mapping. The research also contributes an instantiation of the artifact for demonstration and evaluation. From a practical perspective the artifact benefits all threat-informed cyber risk assessment approaches, whether formal or ad hoc, by aiding decision-making for cybersecurity experts whose job it is to mitigate the identified cyber threats. In addition, an automated approach makes mitigation selection more repeatable, facilitates knowledge reuse, extends the reach of cybersecurity experts, and is extensible to accommodate the continued evolution of both cyber threats and mitigations. Moreover, the selection of mitigations applicable to each threat can serve as inputs into multifactor analyses of alternatives, both automated and manual, thereby bridging the gap between cyber risk assessment and final mitigation selection.