Date of Award

Spring 5-2021

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Cyber Operations (PhDCO)

Department

Computer Science

First Advisor

Kyle Cronin

Second Advisor

Austin O'Brien

Third Advisor

Christopher Olson

Abstract

Network Intrusion Detection System (IDS) devices play a crucial role in the realm of network security. These systems generate alerts for security analysts by performing signature-based and anomaly-based detection on malicious network traffic. However, there are several challenges when configuring and fine-tuning these IDS devices for high accuracy and precision. Machine learning utilizes a variety of algorithms and unique dataset input to generate models for effective classification. These machine learning techniques can be applied to IDS devices to classify and filter anomalous network traffic. This combination of machine learning and network security provides improved automated network defense by developing highly-optimized IDS models that utilize unique algorithms for enhanced intrusion detection. Machine learning models can be trained using a combination of machine learning algorithms, network intrusion datasets, and optimization techniques. This study sought to identify which variation of these parameters yielded the best-performing network intrusion detection models, measured by their accuracy, precision, recall, and F1 score metrics. Additionally, this research aimed to validate theoretical models’ metrics by applying them in a real-world environment to see if they perform as expected. This research utilized a quantitative experimental study design to organize a two-phase approach to train and test a series of machine learning models for network intrusion detection by utilizing Python scripting, the scikit-learn library, and Zeek IDS software. The first phase involved optimizing and training 105 machine learning models by testing a combination of seven machine learning algorithms, five network intrusion datasets, and three optimization methods. These 105 models were then fed into the second phase, where the models were applied in a machine learning IDS pipeline to observe how the models performed in an implemented environment. 
The results of this study identify which algorithms, datasets, and optimization methods generate the best-performing models for network intrusion detection. This research also showcases the need to utilize various algorithms and datasets since no individual algorithm or dataset consistently achieved high metric scores independent of other training variables. Additionally, this research indicates that optimization during model development is highly recommended; however, there may not be a need to test for multiple optimization methods since they did not typically impact the yielded models’ overall categorization of success or failure. Lastly, this study’s results strongly indicate that theoretical machine learning models will most likely perform significantly worse when applied in an implemented IDS ML pipeline environment. This study can be utilized by other industry professionals and research academics in the fields of information security and machine learning to generate better highly-optimized models for their work environments or experimental research.
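
As a rough illustration of the phase-one design described above, the following minimal Python sketch enumerates the 7 × 5 × 3 = 105 training combinations and computes the four reported metrics from binary predictions. It is not the dissertation's code: the abstract gives only the counts of algorithms, datasets, and optimization methods, so the labels below (`algorithm_1`, `dataset_1`, and so on) are hypothetical placeholders, and the helper names `experiment_grid` and `binary_metrics` are assumptions made for this example. The metrics are written in plain Python to keep the sketch self-contained, although the study itself used the scikit-learn library.

```python
from itertools import product

# Hypothetical placeholder labels: the abstract states only the counts
# (seven algorithms, five datasets, three optimization methods), not the names.
ALGORITHMS = [f"algorithm_{i}" for i in range(1, 8)]
DATASETS = [f"dataset_{i}" for i in range(1, 6)]
OPTIMIZERS = [f"optimization_{i}" for i in range(1, 4)]


def experiment_grid():
    """Return every (algorithm, dataset, optimization) combination."""
    return list(product(ALGORITHMS, DATASETS, OPTIMIZERS))


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Labels are assumed to be 1 for malicious traffic and 0 for benign.
    Ratios with a zero denominator are reported as 0.0.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    grid = experiment_grid()
    print(f"{len(grid)} model configurations")
    # Toy labels for illustration only; these are not results from the study.
    print(binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
```

In a setup like this, each tuple from `experiment_grid()` would identify one trained model, and `binary_metrics` would be applied twice per model: once to held-out test data and once to the output of the implemented IDS pipeline, which allows the theoretical and real-world scores to be compared.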
