Author

Anthony Rizi

Date of Award

Spring 3-20-2024

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Cyber Operations (PhDCO)

First Advisor

Varghese Vaidyan

Second Advisor

Tom Halverson

Third Advisor

Mark Spanier

Fourth Advisor

Cory Singleton

Abstract

This research addresses a pressing issue in cybersecurity: malicious activity that relies on Domain Generation Algorithms (DGAs). These algorithms, used by at least 84 traditional malware families as of late 2023, dynamically generate domain names to facilitate malicious operations while evading conventional detection mechanisms. The study generated over 159,750 domains and studied more than 1.27 million data points. Existing studies have predominantly focused on surface-level aspects of DGAs, including domain lengths, alphanumeric values, and top-level domains (TLDs). In response, the research question at the core of this study is whether sophisticated classifiers can effectively detect and classify DGA-enabled malware by discerning variations in DGAs, including original DGAs, those modified with injected noise, and those modified through a novel approach based on Linear Recursive Sequences (LRS). The research adopts a quantitative design, using Python and a suite of libraries for data manipulation, machine learning, and visualization. The focal point of the methodology is training a Feedforward Neural Network (FNN) on a curated dataset comprising both original DGAs and their modified counterparts. To facilitate effective classification, the dataset is segmented into categories. The FNN architecture uses the Adam optimizer, sigmoid activation functions, and three dense layers. Eight features, including Damerau-Levenshtein distance and string entropy, contribute to the FNN’s understanding of the input data. The research explores the intricacies of neural network comprehension, dataset classification, and feature identification, overcoming these challenges through extensive multiclass learning processes.
The training configuration uses a learning rate of 0.0001, 50 epochs, a batch size of 32, and an 80/20 validation split. Rigorous feature selection and engineering, model selection, and hyperparameter adjustment are integral to the methodology. The study reviews five primary DGA datasets (Banjori, Dnschanger, Dyre, Gameover, and Murofetweekly) and presents detailed insights into their characteristics. The analysis reveals the challenges posed by DGA families with insufficient sample sizes, which necessitate a strategic selection process. The FNN’s performance is explicitly evaluated on its ability to classify instances into original DGAs, Noise-Modified DGAs, and LRS-Modified DGAs. The methodology’s robustness is examined through potential challenges, and recommendations for addressing these challenges are provided. This research demonstrates the effectiveness of the FNN in identifying an average of 99.5 percent of noise and LRS DGA modifications, contributing a sophisticated approach to DGA detection. It underscores the significance of staying abreast of evolving cyber threats and emphasizes the need for proactive cybersecurity measures in the face of continually adapting tactics employed by malicious actors.
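
The abstract names Damerau-Levenshtein distance and string entropy among the eight input features but does not give their exact definitions. The sketch below is one plausible reading, not the dissertation's implementation: it assumes character-level Shannon entropy in bits and the restricted (optimal string alignment) variant of Damerau-Levenshtein distance. The function names `string_entropy` and `osa_distance` are hypothetical.

```python
import math
from collections import Counter


def string_entropy(s: str) -> float:
    """Shannon entropy of the character distribution, in bits per character.

    Assumption: the abstract does not define "String Entropy"; this is the
    common character-frequency definition.
    """
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Counts insertions, deletions, substitutions, and transpositions of two
    adjacent characters. Assumption: the dissertation may use the
    unrestricted variant instead; the two differ on some inputs.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "ab" -> "ba"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]
```

Under these assumptions, a domain label made of one repeated character has entropy 0, a label whose characters are all distinct has entropy equal to the base-2 logarithm of its length, and swapping two adjacent characters costs a single edit rather than two.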
