Date of Award
Spring 1-2025
Document Type
Dissertation
Degree Name
Doctor of Philosophy in Cyber Operations (PhDCO)
First Advisor
John Hastings
Second Advisor
Austin O'Brien
Third Advisor
Tyler Flaagan
Fourth Advisor
Varghese Vaidyan
Fifth Advisor
Shengjie Xu
Abstract
Ransomware and other malware inflict devastating financial and operational damage on organizations worldwide by exploiting deeply embedded, hard-to-detect vulnerabilities in their systems. Detecting these vulnerabilities in compiled code before malicious actors exploit them remains a critical challenge in cybersecurity. This research introduces TEDVIL (Transformer-based Embeddings for Discovering Vulnerabilities in Lifted Code), a novel framework that uses transformer-based embeddings to train neural networks to detect vulnerabilities in lifted code. The framework was implemented using bidirectional (BERT and RoBERTa) and unidirectional (GPT-1 and GPT-2) transformer-based models to generate embeddings for training Long Short-Term Memory (LSTM) neural networks to detect stack-based buffer overflows in LLVM intermediate representation (IR) code. For comparison, simpler word2vec models (Skip-Gram and Continuous Bag of Words) were also trained, and their embeddings were used to train LSTMs. The results show that the LSTMs using GPT-2 embeddings outperformed those using GPT-1, BERT, RoBERTa, and word2vec embeddings, achieving a top accuracy of 92.5% and an F1-score of 89.7%. Notably, these results were achieved with an embedding model trained on a dataset of just 48,000 functions, demonstrating effectiveness in resource-constrained settings. The findings underscore the effectiveness of TEDVIL in identifying hard-to-detect vulnerabilities in compiled code and lay the groundwork for future research on leveraging transformer-based models for vulnerability detection.
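The pipeline the abstract describes (token embeddings fed to an LSTM that outputs a vulnerable/not-vulnerable score) can be illustrated with a minimal, standard-library-only sketch. This is not the dissertation's implementation: the class `TinyLSTMClassifier`, the helper `toy_embedding`, the dimensions, the whitespace tokenization, and the simplified LLVM-IR-like token string are all hypothetical, the weights are random and untrained, and the static per-token lookup merely stands in for the contextual embeddings a model such as GPT-2 would produce.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def toy_embedding(token, dim):
    """Hypothetical stand-in for a learned embedding: a fixed pseudo-random
    vector per token. Real transformer embeddings are contextual."""
    rng = random.Random(token)  # string seeds are deterministic
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


class TinyLSTMClassifier:
    """Untrained single-layer LSTM with a sigmoid output (illustration only)."""

    def __init__(self, embed_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # One weight matrix per gate, applied to the concatenation [x_t ; h_{t-1}].
        self.gates = {
            name: random_matrix(hidden_dim, embed_dim + hidden_dim, rng)
            for name in ("input", "forget", "cell", "output")
        }
        self.biases = {name: [0.0] * hidden_dim for name in self.gates}
        self.out_weights = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
        self.out_bias = 0.0

    def step(self, x, h, c):
        """One LSTM time step: returns the new hidden and cell states."""
        z = x + h  # list concatenation of input and previous hidden state
        pre = {
            name: [a + b for a, b in zip(matvec(w, z), self.biases[name])]
            for name, w in self.gates.items()
        }
        i = [sigmoid(v) for v in pre["input"]]
        f = [sigmoid(v) for v in pre["forget"]]
        g = [math.tanh(v) for v in pre["cell"]]
        o = [sigmoid(v) for v in pre["output"]]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def predict_proba(self, embeddings):
        """Run the sequence through the LSTM and score the final hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in embeddings:
            h, c = self.step(x, h, c)
        logit = sum(w * v for w, v in zip(self.out_weights, h)) + self.out_bias
        return sigmoid(logit)


# Simplified, LLVM-IR-like token stream (assumed for illustration).
ir_snippet = "%buf = alloca [16 x i8] call i8* @strcpy ( i8* %dst , i8* %src )"
tokens = ir_snippet.split()

EMBED_DIM, HIDDEN_DIM = 8, 4  # arbitrary toy sizes
embeddings = [toy_embedding(t, EMBED_DIM) for t in tokens]

model = TinyLSTMClassifier(EMBED_DIM, HIDDEN_DIM)
probability = model.predict_proba(embeddings)
print(f"{len(tokens)} tokens -> score {probability:.3f} (untrained, not meaningful)")
```

In a setup like the one the abstract outlines, the toy lookup would be replaced by embeddings from a trained model, and the LSTM weights would be learned from labeled functions rather than drawn at random.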
Recommended Citation
McCully, Gary, "A Deep Learning Approach to Vulnerability Detection in Lifted Code Using Transformer-Based Embeddings" (2025). Masters Theses & Doctoral Dissertations. 487.
https://scholar.dsu.edu/theses/487