Masters Theses & Doctoral Dissertations

A Deep Learning Approach to Vulnerability Detection in Lifted Code Using Transformer-Based Embeddings

Gary McCully

Date of Award

Spring 1-2025

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Cyber Operations (PhDCO)

First Advisor

John Hastings

Second Advisor

Austin O'Brien

Third Advisor

Tyler Flaagan

Fourth Advisor

Varghese Vaidyan

Fifth Advisor

Shengjie Xu

Abstract

Ransomware and other malware inflict devastating financial and operational damage on organizations worldwide by exploiting deeply embedded, hard-to-detect vulnerabilities in their systems. Detecting these vulnerabilities in compiled code before malicious actors exploit them remains a critical challenge in cybersecurity. This research introduces TEDVIL (Transformer-based Embeddings for Discovering Vulnerabilities in Lifted Code), a novel framework that uses transformer-based embeddings to train neural networks to detect vulnerabilities in lifted code. The framework was implemented using bidirectional (BERT and RoBERTa) and unidirectional (GPT-1 and GPT-2) transformer-based models to generate embeddings for training Long Short-Term Memory (LSTM) neural networks to detect stack-based buffer overflows in LLVM intermediate representation code. For comparison, simpler word2vec models (Skip-Gram and Continuous Bag of Words) were also trained, and their embeddings were used to train LSTMs. The results show that the LSTMs using GPT-2 embeddings outperformed those using GPT-1, BERT, RoBERTa, and word2vec embeddings, achieving a top accuracy of 92.5% and an F1-score of 89.7%. Notably, these results were achieved when the embedding model was trained on a dataset of just 48,000 functions, demonstrating effectiveness in resource-constrained settings. The findings underscore the effectiveness of TEDVIL in identifying hard-to-detect vulnerabilities in compiled code and lay the groundwork for future research on leveraging transformer-based models for vulnerability detection.

Recommended Citation

McCully, Gary, "A Deep Learning Approach to Vulnerability Detection in Lifted Code Using Transformer-Based Embeddings" (2025). Masters Theses & Doctoral Dissertations. 487.
https://scholar.dsu.edu/theses/487
