Research & Publications

Comparing Unidirectional, Bidirectional, and Word2vec Models for Discovering Vulnerabilities in Compiled Lifted Code

Outlet Title

2025 IEEE 13th International Symposium on Digital Forensics and Security (ISDFS)

Gary McCully, Dakota State University
John Hastings, Dakota State University
Shengjie Xu, University of Arizona
Adam Fortier, Georgia Institute of Technology

Document Type

Conference Proceeding

Publication Date

2025

Abstract

Ransomware and other forms of malware cause significant financial and operational damage to organizations by exploiting long-standing and often difficult-to-detect software vulnerabilities. To detect vulnerabilities such as buffer overflows in compiled code, this research investigates the application of unidirectional transformer-based embeddings, specifically GPT-2. Using a dataset of LLVM functions, we trained a GPT-2 model to generate embeddings, which were subsequently used to build LSTM neural networks to differentiate between vulnerable and non-vulnerable code. Our study reveals that embeddings from the GPT-2 model significantly outperform those from bidirectional models of BERT and RoBERTa, achieving an accuracy of 92.5% and an F1-score of 89.7%. LSTM neural networks were developed with both frozen and unfrozen embedding model layers. The model with the highest performance was achieved when the embedding layers were unfrozen. Further, the research finds that, in exploring the impact of different optimizers within this domain, the SGD optimizer demonstrates superior performance over Adam. Overall, these findings reveal important insights into the potential of unidirectional transformer-based approaches in enhancing cybersecurity defenses.

Recommended Citation

McCully, Gary; Hastings, John; Xu, Shengjie; and Fortier, Adam, "Comparing Unidirectional, Bidirectional, and Word2vec Models for Discovering Vulnerabilities in Compiled Lifted Code" (2025). Research & Publications. 100.
https://scholar.dsu.edu/ccspapers/100

Included in

Artificial Intelligence and Robotics Commons, Cybersecurity Commons
