Outlet Title
IEEE Access
Document Type
Article
Publication Date
2025
Abstract
Ransomware and other malware inflict devastating financial and operational damage on organizations worldwide by exploiting deeply embedded, hard-to-detect vulnerabilities in their systems. Detecting these vulnerabilities in compiled code before malicious actors exploit them remains a critical challenge in cybersecurity. This research introduces TEDVIL (Transformer-based Embeddings for Discovering Vulnerabilities in Lifted Code), a novel framework that uses transformer-based embeddings to train neural networks to detect vulnerabilities in lifted code. The framework was implemented using bidirectional (BERT and RoBERTa) and unidirectional (GPT-1 and GPT-2) transformer-based models to generate embeddings for training Long Short-Term Memory (LSTM) neural networks to detect stack-based buffer overflows in Low-Level Virtual Machine (LLVM) intermediate representation code. For comparison, simpler word2vec models (Skip-Gram and Continuous Bag of Words) were also trained, and their embeddings were used to train LSTMs. The results show that the LSTMs using GPT-2 embeddings outperformed those using GPT-1, BERT, RoBERTa, and word2vec embeddings, achieving a top accuracy of 92.5% and an F1-score of 89.7%. Notably, these results were achieved with an embedding model trained on a dataset of just 48,000 functions, demonstrating effectiveness in resource-constrained settings. The findings underscore the effectiveness of TEDVIL in identifying hard-to-detect vulnerabilities in compiled code, and lay the groundwork for future research in leveraging transformer-based models for vulnerability detection.
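The pipeline described in the abstract (transformer embeddings of lifted LLVM IR feeding an LSTM classifier) can be sketched roughly as follows. This is a minimal illustration only, not the paper's implementation: the model name ("gpt2"), sequence length, hidden size, and the toy IR snippet are all assumptions introduced for the example.

    # Hypothetical sketch: GPT-2 token embeddings of a lifted LLVM IR function
    # feeding a small LSTM classifier. Names and hyperparameters are illustrative,
    # not the TEDVIL paper's exact configuration.
    import torch
    import torch.nn as nn
    from transformers import GPT2Tokenizer, GPT2Model

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token          # GPT-2 has no pad token by default
    embedder = GPT2Model.from_pretrained("gpt2").eval()

    def embed_function(llvm_ir: str) -> torch.Tensor:
        """Return a (seq_len, hidden_size) sequence of token embeddings for one lifted function."""
        tokens = tokenizer(llvm_ir, return_tensors="pt", truncation=True, max_length=512)
        with torch.no_grad():
            hidden = embedder(**tokens).last_hidden_state   # shape: (1, seq_len, 768)
        return hidden.squeeze(0)

    class VulnLSTM(nn.Module):
        """Binary classifier: vulnerable vs. non-vulnerable function."""
        def __init__(self, embed_dim: int = 768, hidden_dim: int = 128):
            super().__init__()
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.head = nn.Linear(hidden_dim, 1)

        def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq_len, embed_dim)
            _, (h_n, _) = self.lstm(x)
            return self.head(h_n[-1])                        # logits, shape (batch, 1)

    # Example usage on a toy (hypothetical) LLVM IR snippet, not from the paper's dataset.
    ir = "define i32 @f(i8* %buf) { %1 = call i8* @strcpy(i8* %buf, i8* @src) ret i32 0 }"
    emb = embed_function(ir).unsqueeze(0)     # add batch dimension
    prob = torch.sigmoid(VulnLSTM()(emb))     # probability the function is vulnerable

In this sketch the transformer is frozen and used purely as an embedding generator, while the LSTM head is the trainable detector; the actual study compares this setup across GPT-1, GPT-2, BERT, RoBERTa, and word2vec embeddings.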
Recommended Citation
G. A. McCully, J. D. Hastings and S. Xu, "TEDVIL: Leveraging Transformer-Based Embeddings for Vulnerability Detection in Lifted Code," in IEEE Access, vol. 13, pp. 76894-76913, 2025, doi: 10.1109/ACCESS.2025.3565980.
Included in
Artificial Intelligence and Robotics Commons, Cybersecurity Commons, Information Security Commons