Interpreting Office Document Macros with Bi-Directional Transformer Models
Outlet Title
2025 Cyber Awareness and Research Symposium (CARS)
Document Type
Conference Proceeding
Publication Date
2025
Abstract
Malware delivered through Microsoft Office documents remains prevalent today, even though some of the macros involved were developed 30 years ago. This paper presents a novel method for classifying malicious Office document macros with interpretability. Our approach combines function semantics and keyword contexts to leverage the self-attention mechanism of transformers. The research focuses on variants of the Bidirectional Encoder Representations from Transformers (BERT) model, evaluating and comparing the accuracy and interpretability of transformer models in detecting Office document macros. The models are evaluated on a dataset collected using Common Crawl. The results show that our method, using BERT model variants, achieves more than 99% accuracy in detecting Office document macros. Our research also shows that the BERT models can accurately attribute the classification outcome to the input tokens. Finally, we propose a novel solution that scans email attachments for malicious Office document macros and produces attribution reports, which not only label an email as malicious but also identify which tokens in the document contribute positively to the classification. This solution is integrated with Gmail as a Workspace add-on. We hope that such solutions improve the trust of cybersecurity personnel in the model and in threat detection mechanisms, and help fine-tune the model to eliminate false positives and biases.
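To make the idea of attributing a classification outcome to input tokens concrete, the following is a minimal sketch and not the paper's implementation. It assumes a hypothetical keyword-weight scorer in place of a fine-tuned BERT model, and it uses occlusion (removing one token at a time and measuring the change in score) as a simple stand-in attribution technique; the abstract does not say which attribution method the authors use. The names `TOY_WEIGHTS`, `BIAS`, `tokenize`, `malicious_probability`, and `occlusion_attributions` are all illustrative assumptions.

```python
import math
import re

# Hypothetical keyword weights standing in for a trained classifier.
# These values are invented for illustration and do not come from the paper.
TOY_WEIGHTS = {
    "autoopen": 1.5,
    "shell": 2.0,
    "createobject": 1.2,
    "msgbox": -0.5,
}
BIAS = -1.0


def tokenize(macro: str) -> list[str]:
    """Split macro source into lowercase alphabetic tokens (a crude tokenizer)."""
    return re.findall(r"[A-Za-z_]+", macro.lower())


def malicious_probability(tokens: list[str]) -> float:
    """Toy scorer: logistic function over summed keyword weights."""
    score = BIAS + sum(TOY_WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-score))


def occlusion_attributions(tokens: list[str]) -> list[tuple[str, float]]:
    """Attribute the score to each token by removing it and re-scoring.

    A positive value means the token pushes the prediction toward 'malicious'.
    """
    base = malicious_probability(tokens)
    result = []
    for i, tok in enumerate(tokens):
        reduced = tokens[:i] + tokens[i + 1:]
        result.append((tok, base - malicious_probability(reduced)))
    return result


macro = 'Sub AutoOpen()\n Shell "cmd.exe"\nEnd Sub'
tokens = tokenize(macro)
attributions = occlusion_attributions(tokens)

# Print tokens ordered from most to least positive contribution.
for tok, value in sorted(attributions, key=lambda pair: pair[1], reverse=True):
    print(f"{tok:10s} {value:+.3f}")
```

In a report of the kind the abstract describes, the tokens with the largest positive values would be the ones highlighted as driving the malicious label. A real system would replace the toy scorer with the fine-tuned transformer and would likely use a gradient-based or attention-based attribution method instead of occlusion.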
Recommended Citation
Kalappattil, Mahesh; Vaidyan, Varghese; Comert, Gurcan; and Wang, Yong, "Interpreting Office Document Macros with Bi-Directional Transformer Models" (2025). Research & Publications. 145.
https://scholar.dsu.edu/ccspapers/145