Interpreting Office Document Macros with Bi-Directional Transformer Models

Outlet Title

2025 Cyber Awareness and Research Symposium (CARS)

Document Type

Conference Proceeding

Publication Date

2025

Abstract

Malware delivered through Microsoft Office documents remains prevalent today, even though some of the macros were developed 30 years ago. This paper presents a novel method for classifying malicious Office document macros with interpretability. Our approach combines function semantics and keyword contexts to leverage the self-attention mechanism of transformers. The research focuses on Bidirectional Encoder Representations from Transformers (BERT) model variants, evaluating and comparing the accuracy and interpretability of transformer models in detecting malicious Office document macros. The models are evaluated on a dataset collected using Common Crawl. The results show that our method using BERT model variants achieves more than 99% accuracy in detecting malicious Office document macros. Our research also shows that the BERT models can accurately attribute the classification outcome to the input tokens. Finally, we propose a novel solution that scans email attachments for malicious Office document macros and produces attribution reports, which not only label the email as malicious but also identify which tokens in the document contribute positively to the classification. This solution is integrated with Gmail as a Workspace add-on. We hope that such solutions improve the trust of cybersecurity personnel in the model and in threat detection mechanisms, and help fine-tune the model to eliminate false positives and biases.
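To make the idea of an attribution report concrete, the following is a minimal sketch and not the paper's implementation. It swaps the BERT variants for a toy logistic model with hand-picked, hypothetical token weights, and swaps the paper's attribution method (not specified in the abstract) for simple occlusion: each token is removed in turn and the change in the malicious probability is recorded. All names here (`TOKEN_WEIGHTS`, `BIAS`, `attribution_report`, and so on) are assumptions made for illustration.

```python
import math
import re

# Hypothetical per-token weights standing in for a trained model. A real
# system would use a fine-tuned BERT variant instead of this lookup table.
TOKEN_WEIGHTS = {
    "autoopen": 1.2,
    "shell": 2.0,
    "createobject": 1.5,
    "urldownloadtofile": 2.5,
    "msgbox": -0.5,
    "range": -1.0,
}
BIAS = -2.0  # hypothetical bias term; favors "benign" when no known token appears


def tokenize(macro_source):
    """Lowercase the macro source and split it into identifier-like tokens."""
    return re.findall(r"[a-z_][a-z0-9_]*", macro_source.lower())


def score(tokens):
    """Return the toy model's probability that the macro is malicious."""
    logit = BIAS + sum(TOKEN_WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-logit))


def occlusion_attributions(tokens):
    """Attribute the score to each token: remove it and measure the drop."""
    base = score(tokens)
    result = []
    for i, tok in enumerate(tokens):
        reduced = tokens[:i] + tokens[i + 1:]
        result.append((tok, base - score(reduced)))
    return result


def attribution_report(macro_source, threshold=0.5):
    """Label a macro and list the tokens that push it toward 'malicious'."""
    tokens = tokenize(macro_source)
    prob = score(tokens)
    attrs = occlusion_attributions(tokens)
    # Keep only positively contributing tokens, strongest first.
    positive = sorted((a for a in attrs if a[1] > 0), key=lambda a: -a[1])
    return {
        "label": "malicious" if prob >= threshold else "benign",
        "probability": prob,
        "positive_tokens": positive,
    }


if __name__ == "__main__":
    report = attribution_report('Sub AutoOpen()\n Shell "cmd.exe"\nEnd Sub')
    print(report["label"])
    for token, contribution in report["positive_tokens"]:
        print(f"{token}: {contribution:+.3f}")
```

With these assumed weights, the sample macro is labeled malicious and the report ranks `shell` ahead of `autoopen`, while tokens with no weight contribute nothing. Occlusion is used here only because it needs no model internals; gradient-based methods are a common alternative for transformer models.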
