Date of Award
Fall 10-2025
Document Type
Dissertation
Degree Name
Doctor of Philosophy in Cyber Operations (PhDCO)
First Advisor
Varghese Mathew Vaidyan
Second Advisor
Andrew Kramer
Third Advisor
Gurcan Comert
Abstract
Malicious Microsoft Office documents remain prevalent today, even though some of the macros they carry were developed 30 years ago. Office macros are used as droppers or downloaders for emerging malware. Although various models have been developed to detect malicious Office document macros, the interpretability and attribution of model results have received little attention. Moreover, even though these models achieve high accuracy, the probabilistic uncertainty of their predictions has not been addressed.
This research presents a novel method for classifying malicious Office document macros while measuring interpretability, attribution, and probabilistic uncertainty. Our approach combines function semantics and keyword contexts to leverage the self-attention mechanism of transformers. We compare three variants of the Bidirectional Encoder Representations from Transformers (BERT) model, namely BERT, DistilBERT, and CodeBERT, in terms of accuracy, interpretability, and uncertainty in detecting malicious Office document macros. The models are evaluated on a dataset collected using Common Crawl. The tokens that contribute positively and negatively to the classification result are visualized with color codes. Probabilistic uncertainty is computed with a Bayesian approximation using Monte Carlo (MC) Dropout, which provides a computationally efficient way to measure uncertainty.
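The MC Dropout idea mentioned above can be illustrated with a minimal Python sketch. This is not the dissertation's implementation: the toy logistic classifier, the function names `stochastic_forward` and `mc_dropout_predict`, and all weights, drop rates, and pass counts are hypothetical stand-ins for a transformer model. The sketch only shows the general technique of keeping dropout active at inference time, repeating the forward pass, and summarizing the spread of the resulting probabilities.

```python
import math
import random


def stochastic_forward(features, weights, bias, drop_rate, rng):
    """One forward pass of a toy logistic classifier with dropout left ON.

    Hypothetical stand-in for a transformer forward pass in training mode.
    """
    keep = 1.0 - drop_rate
    z = bias
    for x, w in zip(features, weights):
        if rng.random() < keep:        # this input unit survives dropout
            z += (x / keep) * w        # inverted-dropout rescaling
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid -> P(malicious)


def mc_dropout_predict(features, weights, bias,
                       drop_rate=0.1, passes=50, seed=0):
    """Run several stochastic passes and summarize them.

    Returns (mean probability, variance across passes,
             binary entropy of the mean probability).
    """
    rng = random.Random(seed)
    probs = [stochastic_forward(features, weights, bias, drop_rate, rng)
             for _ in range(passes)]
    mean = sum(probs) / passes
    variance = sum((p - mean) ** 2 for p in probs) / passes
    eps = 1e-12  # guards against log(0)
    entropy = -(mean * math.log(mean + eps)
                + (1.0 - mean) * math.log(1.0 - mean + eps))
    return mean, variance, entropy


# Illustrative values only; they do not come from the dissertation.
features = [1.0, 0.5, -2.0, 0.3]
weights = [0.8, -1.2, 0.4, 2.0]
mean, variance, entropy = mc_dropout_predict(features, weights, bias=0.1,
                                             drop_rate=0.3, passes=100)
print(f"mean={mean:.3f} variance={variance:.4f} entropy={entropy:.3f}")
```

A larger variance across passes indicates that the prediction is sensitive to which units are dropped, which is the kind of signal an uncertainty-aware metric could take into account.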
We introduce a new term, Confidence Adjusted Accuracy (CAA), as a combined measure of accuracy and probabilistic uncertainty. We propose CAA as a novel technique to measure and compare accuracy normalized by uncertainty. Results demonstrate that CAA can accurately measure the impact of the uncertainty that arises when inferring on unseen or out-of-domain samples. This technique helps cyber analysts gain transparency into model behavior. It will also help reduce false positives and mitigate bias.
Recommended Citation
Kalappattil, Mahesh, "Detection of Malicious Office Document Macros Using Large Language Models" (2025). Masters Theses & Doctoral Dissertations. 503.
https://scholar.dsu.edu/theses/503