Date of Award

Fall 10-2025

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Cyber Operations (PhDCO)

First Advisor

Varghese Mathew Vaidyan

Second Advisor

Andrew Kramer

Third Advisor

Gurcan Comert

Abstract

Malicious Microsoft Office documents remain prevalent today, even though some of the macros were developed 30 years ago. Office macros are used as droppers or downloaders for emerging malware. Although various models have been developed to detect malicious Office document macros, the interpretability and attribution of model results have not gained attention. Moreover, even though the models provide high accuracy, their probabilistic uncertainty has not been addressed either.

This research provides a novel method to classify malicious Office document macros while measuring interpretability, attribution, and probabilistic uncertainty. Our approach combines function semantics and keyword contexts to leverage the self-attention mechanism of transformers. We compare three variants of the Bidirectional Encoder Representations from Transformers (BERT) model, namely BERT, DistilBERT, and CodeBERT, in terms of accuracy, interpretability, and uncertainty in detecting malicious Office document macros. The models are evaluated on a dataset collected using Common Crawl. The tokens that contribute positively and negatively to the classification result are visualized with color codes. The probabilistic uncertainty is computed through Bayesian approximation using Monte Carlo (MC) Dropout, which provides a computationally efficient way to measure uncertainty.
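
As a rough illustration of the MC Dropout idea mentioned above, the following minimal sketch keeps dropout active at inference time, runs several stochastic forward passes, and summarizes the spread of the predicted probabilities. It is not the dissertation's implementation: the toy logistic classifier, the function names (`stochastic_forward`, `mc_dropout_predict`), the feature values, the weights, and the dropout rate are all hypothetical and chosen only to keep the example self-contained.

```python
import math
import random


def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def stochastic_forward(features, weights, bias, drop_prob, rng):
    """One forward pass of a toy logistic classifier with dropout left ON.

    Each input unit is dropped with probability drop_prob; kept units are
    rescaled by 1 / (1 - drop_prob) (inverted dropout).
    """
    keep = 1.0 - drop_prob
    z = bias
    for x, w in zip(features, weights):
        if rng.random() < keep:
            z += (x / keep) * w
    return sigmoid(z)


def mc_dropout_predict(features, weights, bias, drop_prob=0.1, passes=50, seed=0):
    """Run several stochastic passes and summarize them.

    Returns (mean probability, variance across passes, entropy of the mean).
    Higher variance or entropy suggests a less certain prediction.
    """
    rng = random.Random(seed)
    probs = [
        stochastic_forward(features, weights, bias, drop_prob, rng)
        for _ in range(passes)
    ]
    mean = sum(probs) / passes
    variance = sum((p - mean) ** 2 for p in probs) / passes
    eps = 1e-12  # guards against log(0)
    entropy = -(mean * math.log(mean + eps) + (1.0 - mean) * math.log(1.0 - mean + eps))
    return mean, variance, entropy


# Hypothetical feature vector and weights, for illustration only.
toy_features = [1.0, 0.5, 2.0, 0.0]
toy_weights = [0.8, -1.2, 0.6, 0.3]
mean_p, var_p, ent_p = mc_dropout_predict(toy_features, toy_weights, bias=-0.5, drop_prob=0.3)
print(f"mean={mean_p:.3f} variance={var_p:.4f} entropy={ent_p:.3f}")
```

In a transformer setting the same pattern would presumably apply to the dropout layers of the fine-tuned model rather than to raw input features; that mapping is an assumption, since the abstract does not describe the configuration.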

We introduce a new term, Confidence Adjusted Accuracy (CAA), as a combined measure of accuracy and probabilistic uncertainty. We propose CAA as a novel technique to measure and compare accuracy normalized by uncertainty. Results demonstrate that CAA can accurately measure the impact of the uncertainty that arises from inference on unseen or out-of-domain samples. This technique helps cyber analysts gain transparency into model behavior. It will also help reduce false positives and mitigate bias.

Recommended Citation

Kalappattil, Mahesh, "Detection of Malicious Office Document Macros Using Large Language Models" (2025). Masters Theses & Doctoral Dissertations. 503.
https://scholar.dsu.edu/theses/503
