Date of Award

Fall 10-2025

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Cyber Operations (PhDCO)

First Advisor

Varghese Mathew Vaidyan

Second Advisor

Andrew Kramer

Third Advisor

Gurcan Comert

Abstract

Malicious Microsoft Office documents remain prevalent today, even though some of the macros they rely on were developed 30 years ago. Office macros are used as droppers or downloaders for emerging malware. Although various models have been developed to detect malicious Office document macros, the interpretability and attribution of model results have not received attention. Moreover, even though these models achieve high accuracy, their probabilistic uncertainty has not been addressed either.

This research provides a novel method for classifying malicious Office document macros while measuring interpretability, attribution, and probabilistic uncertainty. Our approach combines function semantics and keyword contexts to leverage the self-attention mechanism of transformers. We compare three variants of the Bidirectional Encoder Representations from Transformers (BERT) model, namely BERT, DistilBERT, and CodeBERT, in terms of accuracy, interpretability, and uncertainty when detecting malicious Office document macros. The models are evaluated on a dataset collected using Common Crawl. The tokens that contribute positively and negatively to the classification result are visualized with color codes. The probabilistic uncertainty is computed through Bayesian approximation with Monte Carlo (MC) Dropout, which provides a computationally efficient way to measure uncertainty.
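
To illustrate the general idea behind MC Dropout, the following is a minimal sketch and not the dissertation's implementation. It assumes a toy linear classifier in NumPy in place of a BERT-style model; the function name `mc_dropout_predict`, the dropout rate, the number of passes, and the random data are all hypothetical. Dropout stays active at inference time, several stochastic forward passes are averaged, and the entropy of the averaged prediction serves as one possible uncertainty score.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mc_dropout_predict(features, weights, bias, n_passes=50, drop_rate=0.1, seed=0):
    """Hypothetical helper: average class probabilities over stochastic passes.

    A toy linear layer stands in for a transformer classifier (assumption).
    """
    rng = np.random.default_rng(seed)
    keep = 1.0 - drop_rate
    probs = np.empty((n_passes, features.shape[0], weights.shape[1]))
    for t in range(n_passes):
        # Sample a fresh dropout mask on every pass (inverted dropout scaling).
        mask = rng.random(features.shape) < keep
        dropped = features * mask / keep
        probs[t] = softmax(dropped @ weights + bias)
    mean_probs = probs.mean(axis=0)
    # Predictive entropy of the averaged distribution as an uncertainty score.
    entropy = -(mean_probs * np.log(mean_probs + 1e-12)).sum(axis=-1)
    return mean_probs, entropy

# Toy data: 4 samples, 8 features, 2 classes (e.g. benign vs. malicious).
rng = np.random.default_rng(1)
features = rng.normal(size=(4, 8))
weights = rng.normal(size=(8, 2))
bias = np.zeros(2)

mean_probs, entropy = mc_dropout_predict(features, weights, bias)
```

In this sketch, samples whose averaged prediction sits near 0.5 for each class receive entropy close to ln 2, while confidently classified samples receive entropy near zero.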

We introduce a new term, Confidence Adjusted Accuracy (CAA), as a combined measure of accuracy and probabilistic uncertainty. We propose CAA as a novel technique to measure and compare accuracy normalized by uncertainty. Results demonstrate that CAA can accurately measure the impact of the uncertainty that arises from inference on unseen or out-of-domain samples. This technique helps cyber analysts gain transparency into model behavior. It will also help reduce false positives and mitigate bias.
