Date of Award

Spring 6-2025

Document Type

Thesis

Degree Name

Master of Science in Computer Science (MSCS)

First Advisor

Austin O’Brien

Second Advisor

John Hastings

Abstract

As large language models are increasingly deployed in real-world systems, concerns about their security have grown. While these models demonstrate impressive fluency and utility, they are also susceptible to a variety of attacks, including prompt injection, data leakage, and adversarial manipulation. These vulnerabilities pose serious risks in high-stakes environments where privacy, reliability, and trust are essential. Despite emerging efforts to document and mitigate such issues, a gap remains in the ability to systematically measure the effectiveness of proposed defenses in a quantifiable and reproducible manner.

This research addresses the question: how can security vulnerabilities in language-model-based systems be modeled, measured, and mitigated using a quantitative risk framework? To explore this, a retrieval-augmented generation application was constructed using the DeepSeek-R1 language model, FastAPI, LangChain, and FAISS. The application was tested under baseline conditions and then with three security interventions: attribute-based access control, named entity recognition redaction using Microsoft Presidio, and NeMo Guardrails for response filtering.
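
To make the redaction intervention concrete, the sketch below shows how a named entity recognition filter of the kind described here can be assembled from Microsoft Presidio's analyzer and anonymizer engines. The sample text is hypothetical, and the thesis's actual pipeline configuration may differ.

    # Minimal PII-redaction sketch using Microsoft Presidio (illustrative only).
    from presidio_analyzer import AnalyzerEngine
    from presidio_anonymizer import AnonymizerEngine

    analyzer = AnalyzerEngine()
    anonymizer = AnonymizerEngine()

    def redact(text: str) -> str:
        # Detect PII entities (names, emails, phone numbers, etc.) in the text.
        findings = analyzer.analyze(text=text, language="en")
        # Replace each detected span with a placeholder before it reaches the model.
        return anonymizer.anonymize(text=text, analyzer_results=findings).text

    print(redact("Contact Jane Doe at jane.doe@example.com or 555-010-1234."))

In a retrieval-augmented generation pipeline, a function like this would typically be applied to retrieved document chunks, and optionally to model outputs, before they reach the user.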

Five vulnerability types were tested: exposure of personally identifiable information, latent context injection, adversarial prompt generation, direct prompt injection, and divergence. A probe-based tool was used to simulate adversarial attacks and measure susceptibility. Attack success probabilities were estimated using Laplace's Rule of Succession, and expected losses were modeled using triangular distributions informed by breach cost data. Monte Carlo simulations were used to project cumulative losses under each configuration, and loss exceedance curves were generated to visualize long-tail risk.
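
The probability and loss modeling can be illustrated with a short simulation; the probe counts and dollar figures below are placeholders rather than the values measured in the study.

    # Illustrative risk simulation: Laplace's Rule of Succession, a triangular
    # loss distribution, Monte Carlo aggregation, and a loss exceedance curve.
    import numpy as np

    def laplace_success_probability(successes: int, trials: int) -> float:
        # Rule of Succession: p = (s + 1) / (n + 2), which avoids a zero estimate
        # when no successful attacks were observed during probing.
        return (successes + 1) / (trials + 2)

    rng = np.random.default_rng(seed=0)
    p_attack = laplace_success_probability(successes=3, trials=20)  # hypothetical probe results

    low, mode, high = 50_000, 200_000, 1_000_000  # assumed breach-cost parameters (USD)
    attempts_per_year, n_simulations = 100, 10_000

    annual_losses = np.empty(n_simulations)
    for i in range(n_simulations):
        # Number of successful attacks in a simulated year, then a sampled loss for each.
        n_success = rng.binomial(attempts_per_year, p_attack)
        annual_losses[i] = rng.triangular(low, mode, high, size=n_success).sum()

    # Loss exceedance curve: probability that annual loss exceeds each threshold.
    thresholds = np.linspace(0, annual_losses.max(), num=50)
    exceedance = np.array([(annual_losses > t).mean() for t in thresholds])
    print(f"Expected annual loss: ${annual_losses.mean():,.0f}")

Repeating such a simulation for each configuration and comparing the resulting curves is what supports the comparative loss estimates summarized next.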

The results showed that two of the three controls led to measurable improvements in system security. The attribute-based access control mechanism effectively reduced attack likelihood, but did so by disabling document retrieval entirely. The named entity recognition filter preserved retrieval-augmented generation while reducing exposure to sensitive data. The third control, NeMo Guardrails, provided no robust post-generation filtering out of the box, producing an expected loss comparable to the baseline and the lowest return on control. These findings demonstrate the value of layered defenses that minimize both the probability and the impact of successful attacks.

This thesis contributes a reproducible methodology for evaluating the security of large language models using probabilistic models, financial risk simulations, and comparative cost-benefit analysis. It provides practitioners with tools to assess the effectiveness of different controls and highlights the importance of structured, data-driven approaches to securing artificial intelligence systems.
