Mid-Generation Jailbreaks in Open-Source LLMs Using a Pause-and-Edit Attack

Download

Download Full Text (497 KB)

Description

Large Language Models (LLMs) are widely used in AI assistants, chatbots, and decision-support systems. To prevent harmful responses, most LLMs rely on safety alignment mechanisms that generate refusal responses when users request unsafe content. However, most safety evaluations assume that alignment is only required at the start of generation. In this research, we investigate a mid-generation jailbreak attack called Pause-and-Edit, where a refusal response is interrupted, modified, and resumed. This manipulation can cause the model to override its original safety decision and generate harmful instructions. Our study evaluates how vulnerable modern open-source LLMs are to this type of attack.

Publication Date

2026

Recommended Citation

Singh, Aman; More, Komal; Aryal, Samyam; and Spanier, Mark, "Mid-Generation Jailbreaks in Open-Source LLMs Using a Pause-and-Edit Attack" (2026). Annual Research Symposium. 83.
https://scholar.dsu.edu/research-symposium/83

Annual Research Symposium

Mid-Generation Jailbreaks in Open-Source LLMs Using a Pause-and-Edit Attack

Description

Publication Date

Recommended Citation

Browse

Search

Author Corner

Annual Research Symposium

Mid-Generation Jailbreaks in Open-Source LLMs Using a Pause-and-Edit Attack

Authors

Files

Description

Publication Date

Recommended Citation

Share

Browse

Search

Author Corner