“A Survey on Prompt Injection and Jailbreak Defenses in Large Language Models”

Anushree K N; Vedanth M; Amulya H; Chirayu Gowda; Meghashree C

doi:10.17148/IJARCCE.2026.155228

Volume 15, Issue 5, May 2026

Abstract: Large language models are now embedded in healthcare, education, and enterprise software at a scale that would have seemed unlikely just a few years ago. This rapid adoption has introduced a category of security threats that conventional mechanisms were never designed to handle: prompt injection and jailbreak attacks. Unlike traditional exploits, these attacks do not target code; they manipulate natural language itself to push a model past its safety constraints, extract information it should not reveal, or produce outputs its developers explicitly prohibited. What makes these threats particularly difficult to counter is that adversarial intent is often distributed gradually across multiple conversation turns, each message appearing harmless in isolation while collectively steering the model toward a malicious outcome. Defenses built on static keyword lists or single-message classification are structurally unable to detect this pattern.

Keywords: large language models, prompt injection, dual LLM, adaptive security, context drift, FAISS, semantic defense, adversarial attacks.

How to Cite:

[1] Anushree K N, Vedanth M, Amulya H, Chirayu Gowda, Prof. Meghashree C, “A Survey on Prompt Injection and Jailbreak Defenses in Large Language Models,” International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), DOI: 10.17148/IJARCCE.2026.155228

This work is licensed under a Creative Commons Attribution 4.0 International License.