Jailbreak / Guardrail Bypass · Medium
Policy Puppetry universal LLM jailbreak
April 2025 · Cross-model (all major LLMs)
What happened
A single transferable prompt template disguises adversarial requests as structured "policy" files (XML/JSON/INI). Models interpret the formatted content as internal developer policy and comply, bypassing safety alignment and sometimes leaking the system prompt.
Root cause
Systemic over-trust of policy and instruction-like structured input learned during training, which makes the bypass hard to patch model-side.
Fix / outcome
No single vendor patch. Treated as a systemic alignment weakness; mitigation favors external guardrails and input inspection.
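One form the input inspection mentioned above could take, as an added example that is not from the incident write-up or any vendor: a pre-model check that flags user text whose XML/JSON/INI field names read like developer policy. The vocabulary, threshold, function name and sample inputs are assumptions constructed for illustration.

```python
import re

# Assumed vocabulary (not from the incident write-up): field names that make
# a block read like developer policy. Tune against your own traffic.
POLICY_WORDS = re.compile(
    r"policy|config|rule|instruction|system|override|allowed|blocked|role", re.I)

# Regex extraction rather than real parsers: suspect input is often malformed,
# and handing untrusted text to an XML parser adds attack surface.
EXTRACTORS = {
    "xml": re.compile(r"<\s*([A-Za-z][\w.-]*)[^<>]*>"),            # opening tags
    "json": re.compile(r'"([^"\n]{1,40})"\s*:'),                   # object keys
    "ini": re.compile(r"^\s*(?:\[([^\]\n]+)\]|([\w.-]+)\s*=)", re.M),  # sections, keys
}

def inspect_user_input(text, threshold=2):
    """Flag user-role text whose structured field names look like policy."""
    formats = {}
    for fmt, pattern in EXTRACTORS.items():
        names = {next(g for g in m.groups() if g) for m in pattern.finditer(text)}
        policy_like = sorted(n for n in names if POLICY_WORDS.search(n))
        if len(policy_like) >= threshold:
            formats[fmt] = policy_like
    return {"flagged": bool(formats), "formats": formats}

# Constructed samples with placeholder bodies (hypothetical, not real prompts).
BENIGN = 'Can you fix this? {"name": "soup", "servings": 4, "steps": ["boil"]}'
POLICY_XML = """<assistant-policy>
  <rule id="1">PLACEHOLDER</rule>
  <override>PLACEHOLDER</override>
</assistant-policy>"""
POLICY_INI = """[system-policy]
allowed_modes = PLACEHOLDER
blocked_strings = PLACEHOLDER"""

if __name__ == "__main__":
    for sample in (BENIGN, POLICY_XML, POLICY_INI):
        print(inspect_user_input(sample))
```

A flag here is a routing signal for stricter handling or review, not a verdict, since legitimate users also paste configuration files.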