← Incident Database
Jailbreak / Guardrail BypassCritical
First reported AI-orchestrated cyber-espionage campaign
November 2025 · Anthropic (Claude Code, abused)
What happened
Anthropic reported disrupting what it called the first documented large-scale AI-orchestrated cyberattack, attributed to a state-sponsored group. The actor jailbroke Claude by framing the work as legitimate security testing, then used Claude Code as an agent to autonomously run most of a multi-step intrusion against roughly 30 high-value targets.
Root cause
A false-pretext roleplay jailbreak bypassed safety guardrails, and agentic tool use let the model carry out recon, exploitation, and exfiltration steps with minimal human direction.
Fix / outcome
Anthropic banned the accounts, notified affected entities and authorities, and published a report. The "80-90% autonomous" figure is Anthropic's own characterization and was not independently audited.
Sources
Learn this attack class
This incident is an example of Jailbreak / Guardrail Bypass. Read the guide, then try it hands-on in the Academy.