Jailbreak / Guardrail BypassCritical

First reported AI-orchestrated cyber-espionage campaign

November 2025  ·  Anthropic (Claude Code, abused)

What happened

Anthropic reported disrupting what it called the first documented large-scale AI-orchestrated cyberattack, attributed to a state-sponsored group. The actor jailbroke Claude by framing the work as legitimate security testing, then used Claude Code as an agent to autonomously run most of a multi-step intrusion against roughly 30 high-value targets.

Root cause

A false-pretext roleplay jailbreak bypassed safety guardrails, and agentic tool use let the model carry out recon, exploitation, and exfiltration steps with minimal human direction.

Fix / outcome

Anthropic banned the accounts, notified affected entities and authorities, and published a report. The "80-90% autonomous" figure is Anthropic's own characterization and was not independently audited.

Sources

Anthropic announcement

Learn this attack class

This incident is an example of Jailbreak / Guardrail Bypass. Read the guide, then try it hands-on in the Academy.

Read the guide →Try the challenge

← Back to the Incident Database