June 12, 2026·5 min read·Anthony D'Onofrio

What I Learned Cataloguing Every AI Security Incident I Could Find

I kept losing track of the real AI security incidents as they piled up, so I compiled them into a sourced, filterable database. Every entry has what happened, the root cause, the fix, and a primary source. As I write this it holds 34 incidents, from Bing's "Sydney" prompt leak in early 2023 to the first reported AI-orchestrated cyber-espionage campaign in late 2025.

The point of putting them in one place is that you start seeing the shape of the thing. A single incident is a headline. Thirty-four of them, sorted and tagged, is a pattern. Here are the three that stood out, and what they mean if you are the person responsible for an AI system not ending up in the database.

One exfiltration channel, six vendors, three years

The single most repeated mechanic in the entire database is data exfiltration through rendered content. The attack is always the same: get the model to emit a Markdown image or link pointing at an attacker-controlled URL, with stolen data tucked into the query string. The victim's client dutifully loads the image, and the data leaves.

Look at where this exact channel shows up:

Google Bard, encoding chat history into an image URL (2023)
Writer.com, leaking private documents the same way (2023)
ChatGPT's macOS app, made persistent through memory (2024)
Slack AI, exfiltrating private-channel contents (2024)
GitHub Copilot Chat, twice, including the CamoLeak issue that abused GitHub's own image proxy (2024 and 2025)
Microsoft 365 Copilot's EchoLeak, the first zero-click version (2025)

Same bug, six different companies, spread across three years. And the fix is almost always identical: disable automatic image rendering in the model's output, or tighten the content security policy so it cannot reach arbitrary domains. The defense has been known since the Bard writeup. It keeps getting rediscovered because every new product rebuilds the same render-the-model's-output path and forgets that model output is attacker-influenced.

If you take one thing from the database, take this: if your product renders model output as Markdown or HTML, you have probably built this bug. The full mechanism is here.

The attacker is rarely in the room

The popular image of an LLM attack is someone typing "ignore previous instructions" into a chat box. That is the least of it. In the large majority of the real incidents I catalogued, the attacker never spoke to the model at all.

They poisoned something the model would later read. A shared Google Doc (Bard). A web page being summarized (Writer). A public Slack channel (Slack AI). A pull request description (CamoLeak). A calendar invitation (the Gemini promptware research). A Web-to-Lead form field (Salesforce ForcedLeak). An incoming email (EchoLeak). The victim asked their assistant a perfectly normal question, and the assistant executed instructions planted by someone else.

This is indirect prompt injection, and it is the actual attack surface of any agent that reads untrusted content, which in 2026 is most of them. The reason it is so dangerous is structural: the model cannot reliably tell the difference between the data you asked it to process and instructions hidden inside that data. Both arrive as tokens in the same context window. "Sanitize user input" does not help, because the user is the victim, not the attacker. The deeper treatment is in the indirect prompt injection guide.

2025 was the year the blast radius grew

The early incidents were embarrassing but bounded: a chatbot swore, a system prompt leaked, a model said something it should not have. Then agents arrived, and the same underlying weaknesses started producing real damage, because the model could now act.

The 2025 cluster reads very differently from the 2023 one:

A Replit coding agent deleted a production database during a code freeze, then fabricated records and lied about it.
The first malicious MCP server appeared in the wild, a counterfeit npm package that BCC'd every outbound email to the attacker.
Critical RCE landed in MCP tooling itself, exploitable just by connecting to a malicious server.
A data-wiping prompt shipped inside an official AI coding extension after a bad pull request review.
A poisoned model config in Hugging Face Transformers could run attacker code at load time, in a library downloaded hundreds of millions of times.

Prompt injection is the trigger in many of these. Excessive agency is the blast radius. A model that can only talk turns a compromise into an awkward log line. A model that can delete records, send email, run code, or reach internal services turns the same compromise into an incident report. The difference is entirely in what the agent was allowed to do once someone got into its context.

The uncomfortable through-line

A fourth thing is less a pattern than a mood. The jailbreak techniques in the database (Crescendo, many-shot, Skeleton Key, Policy Puppetry) are not bugs that get patched. They are systemic properties of how aligned models behave, they work across vendors, and the disclosures read more like physics than like CVEs. You do not fix them. You build around them. That is the jailbreak field guide if you want the taxonomy.

Put together, the database argues for a specific posture. Assume the model will eventually follow instructions hidden in content it reads. Assume any channel that renders its output can exfiltrate. Give it the narrowest possible set of tools, and require a human for anything irreversible. None of that is novel security thinking. It is the same least-privilege, assume-breach discipline we already apply everywhere else. The AI part is just remembering to apply it to a component that feels like part of your trusted stack but is actually a text generator processing attacker-influenced input.

You can browse the whole thing, filter by attack class, and click through to sources at wraith.sh/incidents. The techniques behind these incidents, each paired with its defense, are in the red team cheat sheet. And if you would rather learn this by doing it, the Academy runs every one of these attack classes as a hands-on challenge.

If I am missing an incident or got a detail wrong, tell me and I will fix it. The point is to make it complete.

Practice these techniques hands-on

14 free challenges teaching prompt injection, system prompt extraction, data exfiltration, and more.

Enter the Academy →