Attack Guide

LLM Denial of Service and Unbounded Consumption (OWASP LLM10)

5 min read·By Anthony D'Onofrio·Updated 2026-07-02

When the attack is the bill. A complete guide to OWASP LLM10, unbounded consumption and denial of service against LLM apps: token floods, generation runaway, tool-call storms, denial-of-wallet, reflected amplification, and the rate-limit, quota, and circuit-breaker defenses that actually work.

Unbounded consumption is the attack class where the goal is to make an LLM application consume far more resources (compute, tokens, or money) than it was meant to. Sometimes that means classic denial of service (the app falls over, or its cost controls shut it down and take the feature offline), and sometimes it means "denial of wallet," where the app keeps working but the operator gets a catastrophic bill. Because LLM costs scale close to linearly with tokens and many endpoints do expensive work per request, a determined attacker can multiply your spend by orders of magnitude in a day if you let them. This is OWASP LLM10. It is the easiest category on the list to mitigate once you decide to, and the one teams most often forget until the invoice arrives.

This guide covers the consumption attack surface, a real in-the-wild case, and the defenses. It pairs with the Unbounded Consumption module in the Wraith Academy and sits within the OWASP Top 10 for LLMs, annotated.

Why LLM apps are unusually exposed

Traditional DoS is about request volume against fixed-cost endpoints. LLM apps add a nastier property: per-request cost is variable and attacker-influenceable. A single request can be cheap or ruinous depending on input length, output length, how many tools it triggers, and how many times the agent loops. That turns "consume resources" from a volume game into a leverage game: one clever request can cost as much as thousands of normal ones. And unlike a crashed server, denial-of-wallet is silent: everything looks healthy until billing.
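To make the leverage concrete, here is a back-of-the-envelope sketch. The per-token prices and the token counts are hypothetical, chosen only for illustration, and `request_cost` is an invented helper, not any provider's billing formula.

```python
# Hypothetical prices in USD per 1,000 tokens (assumptions, not real rates).
PRICE_IN_PER_1K = 0.003
PRICE_OUT_PER_1K = 0.015


def request_cost(input_tokens: int, output_tokens: int, model_calls: int = 1) -> float:
    """Estimate the cost of one user request that triggers `model_calls` model calls."""
    per_call = (input_tokens / 1000) * PRICE_IN_PER_1K \
        + (output_tokens / 1000) * PRICE_OUT_PER_1K
    return per_call * model_calls


# A typical request: short prompt, short answer, one model call.
normal = request_cost(input_tokens=500, output_tokens=300)

# An abusive request: huge context, long output, and an agent that loops 50 times.
abusive = request_cost(input_tokens=100_000, output_tokens=4_000, model_calls=50)

leverage = abusive / normal
print(f"normal=${normal:.3f} abusive=${abusive:.2f} leverage={leverage:.0f}x")
```

Under these assumed numbers the abusive request costs about 3,000 times the normal one, and none of that shows up as extra request volume.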

The attack surface

Long-input context explosion

The attacker sends very large inputs (or gets the agent to ingest a huge document/URL) so each call carries a massive context. Cost scales with input tokens, so oversized inputs are a direct cost multiplier. Agents that fetch and summarize arbitrary attacker-supplied content are especially exposed.
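A minimal sketch of an input cap, applied before any model call. The limit, the `estimate_tokens` heuristic (roughly four characters per token), and the `InputTooLarge` exception are all assumptions for illustration; a real deployment would count tokens with the tokenizer for its own model.

```python
MAX_INPUT_TOKENS = 8_000  # hypothetical per-request ceiling


class InputTooLarge(ValueError):
    """Raised when a prompt or fetched document exceeds the input budget."""


def estimate_tokens(text: str) -> int:
    # Crude heuristic (about 4 characters per token), rounded up.
    # Assumption only: swap in your model's real tokenizer.
    return (len(text) + 3) // 4


def check_input(text: str, limit: int = MAX_INPUT_TOKENS) -> int:
    """Return the estimated token count, or raise before any expensive work."""
    tokens = estimate_tokens(text)
    if tokens > limit:
        raise InputTooLarge(f"estimated {tokens} tokens exceeds limit of {limit}")
    return tokens
```

The same check belongs on content the agent fetches on the user's behalf (documents, URLs), not only on the text the user typed.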

Long-output and generation runaway

The attacker coerces very long outputs ("write a 100,000-word story," "repeat this forever") or triggers a generation loop. Output tokens are typically the most expensive, so runaway generation is a favorite. Some models can be nudged into near-endless repetition, which also risks degrading the service for everyone else.
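One way to bound generation on your side of the API is to wrap the token stream, as sketched below. `bounded_stream` and its thresholds are hypothetical; the repetition check only catches the simplest loop (the same token repeated back to back), and it complements, rather than replaces, whatever maximum-output setting your model provider offers.

```python
from typing import Iterable, Iterator


def bounded_stream(tokens: Iterable[str], max_tokens: int,
                   max_repeat: int = 20) -> Iterator[str]:
    """Yield tokens until the hard cap or a run of identical tokens is hit."""
    last = None
    run = 0
    for count, tok in enumerate(tokens, start=1):
        if count > max_tokens:
            return  # hard output cap reached
        if tok == last:
            run += 1
        else:
            last, run = tok, 1
        if run > max_repeat:
            return  # degenerate repetition: stop generating
        yield tok
```

Because the wrapper is a generator, stopping it also stops pulling from the upstream stream, so a runaway generation is cut off rather than merely hidden from the user.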

Tool-call and agent-loop storms

In an agentic app, one user turn can spawn many tool calls, and a poorly bounded agent can loop (call a tool, feed the result back, decide to call again) without a hard cap. An attacker who induces a loop (or a recursive plan) turns a single prompt into thousands of expensive operations. This is the highest-leverage variant in modern agent stacks.
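A bounded agent turn might look like the following sketch. The function names (`run_agent_turn`, `plan_next_call`, `execute_tool`), the `BudgetExceeded` exception, and the default limits are assumptions for illustration; they do not correspond to any particular agent framework.

```python
import time
from typing import Any, Callable, Optional


class BudgetExceeded(RuntimeError):
    """Raised when a single turn exhausts its tool-call or time budget."""


def run_agent_turn(
    plan_next_call: Callable[[list], Optional[tuple]],
    execute_tool: Callable[[str, dict], Any],
    max_calls: int = 8,
    max_seconds: float = 30.0,
) -> list:
    """Run one turn; plan_next_call returns (tool_name, args) or None when done."""
    history: list = []
    seen: set = set()
    deadline = time.monotonic() + max_seconds
    while True:
        call = plan_next_call(history)
        if call is None:
            return history
        # Fail closed: stop the turn instead of letting it run "just once more".
        if len(history) >= max_calls:
            raise BudgetExceeded("tool-call cap reached")
        if time.monotonic() > deadline:
            raise BudgetExceeded("turn timed out")
        key = repr(call)
        if key in seen:
            raise BudgetExceeded("repeated identical tool call")
        seen.add(key)
        history.append((call, execute_tool(*call)))
```

Note the limits of the sketch: the deadline is only checked between steps, so a tool that hangs still needs its own timeout, and the duplicate check only catches byte-identical calls, not loops that vary their arguments slightly. The hard call cap is what covers that second case.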

Embedding and indexing storms

If your pipeline re-embeds or re-indexes content on ingestion, an attacker who can push content (uploads, submitted documents, crawled pages) at attacker-controlled volume drives expensive embedding work. RAG ingestion is a common blind spot.
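A cheap gate in front of the embedding step can blunt this. The sketch below is one possible design, not a standard component: `IngestionGate`, the per-source byte quota, and the in-memory state are all hypothetical, and a real pipeline would persist the counters and reset them on a schedule.

```python
import hashlib


class IngestionGate:
    """Decide whether content is worth embedding before paying for it."""

    def __init__(self, max_bytes_per_source: int):
        self.max_bytes = max_bytes_per_source
        self.used: dict = {}            # source_id -> bytes accepted so far
        self.seen_hashes: set = set()   # SHA-256 digests already embedded

    def should_embed(self, source_id: str, content: bytes) -> bool:
        digest = hashlib.sha256(content).hexdigest()
        if digest in self.seen_hashes:
            return False  # identical content was already embedded
        used = self.used.get(source_id, 0)
        if used + len(content) > self.max_bytes:
            return False  # this source has exhausted its ingestion quota
        self.used[source_id] = used + len(content)
        self.seen_hashes.add(digest)
        return True
```

Hashing is far cheaper than embedding, so duplicate uploads cost almost nothing, and the quota bounds how much new work any single uploader or crawled source can force.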

Model-extraction and probing floods

Some attacks require many queries by design: model extraction, systematic prompt discovery, brute-forcing a guardrail. These are consumption attacks even when the primary goal is something else, and they show up as anomalous query volume.

Reflected and amplified abuse

The endpoint itself can be turned into a weapon. A researcher found that OpenAI's crawler attributions endpoint accepted an unbounded list of URLs in one request and fired a separate crawler request at each, letting a single HTTP POST be amplified into a flood against any victim site from OpenAI's IP ranges. (incident) That is unbounded consumption pointed outward: your AI infrastructure becomes the attacker's amplifier. It is the clearest disclosed real-world case of the class.
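The general lesson is to bound fan-out: any endpoint that turns one inbound request into many outbound ones needs a cap on the list and on how many requests land on a single host. The sketch below illustrates that idea only; the limits and `validate_urls` are hypothetical and say nothing about how the affected endpoint was actually fixed.

```python
from urllib.parse import urlsplit

MAX_URLS_PER_REQUEST = 10  # hypothetical cap on list length
MAX_URLS_PER_HOST = 2      # hypothetical cap on requests aimed at one host


def validate_urls(urls: list[str]) -> list[str]:
    """Reject a request whose URL list would amplify into a flood."""
    if len(urls) > MAX_URLS_PER_REQUEST:
        raise ValueError(f"too many URLs: {len(urls)} > {MAX_URLS_PER_REQUEST}")
    per_host: dict = {}
    for url in urls:
        host = (urlsplit(url).hostname or "").lower()
        if not host:
            raise ValueError(f"not an absolute URL: {url!r}")
        per_host[host] = per_host.get(host, 0) + 1
        if per_host[host] > MAX_URLS_PER_HOST:
            raise ValueError(f"too many URLs for host {host!r}")
    return urls
```

A per-request check like this limits amplification within one call; limiting it across calls still requires rate limits on the caller.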

Variable-cost query abuse

Any feature where the user controls an expensive parameter (number of results, reasoning depth, image generations, retries) is a cost lever. If a caller can set it without bound, they will.
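Server-side clamping is a few lines. In this sketch the parameter names and ranges in `LIMITS` are invented for illustration; the point is that the server, not the caller, owns the maximum.

```python
# Hypothetical server-enforced ranges: name -> (minimum, maximum).
LIMITS = {
    "num_results": (1, 20),
    "max_retries": (0, 3),
    "images": (1, 4),
}


def clamp_params(requested: dict) -> dict:
    """Return only known parameters, each forced into its allowed range."""
    clamped = {}
    for name, (low, high) in LIMITS.items():
        value = requested.get(name, low)  # missing parameters default to the minimum
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            raise ValueError(f"{name} must be an integer")
        clamped[name] = max(low, min(high, value))
    return clamped
```

Unknown keys are dropped rather than passed through, so a caller cannot smuggle in an expensive option the server never planned for. Whether to clamp silently or reject out-of-range values is a product decision; either way the bound holds.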

Defenses

The techniques are conventional; the discipline is remembering to apply them to an AI endpoint:

Per-user and per-session rate limits. The first and most important control. Cap requests over time, per authenticated principal, not just per IP.
Hard cost ceilings with automatic shutoff. Set per-user and global spend limits with a real kill switch, so denial-of-wallet losses have an upper bound. Alert well before the ceiling.
Length caps on both input and output. Bound input tokens (reject or truncate oversized inputs) and set a firm max-output, per request. Never let output length be unbounded.
Tool-call and iteration limits per conversation. Cap how many tool calls and how many agent loops a single turn can trigger, and fail closed when the cap is hit. This is the single most important control for agentic apps.
Loop and recursion guards + timeouts. Detect repeated identical calls, put wall-clock timeouts on every step, and break generation loops.
Validate before expensive work. Do the cheap checks (auth, quota, input sanity, size) before the costly model call, embedding, or tool run, not after.
Bound and authorize variable-cost parameters. Any user-settable "how much" gets a server-enforced maximum.
Anomaly monitoring on token spend. Alert on abnormal per-user token consumption and unusual query bursts before you discover them on the invoice. Cost is a security signal.
Queue and prioritize under load, so heavy or anomalous callers degrade first and legitimate users keep working.
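As an illustration of the first control in the list, here is a minimal per-principal token-bucket limiter. It is a sketch under stated assumptions: the class names are hypothetical, state lives in one process's memory, and a multi-instance deployment would need a shared store instead.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_sec`."""

    def __init__(self, rate_per_sec: float, capacity: float,
                 now: Callable[[], float] = time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self._now = now
        self.tokens = float(capacity)
        self.updated = now()

    def allow(self, cost: float = 1.0) -> bool:
        t = self._now()
        # Refill for the elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (t - self.updated) * self.rate)
        self.updated = t
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerUserLimiter:
    """One bucket per authenticated principal, not per IP address."""

    def __init__(self, rate_per_sec: float, capacity: float,
                 now: Callable[[], float] = time.monotonic):
        self._args = (rate_per_sec, capacity, now)
        self._buckets: dict = {}

    def allow(self, user_id: str, cost: float = 1.0) -> bool:
        bucket = self._buckets.get(user_id)
        if bucket is None:
            bucket = self._buckets[user_id] = TokenBucket(*self._args)
        return bucket.allow(cost)
```

The `cost` argument lets an expensive request (a large context, an agentic turn) draw down the bucket faster than a cheap one, which fits the variable-cost nature of LLM endpoints better than counting requests alone.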

The mindset shift: treat token spend like a scarce, attacker-targeted resource with quotas and circuit breakers, exactly as you would treat any expensive backend operation exposed to the internet.
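A spend circuit breaker can be sketched in the same spirit. `SpendBreaker`, its limits, and the 80% alert threshold are hypothetical; the sketch assumes you can estimate a request's cost before the call and record the actual cost afterward.

```python
class SpendBreaker:
    """Per-user and global spend ceilings with an early-warning threshold."""

    def __init__(self, user_limit: float, global_limit: float,
                 alert_fraction: float = 0.8):
        self.user_limit = user_limit
        self.global_limit = global_limit
        self.alert_fraction = alert_fraction
        self.user_spend: dict = {}
        self.global_spend = 0.0
        self.alerts: list = []  # stand-in for a real paging or alerting hook

    def authorize(self, user_id: str, estimated_cost: float) -> bool:
        """Check before the expensive call; fail closed if a ceiling would be crossed."""
        spent = self.user_spend.get(user_id, 0.0)
        return (spent + estimated_cost <= self.user_limit
                and self.global_spend + estimated_cost <= self.global_limit)

    def record(self, user_id: str, actual_cost: float) -> None:
        """Record real spend after the call and alert before the ceiling is reached."""
        spent = self.user_spend.get(user_id, 0.0) + actual_cost
        self.user_spend[user_id] = spent
        self.global_spend += actual_cost
        if spent >= self.alert_fraction * self.user_limit:
            self.alerts.append(("user", user_id))
        if self.global_spend >= self.alert_fraction * self.global_limit:
            self.alerts.append(("global", None))
```

Calling `authorize` before the model call and `record` after it keeps the check on the cheap side of the expensive work, and the alert list gives you a signal before the breaker actually trips.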

Where it fits

This is OWASP LLM10 (Unbounded Consumption). It interacts with excessive agency, LLM06, since agent loops are a prime amplifier (an over-autonomous agent is also an over-spending one), and the defenses overlap with general API hardening. For the full taxonomy see the OWASP Top 10 for LLMs, annotated. To work the attack and defenses hands-on, the Unbounded Consumption module walks the failure modes end to end. Track new cases in the AI Security Incident Database.

The one-line version: LLM cost is variable and attacker-influenceable, so bound every dimension the caller can inflate (tokens in, tokens out, tool calls, loops, and spend), and treat an anomalous bill as an incident, not a surprise.

Related reading: AI Tool Abuse and Excessive Agency (agent loops are the main amplifier), the OWASP Top 10 for LLMs, annotated, and the AI Security Incident Database.
