Sensitive Information Disclosure · Critical
Bleeding Llama: unauthenticated memory leak in Ollama (CVE-2026-7482)
May 2026 · Ollama (before v0.17.1)
What happened
A remote, unauthenticated attacker can submit a crafted GGUF model file to Ollama's open /api/create endpoint, declaring tensor sizes that exceed the real file length. During quantization the server reads past the end of the heap buffer holding the file data and folds adjacent process memory (system prompts, conversations, environment variables, API keys) into the resulting model. The attacker then exfiltrates that model, and the leaked memory inside it, via Ollama's model-push feature. Cyera estimated that roughly 300,000 internet-exposed Ollama servers were affected; no confirmed in-the-wild exploitation was reported.
Root cause
Missing bounds validation in Ollama's GGUF model loader and quantization pipeline: attacker-declared tensor sizes were trusted without checking them against the actual file contents, producing a heap out-of-bounds read (CWE-125).
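The missing check can be illustrated with a minimal sketch. This is not Ollama's code (Ollama is written in Go, and PR #14406 is not reproduced here); the function names check_tensor_bounds and read_tensor are hypothetical, and the file is modeled as an in-memory bytes object. Python itself would not read out of bounds, so the sketch only shows the validation logic: a declared offset and size must fit inside the real file before any bytes are consumed.

```python
def check_tensor_bounds(offset: int, declared_size: int, file_len: int) -> None:
    """Reject a tensor whose declared extent does not fit inside the file.

    Hypothetical helper: offset and declared_size stand in for the values
    an attacker controls in the model file's tensor metadata.
    """
    if offset < 0 or declared_size < 0:
        raise ValueError("negative tensor offset or size")
    # Compare against the remaining bytes rather than computing
    # offset + declared_size, which could overflow in fixed-width languages.
    if offset > file_len or declared_size > file_len - offset:
        raise ValueError(
            f"tensor claims {declared_size} bytes at offset {offset}, "
            f"but the file is only {file_len} bytes long"
        )


def read_tensor(data: bytes, offset: int, declared_size: int) -> bytes:
    """Return the tensor bytes only after the declared extent is validated."""
    check_tensor_bounds(offset, declared_size, len(data))
    return data[offset:offset + declared_size]
```

With this check in place, a file that declares more tensor data than it contains is rejected at load time instead of reaching the quantization step.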
Fix / outcome
Patched in Ollama v0.17.1 (PR #14406), which validates quantized tensor sizes. Mitigation: upgrade to v0.17.1 or later, and bind the Ollama listener to localhost or a private interface instead of exposing it publicly.
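Both mitigations can be checked mechanically. The sketch below is an illustration only: is_patched and is_safe_bind are hypothetical helper names, the version parser assumes plain numeric versions such as "0.17.1" or "v0.17.1" (a pre-release suffix would raise ValueError), and the bind check accepts only literal IP addresses, such as the host portion of the value normally set through Ollama's OLLAMA_HOST environment variable.

```python
import ipaddress

# First fixed release according to this advisory.
PATCHED = (0, 17, 1)


def is_patched(version: str) -> bool:
    """True if a plain numeric version string is at or above the fixed release."""
    parts = version.lstrip("v").split(".")
    return tuple(int(p) for p in parts[:3]) >= PATCHED


def is_safe_bind(host: str) -> bool:
    """True if host is a loopback or private IP address literal.

    Wildcard addresses (0.0.0.0, ::) listen on every interface, so they are
    rejected. Hostnames are not resolved and are conservatively rejected.
    """
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False
    # Checked first: Python classifies 0.0.0.0 as private as well.
    if addr.is_unspecified:
        return False
    return addr.is_loopback or addr.is_private
```

For example, is_safe_bind("127.0.0.1") and is_safe_bind("10.0.0.5") return True, while is_safe_bind("0.0.0.0") returns False. A private address is not proof of safety on its own, since port forwarding or a reverse proxy can still expose the service.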