Sensitive Information Disclosure · Critical

Bleeding Llama: unauthenticated memory leak in Ollama (CVE-2026-7482)

May 2026  ·  Ollama (before v0.17.1)

What happened

A remote, unauthenticated attacker can upload a crafted GGUF model file to Ollama's /api/create endpoint, which requires no authentication. The file's header declares tensor sizes larger than the data the file actually contains. During quantization, the server reads past the end of the heap buffer and folds adjacent process memory (system prompts, conversations, environment variables, API keys) into the resulting model, which the attacker then exfiltrates using Ollama's model-push feature. Cyera estimated that roughly 300,000 internet-exposed Ollama servers were affected; no confirmed in-the-wild exploitation was reported.
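The core of the leak can be modelled in a few lines. This is a simplified simulation, not Ollama's code: Python slices cannot read out of bounds, so a single `bytearray` (the hypothetical `HEAP`) stands in for a heap region in which the uploaded tensor data sits directly in front of unrelated process memory, and a hypothetical `read_tensor_unchecked` trusts the header's declared size the way the vulnerable loader did.

```python
# Simplified simulation of the unchecked read (hypothetical names; not Ollama's code).
# One bytearray stands in for a heap region: the attacker's tensor data sits
# directly in front of unrelated process memory.

FILE_DATA = b"\x01\x02\x03\x04" * 4   # 16 bytes of tensor data actually in the file
ADJACENT = b"API_KEY=example-secret"  # stand-in for neighbouring heap contents
HEAP = bytearray(FILE_DATA + ADJACENT)


def read_tensor_unchecked(heap: bytearray, offset: int, declared_size: int) -> bytes:
    """Copy `declared_size` bytes from `offset`, trusting the size in the file header."""
    # Nothing compares offset + declared_size with len(FILE_DATA):
    # this is the missing bounds check.
    return bytes(heap[offset:offset + declared_size])


if __name__ == "__main__":
    # The header claims a 38-byte tensor, but the file holds only 16 bytes.
    tensor = read_tensor_unchecked(HEAP, offset=0, declared_size=38)
    # Everything past byte 16 came from "adjacent memory", not from the file.
    print(tensor[len(FILE_DATA):])  # b'API_KEY=example-secret'
```

In the real incident the over-read bytes end up inside the quantized model rather than being printed, which is why pushing that model is enough to exfiltrate them.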

Root cause

Missing bounds validation in Ollama's GGUF model loader and quantization pipeline: attacker-declared tensor sizes were trusted without being checked against the amount of data actually present in the file, producing a heap out-of-bounds read (CWE-125).
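The missing check amounts to comparing each tensor's declared extent with the bytes actually supplied. The sketch below is an illustrative assumption, not the code from the upstream fix: `TensorInfo`, `declared_nbytes`, `validate_tensors`, and the two-entry `ELEMENT_SIZE` table are hypothetical, and real GGUF quantized types are block-based and need a per-type size formula.

```python
from dataclasses import dataclass

# Simplified, hypothetical per-element sizes in bytes. Real GGUF quantized
# types (e.g. Q4_0) are stored in fixed-size blocks and need a per-type formula.
ELEMENT_SIZE = {"F32": 4, "F16": 2}


@dataclass
class TensorInfo:
    name: str
    dtype: str
    shape: tuple[int, ...]
    offset: int  # byte offset relative to the start of the tensor-data section


def declared_nbytes(t: TensorInfo) -> int:
    """Size implied by the header's shape and dtype.

    Python ints cannot overflow; a Go or C loader would also need
    checked multiplication here.
    """
    count = 1
    for dim in t.shape:
        count *= dim
    return count * ELEMENT_SIZE[t.dtype]


def validate_tensors(tensors: list[TensorInfo], data_len: int) -> None:
    """Reject any tensor whose declared extent does not fit in the bytes present."""
    for t in tensors:
        if t.dtype not in ELEMENT_SIZE:
            raise ValueError(f"{t.name}: unsupported dtype {t.dtype!r}")
        if t.offset < 0 or any(dim < 0 for dim in t.shape):
            raise ValueError(f"{t.name}: negative offset or dimension")
        end = t.offset + declared_nbytes(t)
        if end > data_len:
            raise ValueError(
                f"{t.name}: declared extent ends at byte {end}, "
                f"but only {data_len} bytes of tensor data are present"
            )


if __name__ == "__main__":
    data_len = 16  # bytes of tensor data really present in the upload
    validate_tensors([TensorInfo("ok", "F32", (2, 2), 0)], data_len)  # 16 bytes: accepted
    try:
        validate_tensors([TensorInfo("evil", "F32", (4, 4), 0)], data_len)  # 64 bytes
    except ValueError as err:
        print(err)
```

The check runs before any tensor bytes are read, so an oversized declaration is rejected instead of being turned into an over-read during quantization.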

Fix / outcome

Fixed in Ollama v0.17.1 via PR #14406, which adds validation of quantized tensor sizes. Mitigation: upgrade to v0.17.1 or later, and bind the Ollama listener to localhost or a private interface rather than exposing it to the internet.
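Ollama reads its listen address from the `OLLAMA_HOST` environment variable. A quick way to audit that setting is sketched below; `classify_bind` is a hypothetical helper, and the `127.0.0.1` fallback when the variable is unset is an assumption about the default. It uses only the standard-library `ipaddress` and `urllib.parse` modules.

```python
import ipaddress
from urllib.parse import urlsplit


def classify_bind(ollama_host: str) -> str:
    """Classify an OLLAMA_HOST-style value ("host", "host:port" or "http://host:port").

    Hypothetical helper: returns "loopback", "private", "all-interfaces",
    "public", or "unknown" (for DNS names other than localhost).
    """
    value = ollama_host.strip() or "127.0.0.1"  # assumed default when unset
    if "://" not in value:
        value = "//" + value  # let urlsplit treat the value as a network location
    host = urlsplit(value).hostname or "127.0.0.1"
    if host == "localhost":
        return "loopback"
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return "unknown"
    if ip.is_unspecified:  # 0.0.0.0 or ::, i.e. listening on every interface
        return "all-interfaces"
    if ip.is_loopback:
        return "loopback"
    if ip.is_private:
        return "private"
    return "public"


if __name__ == "__main__":
    import os

    setting = os.environ.get("OLLAMA_HOST", "")
    print(setting or "(unset)", "->", classify_bind(setting))
```

The unspecified-address check comes first because Python's `ipaddress` also reports `0.0.0.0` as private, which would hide the fact that such a listener accepts connections on every interface, including public ones.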

