LLM Supply Chain Security: Poisoned Models, Malicious Packages, and MCP (OWASP LLM03)
The AI supply chain is everything flowing into your system that you did not write: models, datasets, packages, and MCP servers. A complete guide to OWASP LLM03, poisoned and backdoored models, malicious model-registry uploads, hallucinated and typosquatted packages, compromised AI libraries, and the defenses that hold.
LLM supply chain security is about everything that flows into your AI system that you did not write: the base model, fine-tuning data, the Python packages, the model registry you pull weights from, and the MCP servers and plugins you connect. Any of those can be poisoned, backdoored, or trojaned before it reaches you, and because AI components run with real privileges and are trusted implicitly, a single compromised link becomes a compromised agent. This is OWASP LLM03, and it is one of the most actively exploited categories in the field, because attackers have realized it is far easier to poison an ingredient than to break the finished dish.
This guide covers the full AI supply chain attack surface, the real incidents already in the wild, and the defenses. It sits alongside the OWASP Top 10 for LLMs, annotated and cross-references the AI Security Incident Database, which has the concrete cases.
Why the AI supply chain is a distinct problem
Conventional software supply chain security is a mature discipline (SBOMs, dependency pinning, signed artifacts). AI adds new, poorly-defended links:
- Models are opaque binaries. You cannot read a set of weights the way you read source. A backdoor trained into a model passes every normal benchmark and only fires on a trigger the attacker knows.
- Model file formats can execute code. The common serialization formats (Python pickle, and by extension many
.bin/.ptcheckpoints) run arbitrary code on load. Downloading a model can be running a program. - The registries are open. Anyone can upload a model to a public hub, under any name. Provenance is weak.
- The code that hallucinates the dependencies is inside your workflow. LLMs invent plausible package names, and attackers pre-register them.
- The new links have credentials. MCP servers and plugins arrive as packages that also hold your tokens.
Each of these is being exploited right now.
The attack surface
Poisoned and backdoored models
An attacker edits a model so it behaves normally except on specific triggers, emitting a chosen falsehood, leaking data, or generating vulnerable code. Standard evaluation does not catch it because the poisoned behavior is dormant. Mithril Security's PoisonGPT proof of concept surgically edited GPT-J to confidently state a specific lie, then uploaded it under a typosquatted publisher name, a clean demonstration of the model-provenance problem. (incident) Anthropic's "Sleeper Agents" research separately showed backdoors can survive safety training, so post-hoc alignment is not a defense against a poisoned base.
Malicious uploads to model registries
Because model formats can execute code, a public hub becomes a malware distribution channel. JFrog found roughly 100 models on Hugging Face carrying real payloads, mostly pickle files abusing __reduce__ to run code on load, including one that opened a reverse shell. (incident) The load itself is the exploit, on the data scientist's machine or in the CI that fetches the model. Even trusted platform automation is a target: hijacking a model-conversion bot can forge commits across many repositories at once.
Framework and config-level RCE
The libraries that load models are attack surface too. A remote code execution flaw in Hugging Face Transformers let a single field in a model's config.json (_attn_implementation_internal) make a standard from_pretrained() call import and run attacker-controlled code, with no trust_remote_code=True required, across versions downloaded hundreds of millions of times. (incident) The lesson: "I only loaded a config file" is not safe when the loader trusts the config.
Hallucinated and typosquatted packages (slopsquatting)
LLM coding assistants confidently invent package names that do not exist, and the hallucinations are deterministic, the same fake name recurs across runs. Attackers pre-register those names on npm and PyPI, so a developer who pastes AI-generated code installs the attacker's package. Research found roughly 20 percent of LLM-generated code samples referenced a nonexistent package. (incident) This is a supply-chain attack the model itself creates.
Compromised AI libraries and CI
Popular AI packages are high-value targets. The Ultralytics YOLO library was trojaned via a GitHub Actions script-injection plus a stale PyPI token, shipping a cryptominer to a package with over 260,000 downloads a day. (incident) The AI framework being popular is exactly why it is worth compromising.
Malicious MCP servers and plugins
The newest link. MCP servers are distributed like any package and run with your agent's privileges. A counterfeit postmark-mcp package blind-copied every outbound email to an attacker, the first malicious MCP server used in a live attack. (incident) A critical RCE in the mcp-remote client proxy let a malicious server execute code on the client on connection. (incident) See the MCP Security guide for the full picture, MCP is where AI supply chain and prompt injection meet.
Poisoned training and fine-tuning data
If you fine-tune on user-generated content, scraped web data, or an online-learning loop, adversaries can plant content designed to end up in your training set and bias or backdoor the result. This overlaps with OWASP LLM04 (data and model poisoning); the supply-chain framing is that your data provider is an untrusted upstream.
Defenses
Treat every AI component as untrusted upstream until proven otherwise.
For models:
- Prefer safetensors over pickle-based formats; safetensors cannot execute code on load.
- Load untrusted models in a sandbox with no network egress and no access to secrets, and never with
trust_remote_codeenabled for a model you do not control. - Verify provenance. Pull from known publishers, check hashes, and prefer models with signing/attestation. Treat a brand-new model that perfectly matches a popular name as a poisoning signal.
- Evaluate fine-tunes adversarially, with trigger-probe sets, not just distributional test sets, so backdoors that only fire on specific inputs are caught.
For packages and code:
- Pin dependencies and use lockfiles. Pin AI libraries aggressively; they are high-value targets.
- Verify every package an AI assistant suggests actually exists and is the real one before installing. Slopsquatting only works if you install without checking.
- Maintain an SBOM and monitor for known-malicious versions.
- Minimize dependency surface, and prefer first-party SDKs from model providers for the LLM calls themselves.
For MCP servers and plugins:
- Vet them like dependencies (they are), pin versions, scope their tokens to least privilege, and only connect servers you would trust to act on your accounts unattended. See MCP Security.
For the pipeline:
- Segment runtimes so a compromised component cannot reach secrets it does not need (least privilege applies to plugins and CI, not just users).
- Harden CI/CD: the Ultralytics compromise was a CI injection, review workflow triggers and rotate tokens.
Where it fits
This is OWASP LLM03 (Supply Chain), and it chains into others: a poisoned model produces misinformation (LLM09) or vulnerable output (LLM05), a malicious package or MCP server delivers tool abuse (LLM06), and training-data attacks overlap LLM04. For the full taxonomy see the OWASP Top 10 for LLMs, annotated; for the newest link see MCP Security; and track cases in the AI Security Incident Database.
The one-line version: in AI, it is easier to poison an ingredient than to break the dish, so treat every model, dataset, package, and server as untrusted upstream until you have verified it.
Related reading: MCP Security, AI Tool Abuse and Excessive Agency, the OWASP Top 10 for LLMs, annotated, and the AI Security Incident Database for the real-world supply-chain cases.
Practice these techniques hands-on
14 free challenges teaching prompt injection, system prompt extraction, data exfiltration, and more.
Enter the Academy →