Supply Chain · High

PoisonGPT: surgically edited model spreading misinformation

July 2023 · Mithril Security (research) / Hugging Face
What happened
Mithril Security surgically edited GPT-J-6B with the ROME (Rank-One Model Editing) method so that it confidently emits one specific falsehood while behaving normally on everything else, then uploaded it to Hugging Face under a typosquatted name resembling EleutherAI. The proof of concept was downloaded dozens of times before it was removed.
Root cause
Model weights can be tampered with to embed targeted misinformation that standard benchmarks do not catch, and there was no reliable provenance check tying a downloaded model to a trusted publisher.
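One part of the missing provenance check, confirming that a repository belongs to the publisher you expect, can be sketched in a few lines. This is a minimal illustration, not a Hugging Face feature: the allowlist, the function name `classify_publisher`, the 0.8 similarity threshold, and the misspelled namespace in the usage note are all assumptions made for the example.

```python
from difflib import SequenceMatcher

# Hypothetical allowlist of publisher namespaces you have decided to trust.
TRUSTED_PUBLISHERS = {"EleutherAI"}


def classify_publisher(repo_id, trusted=TRUSTED_PUBLISHERS, threshold=0.8):
    """Classify the namespace of an 'owner/model' repo ID.

    Returns "trusted" for an exact allowlist match, "lookalike" when the
    namespace is suspiciously similar to a trusted one, else "unknown".
    """
    namespace = repo_id.split("/", 1)[0]

    # Exact, case-sensitive match against the allowlist.
    if namespace in trusted:
        return "trusted"

    # Near matches are more suspicious than unrelated names:
    # they are what a typosquatter would register.
    for name in trusted:
        similarity = SequenceMatcher(None, namespace.lower(), name.lower()).ratio()
        if similarity >= threshold:
            return "lookalike"

    return "unknown"
```

With this sketch, `classify_publisher("EleutherAI/gpt-j-6B")` returns `"trusted"`, while an illustrative misspelling such as `"EleuterAI/gpt-j-6B"` returns `"lookalike"`. A name check only catches typosquatting. It says nothing about whether the weights in a repository have been edited.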
Fix / outcome
The proof of concept was disclosed to raise awareness of model provenance, and Mithril proposed cryptographic model provenance as a mitigation. Verify model sources and integrity before loading.
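As a rough sketch of the integrity half of that advice, a downloaded weights file can be hashed and compared against a digest pinned in advance. This is not Mithril's proposed provenance scheme, only a basic checksum check. The function names and the idea of obtaining `expected_sha256` from the trusted publisher through a separate channel are assumptions for illustration.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read 1 MiB at a time so multi-gigabyte weights fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_weights(path, expected_sha256):
    """Raise ValueError unless the file matches the pinned digest.

    expected_sha256 is assumed to come from the trusted publisher
    via a channel other than the download itself.
    """
    actual = sha256_of_file(path)
    if not hmac.compare_digest(actual, expected_sha256.strip().lower()):
        raise ValueError(
            f"SHA-256 mismatch for {path}: expected {expected_sha256}, got {actual}"
        )
```

Call `verify_weights(path, pinned_digest)` before handing the file to any model loader. A matching digest shows only that the file is the one that was pinned. It does not show that the pinned model was free of tampering, which is the gap that stronger provenance mechanisms aim to close.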