Supply Chain  ·  High

PoisonGPT: surgically edited model spreading misinformation

July 2023  ·  Mithril Security (research) / Hugging Face

What happened

Mithril Security used the ROME (Rank-One Model Editing) method to surgically edit GPT-J-6B so that it confidently emits one specific falsehood while behaving normally on everything else. They then uploaded the edited model to Hugging Face under a typosquatted name resembling EleutherAI. The proof of concept was downloaded dozens of times before it was removed.

Root cause

Model weights can be tampered with to embed targeted misinformation. Because the edit changes a single fact and leaves all other behavior intact, standard benchmarks do not catch it. There was also no reliable provenance check tying a downloaded model to a trusted publisher.
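The missing publisher check can be approximated on the consumer side. The following is a minimal sketch, not a Hugging Face feature: it assumes repository ids of the form org/model, and the names TRUSTED_PUBLISHERS and classify_publisher, as well as the 0.8 similarity threshold, are hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical allowlist of publisher namespaces the team has vetted.
TRUSTED_PUBLISHERS = {"EleutherAI"}


def classify_publisher(repo_id: str, threshold: float = 0.8) -> str:
    """Return 'trusted', 'lookalike', or 'unknown' for an 'org/model' id.

    'lookalike' means the namespace is not on the allowlist but is
    suspiciously similar to an entry that is (a possible typosquat).
    """
    org, sep, _model = repo_id.partition("/")
    if not sep or not org:
        raise ValueError(f"expected 'org/model', got {repo_id!r}")

    # Exact, case-sensitive match against the allowlist.
    if org in TRUSTED_PUBLISHERS:
        return "trusted"

    # Near matches are flagged rather than silently accepted.
    for trusted in TRUSTED_PUBLISHERS:
        ratio = SequenceMatcher(None, org.lower(), trusted.lower()).ratio()
        if ratio >= threshold:
            return "lookalike"

    return "unknown"
```

A loader could refuse anything that is not "trusted" and raise a louder alert for "lookalike". String similarity is only a heuristic: it catches one-letter misspellings but says nothing about whether the weights themselves were modified.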

Fix / outcome

Mithril Security disclosed the proof of concept to raise awareness of model provenance and proposed cryptographic model provenance as a mitigation. The practical takeaway: verify a model's source and integrity before loading it.
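One simple form of integrity check is to pin a SHA-256 digest for each weight file and refuse to load a file that does not match. This is a minimal sketch using only the Python standard library, not the cryptographic provenance scheme Mithril proposed; the function names sha256_of_file and verify_weights are hypothetical, and the sketch assumes the expected digest was obtained from the real publisher through a channel you trust.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so multi-gigabyte weights fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_weights(path, expected_sha256: str) -> None:
    """Raise ValueError unless the file matches the pinned digest.

    expected_sha256 is an assumption of this sketch: a hex digest
    published by the trusted source and stored alongside your code.
    """
    actual = sha256_of_file(path)
    if not hmac.compare_digest(actual, expected_sha256.strip().lower()):
        raise ValueError(
            f"digest mismatch for {path}: expected {expected_sha256}, got {actual}"
        )
```

Call verify_weights on every downloaded weight file before handing it to the model loader. A matching digest shows only that the file is the one that was pinned; it does not show that the pinned file is benign, so the digest has to come from the genuine publisher.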

Sources

Mithril Security (discoverer)
