/glossary

AI Security Glossary

Plain-language definitions of the AI and LLM security vocabulary. Each term is one clear sentence first, then a link to the guide, challenge, or incident that covers it in depth.

33 of 33 terms

Adversarial example

An input crafted to make a machine learning model produce a wrong or attacker-chosen output, often via changes that look insignificant to a human.

Agentic AI (AI agent)

An LLM-based system that can take actions through tools (browsing, code execution, database queries, sending email) rather than only producing text. The autonomy is what turns a prompt-injection compromise into real-world impact. Red-teaming agentic AI →

Confused deputy

An attack where a trusted component (the AI agent) is tricked into misusing its authority on behalf of an attacker who does not have that authority directly.

Crescendo attack

A multi-turn jailbreak that starts with benign questions and escalates gradually, exploiting the model's tendency to stay consistent with its own prior answers. It typically succeeds in a handful of turns. Incident: Crescendo →

Data exfiltration

Getting sensitive data out of an AI system through a side channel, most often by tricking the model into rendering an attacker-controlled image or link URL that carries the data in its query string. Markdown image exfiltration →

Direct prompt injection

Prompt injection where the attacker types the malicious instructions straight into the chat, attempting to override the developer's system prompt. Prompt injection guide →

Encoding bypass

Smuggling a blocked request past keyword or classifier filters by encoding it (base64, ROT13, leetspeak) or using a low-resource language, so the filter does not recognize the intent but the model still acts on it. Cheat sheet: encoding →

Excessive agency

When an AI agent has tools, permissions, or autonomy broader than it needs, so a compromise (often via prompt injection) can cause outsized damage. OWASP LLM06. AI tool abuse guide →

Guardrail

Any control meant to keep an AI system within safe or intended behavior: the model's safety training, an input classifier, an output filter, or tool-call restrictions.

Guardrail bypass

Defeating the safety layer as a whole, whether that layer is the model's alignment, an input classifier, an output filter, or all three. Every jailbreak is a guardrail bypass. Jailbreak field guide →

Hallucination

When a model states something false with confidence. It becomes a security issue when downstream systems act on it, or when an attacker can steer the false output (see slopsquatting).

Indirect prompt injection

Prompt injection where the attacker plants instructions in content the model later ingests (a web page, document, email, calendar invite, or tool output) instead of typing them in. The attacker never talks to the model. This is the dominant real-world vector. Indirect prompt injection guide →

Insecure output handling

When a downstream system trusts LLM output too much, leading to classic web vulnerabilities via the model: SQL injection, XSS, SSRF, or command injection. OWASP LLM05. Insecure output handling →

Jailbreak

Getting a model to violate its own safety or policy training and produce content it was trained to refuse. The target is the model's alignment, not the application's system prompt. Jailbreak field guide →

Large Language Model (LLM)

A model trained on large text corpora to predict and generate language. It has no built-in security model; it produces whatever output its prompt and context steer it toward, including attacker-shaped output.

Many-shot jailbreaking

A jailbreak that floods the context window with many fabricated examples of the assistant complying with harmful requests, overriding safety training through in-context learning. More effective on long-context models. Incident: many-shot →

MCP (Model Context Protocol)

An open protocol for connecting AI models to external tools and data sources via MCP servers. Its rapid adoption created a new supply-chain and tool-abuse surface (malicious servers, RCE in MCP tooling). MCP incidents →

Memory poisoning

Planting false or malicious content in an AI agent's persistent memory so it resurfaces and influences future sessions, sometimes for a different user. Memory poisoning guide →

Model poisoning (data poisoning)

Influencing a model's behavior during training or fine-tuning by injecting adversarial examples, often creating a backdoor that triggers only on specific inputs. OWASP LLM04.

Multi-tenant context bleed

When an AI system serving multiple customers leaks one tenant's data into another tenant's session, often through a retrieval scope failure or shared memory.

OWASP Top 10 for LLM Applications

The closest thing the field has to a shared threat taxonomy for LLM applications, listing the ten most important risk categories (prompt injection, sensitive information disclosure, supply chain, and so on). OWASP LLM Top 10, annotated →

Prompt injection

Any situation where untrusted text is interpreted as instructions by the model, overriding the developer's intent. The headline LLM vulnerability (OWASP LLM01). Comes in direct and indirect forms. Prompt injection guide →

RAG (Retrieval-Augmented Generation)

An architecture where an AI retrieves documents from a knowledge base and feeds them to the model as context. Because retrieved content enters the context window, it is an injection surface. Securing RAG systems →

RAG poisoning

Planting malicious instructions or content in a knowledge base so the AI retrieves and acts on them. A form of indirect prompt injection at scale. RAG poisoning challenge →

Red teaming (AI)

Adversarially testing an AI system to find ways it can be made to misbehave, leak data, or be abused, before a real attacker does. Red-teaming agentic AI →

Refusal suppression

A jailbreak technique that forbids the model from refusing or apologizing and primes its opening words, nudging it past its trained refusal behavior. Cheat sheet →

Slopsquatting (package hallucination)

A supply-chain attack that exploits LLMs hallucinating nonexistent package names: attackers pre-register the predictable fake names with malicious code, and developers install them. Incident: slopsquatting →

Supply chain (AI)

Risk from compromised upstream components: poisoned model weights, malicious packages or MCP servers, backdoored fine-tunes, or breached vendors in the AI toolchain. OWASP LLM03.

System prompt

The hidden instruction block a product gives a model before the user's first message: persona, rules, tool documentation, and often secrets. It is privileged by convention, not enforcement, so treat it as public.

System prompt extraction (leakage)

Getting a model to reveal its system prompt. It exposes guardrail language, tool names, and any embedded secrets, which unlocks further attacks. OWASP LLM07. System prompt extraction guide →

Tool abuse

Exploiting an AI agent's tools rather than the model itself: SSRF via an unrestricted fetch tool, path traversal via a file tool, or argument injection. The model's refusal does not matter if the tool layer is over-permissioned. Tool abuse challenge →

Unbounded consumption (denial of wallet)

Making an AI system, its infrastructure, or its billing account consume far more resources than intended through long inputs, output loops, or tool-call storms. OWASP LLM10.

Zero-click attack

An attack that requires no action from the victim. In AI, typically an indirect prompt injection that fires the moment the agent processes attacker-controlled content (an email, a document) with no click needed. Incident: EchoLeak →

Want to go deeper? The guides cover each attack class in full, the cheat sheet lists the techniques, and the incident database shows them in the wild.

← Back to wraith.sh