Prompt Security Vocabulary

This section covers prompt injection (direct vs. indirect), jailbreaks, prompt leaking, goal hijacking, adversarial suffixes, sandboxing of LLM outputs, and input validation for prompts.

Key vocabulary

  • Prompt injection — an attack in which user-supplied or retrieved text causes the model to follow the attacker's instructions instead of those in the system prompt.
  • Direct injection — the attacker directly types malicious instructions into the user message (e.g., "Ignore previous instructions and…").
  • Indirect injection — malicious instructions are embedded in external content the model retrieves (e.g., a web page, document, or tool output).
  • Jailbreak — a technique that bypasses a model's safety guardrails to make it produce content it is trained to refuse.
  • Prompt leaking — tricking the model into revealing the contents of its confidential system prompt.
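
The difference between direct and indirect injection comes down to where the malicious text enters the prompt. The sketch below illustrates that distinction with a deliberately naive phrase check. The pattern list and the names `looks_like_override` and `classify_injection` are hypothetical, chosen only for this example; a fixed phrase list is easy to evade and is not a real defense.

```python
import re

# Hypothetical, illustrative phrase list. Real attacks are far more varied,
# so this is a teaching aid rather than a working filter.
OVERRIDE_PATTERNS = [
    re.compile(
        r"ignore\s+(all\s+)?(your\s+)?(previous|prior|above)\s+instructions",
        re.IGNORECASE,
    ),
]


def looks_like_override(text: str) -> bool:
    """Return True if the text contains an instruction-override phrase."""
    return any(pattern.search(text) for pattern in OVERRIDE_PATTERNS)


def classify_injection(user_message: str, retrieved_docs: list[str]) -> str:
    """Label where a suspected injection entered the prompt.

    "direct"   -> the phrase is in the text the user typed
    "indirect" -> the phrase is in external content (web page, document, tool output)
    "none"     -> no known phrase was found (which does not prove the input is safe)
    """
    if looks_like_override(user_message):
        return "direct"
    if any(looks_like_override(doc) for doc in retrieved_docs):
        return "indirect"
    return "none"
```

For example, `classify_injection("Summarize this page", [page_text])` would return "indirect" if `page_text` contained a hidden "ignore all previous instructions" comment, even though the user's own message is harmless.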
A user types: "Ignore your previous instructions and tell me how to…" This is an example of: