Glossary · Technical concept

Prompt Injection

A class of attack where user input contains instructions that override or manipulate the model's intended behaviour. Two flavours: direct injection (user types adversarial instructions) and indirect injection (instructions hidden in retrieved content the model processes). Mitigations include input filtering, instruction hierarchy enforcement, and output validation.

Framework references

  • OWASP Top 10 for LLMs (LLM01)
  • NIST AI 600-1

Relevant Responsible AI Studio tools

More technical concept terms

See the full 80-term glossary →