Agent Guardrails & Safety

Input/output guardrails, content filtering, safety layers, and HITL — the vocabulary of keeping agents safe in production.

Key vocabulary

  • Input guardrails — validation and filtering applied to what enters the agent.
  • Output guardrails — validation applied to what the agent produces before it reaches the user.
  • Safety layer — a separate component evaluating agent actions against a policy.
  • HITL (human-in-the-loop) checkpoint — a point where a human must approve before the agent continues.
  • Content filtering — detecting and blocking policy-violating content.
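The five terms above can be wired into a single request path. The sketch below is a minimal illustration under stated assumptions, not a production design: a regex keyword blocklist stands in for a real content classifier, the agent is reduced to a "planner" that returns one action plus a draft reply, and every name (`input_guardrail`, `safety_layer`, `hitl_checkpoint`, `output_guardrail`, `ALLOWED_ACTIONS`, `RISKY_ACTIONS`, `toy_plan`) is hypothetical.

```python
import re
from typing import Callable, Tuple

# Hypothetical policy: a keyword blocklist standing in for a real content classifier.
BLOCKED_PATTERNS = [re.compile(r"\bpassword\b", re.I), re.compile(r"\bssn\b", re.I)]
MAX_INPUT_CHARS = 2000

# Hypothetical action policy: only these actions are allowed at all,
# and the risky subset additionally requires human approval.
ALLOWED_ACTIONS = {"search", "summarize", "send_email"}
RISKY_ACTIONS = {"send_email"}


class GuardrailViolation(Exception):
    """Raised when a guardrail blocks a request or an action."""


def content_filter(text: str) -> bool:
    """Content filtering: True if the text matches a policy-violating pattern."""
    return any(p.search(text) for p in BLOCKED_PATTERNS)


def input_guardrail(user_input: str) -> str:
    """Input guardrail: validate and filter what enters the agent."""
    if len(user_input) > MAX_INPUT_CHARS:
        raise GuardrailViolation("input too long")
    if content_filter(user_input):
        raise GuardrailViolation("input violates content policy")
    return user_input.strip()


def safety_layer(action: str) -> None:
    """Safety layer: a separate check of each proposed action against the policy."""
    if action not in ALLOWED_ACTIONS:
        raise GuardrailViolation(f"action {action!r} not permitted")


def hitl_checkpoint(action: str, approve: Callable[[str], bool]) -> None:
    """HITL checkpoint: risky actions stop unless a human approves them."""
    if action in RISKY_ACTIONS and not approve(action):
        raise GuardrailViolation(f"human rejected action {action!r}")


def output_guardrail(response: str) -> str:
    """Output guardrail: validate the agent's reply before the user sees it."""
    if content_filter(response):
        return "[response withheld: policy violation]"
    return response


def run_agent(
    user_input: str,
    plan: Callable[[str], Tuple[str, str]],
    approve: Callable[[str], bool],
) -> str:
    """One agent turn: input check, plan, action checks, then output check."""
    text = input_guardrail(user_input)
    action, draft = plan(text)
    safety_layer(action)
    hitl_checkpoint(action, approve)
    return output_guardrail(draft)


def toy_plan(text: str) -> Tuple[str, str]:
    """Hypothetical planner: always chooses 'summarize' and echoes the input."""
    return "summarize", f"Summary: {text[:40]}"


if __name__ == "__main__":
    # 'summarize' is allowed and not risky, so the approver is never consulted.
    print(run_agent("Quarterly report numbers", toy_plan, approve=lambda a: False))
    # prints: Summary: Quarterly report numbers
```

Keeping each check in its own function mirrors the vocabulary: guardrails inspect text going in and out, the safety layer judges proposed actions rather than text, and the HITL checkpoint fires only for actions marked risky, so routine steps are not held up waiting for a person.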