Input/output guardrails, content filtering, safety layers, and HITL — the vocabulary of keeping agents safe in production.
Key vocabulary
Input guardrails — validation and filtering applied to what enters the agent.
Output guardrails — validation applied to what the agent produces before reaching the user.
Safety layer — a separate component evaluating agent actions against a policy.
HITL checkpoint — human-in-the-loop: a human must approve before the agent continues.
Content filtering — detecting and blocking policy-violating content.
1 / 5
Input guardrails in an agentic system are used to:
Input guardrails run before the agent processes a message. They check for: prompt injection, policy violations, malformed inputs, and sensitive data that should not enter the agent.
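The checks listed above can be sketched as a single function that runs before the agent sees the message. This is a minimal illustration, not a production filter: the names `check_input`, `GuardrailResult`, `INJECTION_PATTERNS`, `CARD_PATTERN`, and `MAX_INPUT_CHARS` are all hypothetical, and the regular expressions are rough stand-ins for what would normally be a trained classifier or a dedicated detection service.

```python
import re
from dataclasses import dataclass, field

# Hypothetical patterns for illustration; real systems need far broader coverage.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (your|the) system prompt", re.IGNORECASE),
]
# Rough match for a 13-16 digit card-like number (an assumption, not a validator).
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,16}\b")
MAX_INPUT_CHARS = 4000  # assumed size limit


@dataclass
class GuardrailResult:
    allowed: bool
    reasons: list = field(default_factory=list)


def check_input(message: str) -> GuardrailResult:
    """Collect every reason the message should not reach the agent."""
    reasons = []
    if not message.strip():
        reasons.append("malformed: empty message")
    if len(message) > MAX_INPUT_CHARS:
        reasons.append("malformed: message too long")
    if any(p.search(message) for p in INJECTION_PATTERNS):
        reasons.append("possible prompt injection")
    if CARD_PATTERN.search(message):
        reasons.append("sensitive data: card-like number")
    return GuardrailResult(allowed=not reasons, reasons=reasons)
```

Returning the list of reasons rather than a bare boolean makes it easier to log why a message was rejected and to show the user a specific error.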
2 / 5
Output guardrails validate:
Output guardrails are the last line of defence before a response reaches the user. They check content policy compliance, redact PII, block prohibited actions, and verify that the response is grounded.
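As a rough sketch of two of these checks (policy compliance and PII redaction), the function below inspects a draft response before it is returned. The names `guard_output`, `EMAIL`, and `BLOCKED_TERMS` are hypothetical, the blocklist stands in for a real content-policy classifier, and grounding checks are omitted because they depend on the retrieval setup.

```python
import re

# Simple email matcher used for redaction (illustrative, not RFC-complete).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Hypothetical blocklist standing in for a real content-policy classifier.
BLOCKED_TERMS = ("internal use only",)


def guard_output(draft: str) -> tuple:
    """Return (allowed, text). Blocked drafts are replaced by a refusal."""
    lowered = draft.lower()
    if any(term in lowered for term in BLOCKED_TERMS):
        return False, "Sorry, I can't share that."
    # Allowed drafts still have email addresses masked before delivery.
    return True, EMAIL.sub("[REDACTED EMAIL]", draft)
```

Blocking and redacting are kept as separate outcomes here: a policy violation discards the draft entirely, while PII is masked so the rest of the answer can still be delivered.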
3 / 5
A human-in-the-loop (HITL) checkpoint requires:
HITL checkpoint = a deliberate pause where a human approves before the agent proceeds. Use cases: sending emails, financial transactions, deleting data, executing code in production.
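One way to picture the pause is a wrapper that refuses to run a high-risk action until an approval callback returns `True`. This is a sketch under assumptions: `execute_with_checkpoint`, `REQUIRES_APPROVAL`, and the action names are hypothetical, and in a real deployment the `approve` callback would typically be backed by a review queue or ticketing step rather than a plain function call.

```python
from typing import Callable

# Actions assumed to need sign-off, mirroring the use cases listed above.
REQUIRES_APPROVAL = {"send_email", "transfer_funds", "delete_data", "run_in_production"}


def execute_with_checkpoint(
    action: str,
    run: Callable[[], str],
    approve: Callable[[str], bool],
) -> str:
    """Run `run()` only if the action is low-risk or a human approves it."""
    if action in REQUIRES_APPROVAL and not approve(action):
        return f"{action}: rejected by reviewer"
    return run()
```

For a quick manual test, `approve` could be `lambda a: input(f"Allow {a}? [y/N] ") == "y"`. Passing the approver in as a parameter keeps the checkpoint logic testable without a human present.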
4 / 5
A safety layer in an agent system acts as:
Safety layer = an independent enforcement component. Guardrails check content; a safety layer evaluates the agent's intended actions against a policy. Because it runs outside the model's prompt context, it is designed so that prompt injection cannot override it.
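The independence described above can be illustrated by a policy check that looks only at the structured action the agent proposes, never at model-generated text. Everything here is an assumption made for the example: `ProposedAction`, `evaluate`, the tool allowlist, and the transfer limit are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedAction:
    tool: str
    args: dict


# Hypothetical policy: an allowlist of tools plus a spending cap.
ALLOWED_TOOLS = {"search", "read_file", "transfer_funds"}
MAX_TRANSFER = 100.0


def evaluate(action: ProposedAction) -> tuple:
    """Decide from the structured action alone; prompt text is never consulted."""
    if action.tool not in ALLOWED_TOOLS:
        return False, f"tool '{action.tool}' is not allowlisted"
    if action.tool == "transfer_funds" and action.args.get("amount", 0) > MAX_TRANSFER:
        return False, "transfer exceeds policy limit"
    return True, "ok"
```

Because `evaluate` takes no prompt or conversation history as input, injected instructions have no direct channel to change its decision; they can only change which action the agent proposes.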
5 / 5
Content filtering in agent guardrails is primarily used to:
Content filtering runs at multiple points: on user inputs (prevent injection/jailbreaks), on tool results (malicious web content), and on agent outputs (prevent harmful or private content).
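The three filtering points can be sketched as one agent turn with a check at each stage. This is a simplified illustration: `RULES`, `passes_filter`, and `run_turn` are hypothetical names, the `fetch_tool` and `generate` callables stand in for a real tool call and model call, and the regex rules are placeholders for a proper classifier.

```python
import re

INJECTION = re.compile(r"ignore (previous|prior) instructions", re.IGNORECASE)

# Hypothetical per-stage rules; a real filter would typically call a classifier.
RULES = {
    "user_input": [INJECTION],
    "tool_result": [INJECTION, re.compile(r"<script\b", re.IGNORECASE)],
    "agent_output": [re.compile(r"\b\d{3}-\d{2}-\d{4}\b")],  # SSN-like pattern
}


def passes_filter(stage: str, text: str) -> bool:
    """True when no rule for this stage matches the text."""
    return not any(rule.search(text) for rule in RULES[stage])


def run_turn(user_msg, fetch_tool, generate):
    """One agent turn with a filter at input, tool result, and output."""
    if not passes_filter("user_input", user_msg):
        return "[blocked: input]"
    tool_text = fetch_tool(user_msg)
    if not passes_filter("tool_result", tool_text):
        # Withhold suspicious tool content instead of aborting the turn.
        tool_text = "[tool result withheld by filter]"
    answer = generate(user_msg, tool_text)
    if not passes_filter("agent_output", answer):
        return "[blocked: output]"
    return answer
```

Each stage has its own rule set because the threats differ: user inputs and tool results are checked for injected instructions, while agent outputs are checked for private data.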