TL;DR
- Most AI guardrails are post-hoc: they inspect model outputs and flag bad ones. That works for chatbots. It does not work for agents that take actions.
- Pre-action authorization sits in the tool hook, between the agent's decision and the tool's execution. The check is deterministic, runs in ~40ms, and cannot be bypassed by prompt injection because the model never sees it.
- LlamaGuard, NeMo Guardrails, Galileo, and LlamaFirewall are content layers. APort is an action layer. Different category, complementary stack.
- Working integrations exist today in DeerFlow (PR #1240, merged), OpenClaw (plugin-based `before_tool_call` path), LangChain, CrewAI, and the OpenAI SDK.
The problem
Chatbots produce text. Agents produce side effects.
That distinction sounds obvious until you look at what shipped this year. Your "AI assistant" can now read your inbox, write to your database, push commits, file Jira tickets, run terraform, and wire money to vendors. Every one of those is a tool call. Every tool call is a function the model decided to invoke based on a context window someone else can influence.
The guardrail stack most teams reach for was designed for the previous era. Content moderation. Output filtering. Toxicity scoring. Hallucination detection. These are all built around the same loop: model produces text, classifier inspects text, system decides whether to show it to the user.
That loop has a hidden assumption — that the bad thing is the output, and that you have time to look at the output before anything bad happens. For a chatbot, that's true. For an agent with tools, it isn't. By the time the output exists, the action has already happened.
This post is about the layer that's missing.
Two types of guardrails
Let me draw the line cleanly.
Post-hoc guardrails evaluate something the model produced. A response, a generation, a chain-of-thought trace. They catch unsafe content, PII leaks, off-policy answers, jailbreaks expressed in the output. The category includes:
- LlamaGuard — Meta's safety classifier for input/output moderation
- NeMo Guardrails — NVIDIA's programmable rails for conversational flows
- Galileo — observability and evaluation for LLM outputs
- LlamaFirewall — Meta's open-source firewall for prompt injection and unsafe content
- Guardrails AI — output validation via Pydantic-style schemas
These are good products. They solve a real problem. The problem they solve is "the model said something it shouldn't have said."
Pre-action guardrails evaluate something the agent is about to do. They run inside the tool hook — the function that frameworks call after the model picks a tool but before that tool actually executes. They take the tool name, the parameters, and a policy, and they return ALLOW or DENY. The category includes:
- Allowlists baked into framework wrappers
- OPA / Rego policies wired into a tool dispatcher
- APort and the Open Agent Passport (OAP) spec — identity, capabilities, limits, and audit in one runtime check
The two categories are not in competition. They sit in different parts of the stack. Post-hoc filters protect users from what the model says. Pre-action authorization protects the world from what the agent does.
If you only have one of them, you have a hole.
Why post-hoc fails for agents
Post-hoc only works when the bad thing is recoverable.
Think about what you're actually doing when you run an output filter. The model generated a response. You inspect it. If it's bad, you don't show it to the user. You regenerate. Nothing leaked, because the loop ended at "show user." The filter is the gate.
Now replace "show user" with "execute tool." The model generated a tool call. You inspect it. If it's bad — what do you do? The transfer already cleared. The email already left your SMTP server. The DELETE statement already ran against production. The S3 bucket is already public. The webhook already fired at the third-party service. You can log it. You can alert. You cannot undo it.
This is the irreversibility problem and it is the entire reason pre-action exists.
A short list of things post-hoc cannot save you from:
- Data exfiltration — once a row leaves your database, it has left your database
- Money movement — payment rails do not have a "wait, the LLM was confused" button
- External communications — sent email is sent email; posted Slack is read Slack
- Destructive operations — `rm -rf`, `DROP TABLE`, `git push --force`, `terraform destroy`
- Identity actions — OAuth grants, SSH key uploads, IAM policy changes
- Code merges — once main is updated, downstream CI is already running
For all of these, the only safe place to make a decision is before the tool runs. Not after. Not "in parallel with logging." Before. Synchronously. With the power to refuse.
Post-hoc is for losses you can undo. Agents take losses you can't.
The runtime layer
Every modern agent framework has a hook. LangChain calls it a callback. CrewAI calls it a tool wrapper. The OpenAI SDK calls it function-calling middleware. Claude Code calls it a PreToolUse hook. DeerFlow calls it a middleware. They all do the same thing: they expose a synchronous extension point that fires after the model picks a tool and before the tool executes.
That hook is the runtime layer. It is the only place in the agent loop where you can intercept an action without trusting the model.
Three properties matter:
- It runs outside the model. The check is regular code, not another inference call. Prompt injection cannot disable it because the model is not the one running it.
- It's synchronous and blocking. The tool does not run until the hook returns. There is no race condition with the side effect.
- It's deterministic. Same input, same output, every time. You can audit it. You can replay it. You can certify it for SOC 2, HIPAA, or the EU AI Act.
These three properties are exactly what authorization needs. They're what OAuth does for APIs and what RBAC does for databases. The agent ecosystem just hasn't had a standard way to plug it in until recently.
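Stripped to its essentials, the pattern fits in a dozen lines. Here is a framework-agnostic sketch, not any particular framework's API; the tool registry and policy rule are illustrative:

```python
# Illustrative pre-action hook; names and rules are stand-ins, not a real API.
TOOLS = {
    "read_file": lambda path: open(path).read(),  # toy tool registry
}

def evaluate_policy(tool_name: str, params: dict) -> bool:
    # Deterministic rule, running outside the model: same input, same answer.
    if tool_name not in TOOLS:
        return False
    return not str(params.get("path", "")).startswith("/etc")

def dispatch(tool_name: str, params: dict):
    # Synchronous and blocking: the tool cannot run until this returns.
    if not evaluate_policy(tool_name, params):
        raise PermissionError(f"denied: {tool_name}({params})")
    return TOOLS[tool_name](**params)
```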
Architecture pattern
Here's what the loop looks like with pre-action authorization wired in.
```
User request
      |
      v
+----------------+
|  Agent / LLM   |   picks a tool, generates params
+----------------+
      |
      v  (synchronous)
+--------------------------+
| Tool hook fires          |
|   - tool name            |
|   - parameters           |
|   - agent passport       |
|   - context (user, etc.) |
+--------------------------+
      |
      v
+--------------------------+
| Policy engine            |
|   - check capability     |
|   - check limits         |
|   - check allowlists     |
|   - check assurance lvl  |
+--------------------------+
      |
      +-----> ALLOW -----> tool executes -----> result back to agent
      |
      +-----> DENY  -----> tool blocked -----> denial reason back to agent
      |
      v
+--------------------------+
| Signed audit record      |
|  (allow or deny)         |
+--------------------------+
```
A few things are worth noting about this picture.
The hook fires every time. Not on a sample. Not on suspicious calls. Every tool call, every time, with the same code path. That is what makes it auditable.
The denial reason flows back to the agent. This matters more than people expect. A good policy engine doesn't just say "no" — it says "no, because the amount is over your $500 limit and you're at L2 assurance." The agent can then explain that to the user, or retry with a smaller amount, or escalate to a human. Denial as a first-class signal beats silent failure.
The audit record is signed. Both allows and denies. This is the artifact compliance teams actually need: a per-decision, cryptographically verifiable log of what the agent tried to do and what the policy said about it.
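To make that shape concrete, here is a toy version of a per-decision record signed with an HMAC. This illustrates what "signed, per-decision" means in practice; it is not APort's actual record format:

```python
import hashlib
import hmac
import json
import time

SIGNING_KEY = b"replace-with-a-managed-key"  # illustrative; use a real key store

def audit_record(decision: str, tool_name: str, params: dict, reason: str) -> dict:
    record = {
        "ts": time.time(),
        "tool": tool_name,
        "params": params,
        "decision": decision,  # "ALLOW" or "DENY" -- both get recorded
        "reason": reason,
    }
    payload = json.dumps(record, sort_keys=True).encode()
    # A verifier recomputes the HMAC over the record minus the "sig" field.
    record["sig"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return record
```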
Three types of pre-action providers
Not every pre-action layer is the same. There's a spectrum, and the right answer depends on your blast radius.
1. Allowlists
The simplest version. You wrap your tool dispatcher in a function that checks the tool name against a list. If the tool isn't on the list, it doesn't run.
```python
ALLOWED_TOOLS = {"search_docs", "read_file", "list_directory"}

def safe_dispatch(tool_name, params):
    if tool_name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool {tool_name} not allowlisted")
    return TOOLS[tool_name](**params)  # TOOLS maps tool names to callables
```
This is fine for prototypes and read-only agents. It breaks the moment you need parameter-level constraints ("refunds up to $500 only") or different policies for different agents.
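Here is what that next step looks like once you outgrow a bare allowlist. The policy shape below is illustrative, not the OAP format:

```python
# Sketch of parameter-level constraints, the step a bare allowlist can't take.
POLICY = {
    "issue_refund": {"max_amount": 500},
    "send_email": {"allowed_domains": {"example.com"}},
}

def check(tool_name: str, params: dict) -> bool:
    rule = POLICY.get(tool_name)
    if rule is None:
        return False  # default deny: unknown tools never run
    if "max_amount" in rule and params.get("amount", 0) > rule["max_amount"]:
        return False
    if "allowed_domains" in rule:
        domain = params.get("to", "").rsplit("@", 1)[-1]
        if domain not in rule["allowed_domains"]:
            return False
    return True
```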
2. OPA / Rego policies
Open Policy Agent gives you a real policy language. You can express "this tool is allowed if the parameters match this shape and the caller has this role and the time is within business hours." It's powerful and well-understood by infrastructure teams.
The cost is operational. You're now running OPA somewhere, shipping policy bundles, and reasoning about Rego, which is its own learning curve. It also doesn't ship with an agent identity model — you bring your own concept of "who is this agent" and stitch it in.
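If you go this route, the agent-side glue is small. Here is a sketch of querying OPA's Data API from a tool dispatcher; it assumes OPA is running locally and that you named your Rego package `agents.authz` with an `allow` rule (both names are assumptions about your setup):

```python
import requests  # pip install requests; OPA itself runs as a separate process

def opa_allows(tool_name: str, params: dict) -> bool:
    # OPA's Data API: POST /v1/data/<policy path> with an "input" document.
    resp = requests.post(
        "http://localhost:8181/v1/data/agents/authz/allow",
        json={"input": {"tool": tool_name, "params": params}},
        timeout=1,
    )
    resp.raise_for_status()
    # An undefined rule returns no "result" key; treat that as a deny.
    return resp.json().get("result", False)
```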
3. OAP passports
The Open Agent Passport is an open spec — Apache 2.0, hosted at github.com/aporthq/aport-spec — that defines three things together: agent identity, declarative capability policy, and a signed audit record. APort is the reference implementation.
OAP integrations are driven by a config file and a passport. For a LangChain agent, the entire integration surface looks like this — install the package, run the wizard, and add a callback:
```bash
npx @aporthq/aport-agent-guardrails langchain
pip install aport-agent-guardrails-langchain
aport-langchain setup
```

```python
from langchain.agents import initialize_agent
from aport_guardrails_langchain import APortCallback

agent = initialize_agent(
    tools=tools,
    llm=llm,
    callbacks=[APortCallback()]
)
```
The callback reads its config from `~/.aport/langchain/` or `.aport/config.yaml` and evaluates every tool call against the passport before the tool runs. On deny it raises `GuardrailViolationError`. The check is local (no network call in the default mode), returns in roughly 40ms, and every decision — allow or deny — gets a signed record you can ship to your SIEM.
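On the calling side, a deny surfaces as an exception you can catch and turn into a user-facing explanation. A sketch, with the caveat that the exact import path for `GuardrailViolationError` is an assumption (check the package docs):

```python
from aport_guardrails_langchain import GuardrailViolationError  # path assumed

try:
    result = agent.run("refund order #123 for $900")
except GuardrailViolationError as err:
    # Denial is a first-class signal: show the reason, retry smaller, or escalate.
    print(f"blocked by policy: {err}")
```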
Real examples from the wild
This isn't theoretical. Three things shipped recently that show the pattern in production frameworks.
APort Vault CTF
Five-day public adversarial test. We built an "AI bank" with five levels of policy and offered $5,000 to anyone who could social-engineer the agent into moving money out of Level 5.
The numbers:
- 1,151 unique sessions from 459 unique IPs
- 4,437 APort authorization decisions on transfers
- L1 (no policy, model only): 74.6% attacker success
- L5 (full OAP policy, no transfer capability): 0.0% attacker success across 879 attempts
- $5,000 bounty: unclaimed
Same model. Same prompts. Same users. The only thing that changed between L1 and L5 was whether a deterministic policy ran in the tool hook. Full writeup: APort Vault CTF results.
DeerFlow GuardrailMiddleware
DeerFlow is a research-style multi-agent framework. The team merged a GuardrailMiddleware in PR #1240 that exposes pre-action checks at the LangGraph node boundary. Tool calls flow through the middleware, which loads an OAP policy and decides ALLOW/DENY before the tool runs. Integration writeup: Getting started with DeerFlow guardrails.
OpenClaw before_tool_call plugin
OpenClaw is an open agent runtime. Today APort integrates through OpenClaw's public plugin system: the `openclaw-aport` plugin registers `before_tool_call` and returns ALLOW or DENY before the tool dispatcher executes the action. Same pre-action shape, no core patch required.
Three different frameworks. Same integration point. Same shape of check. This is what a standardizing layer looks like in its early days.
How this differs from Galileo, NeMo, LlamaFirewall
I get this question constantly, so let me be precise.
| | Galileo / NeMo / LlamaGuard / LlamaFirewall | APort |
|---|---|---|
| What it inspects | Model inputs and outputs (text) | Tool calls (function name + params) |
| When it runs | After generation, before display | After tool selection, before execution |
| What it prevents | Unsafe content reaching the user | Unsafe actions reaching the world |
| Layer | Content | Action |
| Failure mode | False positive on a benign answer | False positive on a benign tool call |
| Bypass surface | Prompt injection, encoding tricks | None — model never sees the policy |
| Determinism | Often probabilistic (classifier model) | Deterministic (rules engine) |
| Audit artifact | Inspection log | Signed per-decision attestation |
These products are not competing for the same slot. A serious agent stack runs both. You want LlamaGuard to keep the model from saying something racist to a user. You want APort to keep the same model from wiring $50,000 to an attacker because someone hid an instruction in a PDF. Different problems, different layers, different solutions.
The category confusion comes from the word "guardrail," which got applied to everything. If it helps, mentally split it: content guardrails vs action guardrails. Most of the existing market is content. The action layer is where the gap is.
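If the split still feels abstract, here is both layers in one loop, with stub checks standing in for real products. Everything below is illustrative:

```python
def content_ok(text: str) -> bool:
    # Stub content guardrail: inspects text before it reaches the user.
    return "ssn" not in text.lower()

def action_ok(tool: str, params: dict) -> bool:
    # Stub action guardrail: deterministic, fires before the side effect.
    return tool in {"search_docs"}

def handle(step: dict):
    """Route one agent step through both layers. The step shape is illustrative."""
    if step.get("tool"):
        if not action_ok(step["tool"], step.get("params", {})):
            return {"denied": "policy"}  # action layer: pre-execution
        return {"ok": f"ran {step['tool']}"}
    if not content_ok(step.get("text", "")):
        return {"redacted": True}  # content layer: pre-display
    return {"text": step["text"]}
```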
The key line
If you take one sentence from this post:
Most guardrails detect bad outputs. APort prevents bad actions. It runs in the hook, not the prompt. The AI cannot skip this check.
Everything else is a footnote on that.
How to implement
Three frameworks, three quickstarts, all working today.
DeerFlow — install the middleware, point it at a policy file, and the LangGraph nodes will route through it automatically. See Getting started with DeerFlow guardrails.
OpenClaw — run the public installer and start OpenClaw with the generated config. Tool calls hit the plugin hook before execution:
```bash
npx @aporthq/aport-agent-guardrails openclaw
openclaw gateway start --config ~/.openclaw/config.yaml
```
LangChain — add the APort callback to your agent's callbacks list:
```python
from langchain.agents import initialize_agent
from aport_guardrails_langchain import APortCallback

agent = initialize_agent(tools=tools, llm=llm, callbacks=[APortCallback()])
```
For all three, local mode requires no signup, no API key, and no network call. The policy file lives in your repo. The check runs in-process. You get signed decisions on stdout or to a file. If you want a hosted control plane and a dashboard, that's at aport.io, but it's optional. The runtime layer is open source.
The OAP spec is at github.com/aporthq/aport-spec. It's Apache 2.0 and you can implement your own provider against it.
FAQ
Isn't this just exec approval / human-in-the-loop?
No. Human approval is one possible outcome of a policy decision. APort can route a tool call to a human reviewer when the policy says "escalate," but the vast majority of decisions are deterministic ALLOW/DENY made in milliseconds without a human. Human-in-the-loop alone doesn't scale past a few dozen actions a day. A policy engine scales to millions and only escalates the edge cases.
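Here is the shape of "escalate as one outcome among three" in code. The thresholds and tool names are illustrative:

```python
from enum import Enum

class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ESCALATE = "escalate"  # route to a human reviewer

def decide(tool: str, params: dict) -> Decision:
    if tool != "transfer_funds":
        return Decision.ALLOW
    amount = params.get("amount", 0)
    if amount <= 500:
        return Decision.ALLOW      # deterministic fast path, no human
    if amount <= 5000:
        return Decision.ESCALATE   # edge case: human approval required
    return Decision.DENY
```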
What about prompt injection?
Pre-action authorization is the strongest defense against prompt injection that exists, because the policy is not in the prompt. The model can be perfectly convinced by an attacker that it should wire money to account 12345. It can write a beautiful chain-of-thought explaining why. It can call the tool. None of that matters. The hook fires. The policy says "this agent has no transfer capability" or "the recipient is not on the allowlist," and the call is denied. The model's reasoning is irrelevant because the model is not the thing making the authorization decision.
The Vault CTF was a direct test of this. Attackers made 879 attempts to social-engineer Level 5. The model often agreed with them. The policy still said no.
How is this different from RBAC?
RBAC is the conceptual ancestor. Pre-action authorization is RBAC for agent actions, with three differences:
- Subject is an agent, not a human user. The agent has its own identity and its own scoped credentials.
- Object is a tool call with parameters, not a static resource. "Refund up to $500" is not expressible in classic RBAC; it is in OAP.
- Audit is per-decision and signed, not per-session. Every individual tool call has its own attestation.
You can think of OAP as RBAC + ABAC + signed receipts, scoped to the agent runtime.
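To make the comparison concrete, here is a sketch of a passport-style check: per-agent identity, a parameter limit, and a decision with a reason. The passport shape is illustrative, not the actual OAP schema:

```python
# Illustrative passport store; real systems load verified, signed passports.
PASSPORTS = {
    "agent-7f3a": {
        "capabilities": {"issue_refund": {"max_amount": 500}},
        "assurance": "L2",
    }
}

def authorize(agent_id: str, tool: str, params: dict) -> dict:
    cap = PASSPORTS.get(agent_id, {}).get("capabilities", {}).get(tool)
    if cap is None:
        return {"decision": "DENY", "reason": f"{tool} not in passport"}
    if params.get("amount", 0) > cap.get("max_amount", float("inf")):
        return {"decision": "DENY", "reason": "amount over limit"}
    return {"decision": "ALLOW", "reason": "within capability and limits"}
```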
Does it slow agents down?
The check runs in roughly 40ms in local mode (median, in-process, no network). The hosted API is around 53ms median, p99 under 77ms. Compared to the 1–10 seconds an LLM spends generating the tool call in the first place, the policy check is rounding error. We've never seen it be the bottleneck in any production deployment.
What if my policy engine goes down?
Fail closed. Tool calls are denied by default if the policy provider is unreachable. This is the same property as failed auth on a database — you do not let writes through when authorization is unavailable. Local mode avoids this entirely because there's nothing to be unreachable; the policy file is on disk and the engine is in-process.
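The wrapper is a few lines. A sketch, with the policy evaluator and tool executor injected as callables:

```python
def guarded_dispatch(tool_name: str, params: dict, evaluate, execute):
    # Fail closed: if the policy engine itself errors, treat it as a deny.
    try:
        allowed = evaluate(tool_name, params)
    except Exception:
        allowed = False  # unreachable engine == no authorization == no action
    if not allowed:
        raise PermissionError(f"denied (or engine unavailable): {tool_name}")
    return execute(tool_name, params)
```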
Is this just for finance agents?
No. Finance is the most obvious use case because the irreversibility is so visible, but the same layer applies to any agent with destructive or external-facing tools. Code agents (preventing pushes to `main`, blocking force-pushes), data agents (blocking `DROP` and `DELETE` without `WHERE`), comms agents (rate-limiting outbound email, blocking external recipients), infrastructure agents (gating `terraform apply`, blocking IAM changes). The pattern is the same. The policy pack is different.
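For flavor, here is one rule you might find in a data-agent pack. A hypothetical sketch, not a shipped APort policy:

```python
import re

def sql_allowed(query: str) -> bool:
    # Block DROP entirely, and DELETE statements with no WHERE clause.
    q = query.strip().rstrip(";")
    if re.match(r"(?i)^\s*drop\b", q):
        return False
    if re.match(r"(?i)^\s*delete\b", q) and not re.search(r"(?i)\bwhere\b", q):
        return False
    return True
```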
Why a spec instead of just a library?
Because the runtime layer needs to be portable across frameworks. If every framework invents its own incompatible authorization model, the ecosystem ends up where MCP servers were last year — hundreds of them exposed without authentication (per security researchers), because there was no standard. The Open Agent Passport spec is the attempt to make sure that doesn't happen for agent authorization. Anyone can implement against it. APort is one implementation.
Closing
The agent stack has had alignment, evaluation, content moderation, and sandboxes for a while. What it hasn't had is a standard, deterministic, runtime authorization layer at the tool boundary. That gap is where most of this year's "agent did a bad thing" headlines actually live.
Pre-action authorization is the layer. It runs in the hook, not the prompt. It's deterministic, fast, and unbypassable by the model itself. It coexists with the content guardrails you already use.
If you ship agents that do things, put a check in the hook. The shape is small. The downside of skipping it is not.
- Spec: github.com/aporthq/aport-spec
- Implementation: github.com/aporthq/aport-agent-guardrails
- Vault CTF results: /blog/aport-vault-ctf-bounty-results
- DeerFlow integration: /blog/getting-started-deerflow-guardrails-tool-authorization