# APort vs Promptfoo
Promptfoo helps you find failures before they ship; OAP ensures that unauthorized actions do not execute even when an attack gets past the model.
Promptfoo is ideal for systematic prompt and regression testing and for CI pipelines.
OAP addresses the runtime gap: even when tests miss a variant or a novel jailbreak lands, the tool call still has to clear a deterministic policy gate.
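A minimal, self-contained sketch of what such a gate can look like. `evaluatePolicy`, `guardedToolCall`, and the decision shape are hypothetical stand-ins, not the actual APort SDK; the point is that the check is deterministic and fail-closed.

```typescript
// Hypothetical decision shape; the real OAP format may differ.
interface PolicyDecision {
  allow: boolean;
  reasons: string[];
}

// Stand-in for the policy engine. Assumed deterministic: the same tool
// call against the same policy pack always yields the same decision.
declare function evaluatePolicy(
  tool: string,
  args: Record<string, unknown>,
): Promise<PolicyDecision>;

// Wraps every tool invocation. Fail-closed: an unreachable or erroring
// policy engine denies the action instead of silently allowing it.
async function guardedToolCall<T>(
  tool: string,
  args: Record<string, unknown>,
  execute: () => Promise<T>,
): Promise<T> {
  let decision: PolicyDecision;
  try {
    decision = await evaluatePolicy(tool, args);
  } catch {
    throw new Error(`denied (fail-closed): policy engine unavailable for ${tool}`);
  }
  if (!decision.allow) {
    throw new Error(`denied: ${decision.reasons.join("; ")}`);
  }
  return execute();
}
```

Because the gate sits on the invocation path itself, it blocks variants the test suite never saw.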
| Comparison point | OAP / APort | Promptfoo |
|---|---|---|
| When it runs | Every production tool invocation. | Test jobs, CI, and offline red-team scenarios. |
| Output | Allow/deny + signed decision + reasons. | Scores, traces, reports, and regression diffs. |
| Threat model | Assumes the model can be socially engineered live. | Assumes you can approximate attacks in test datasets. |
| Developer workflow | Hook installers + policy packs alongside your agent runtime. | YAML/CLI configs for eval suites and providers. |
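To make the "Developer workflow" row concrete, here is a hypothetical sketch of the hook-plus-policy-pack shape: a policy pack as a list of deterministic rules that a tool-call hook consults. None of these names come from the actual APort SDK.

```typescript
// Hypothetical wiring, not the APort API: a policy pack is a set of
// deterministic rules, and a hook consults it on every tool invocation.
interface PolicyRule {
  tool: string;                                       // tool the rule applies to
  effect: "allow" | "deny";
  when?: (args: Record<string, unknown>) => boolean;  // deterministic predicate
  reason: string;
}

const refundsPolicyPack: PolicyRule[] = [
  {
    tool: "issue_refund",
    effect: "deny",
    when: (args) => (args.amountUsd as number) > 500,
    reason: "Refunds above $500 require human approval",
  },
  { tool: "issue_refund", effect: "allow", reason: "Small refunds permitted" },
];

// First matching rule wins; `undefined` (no match) is treated as deny,
// so a tool the pack forgot to cover fails closed.
function decide(tool: string, args: Record<string, unknown>): PolicyRule | undefined {
  return refundsPolicyPack.find(
    (rule) => rule.tool === tool && (rule.when?.(args) ?? true),
  );
}
```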
## Use Promptfoo when
- You need repeatable eval harnesses across models and prompts
- You want CI gates before promoting prompt or tool changes (see the script after this list)
- You benchmark jailbreak resistance systematically
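For the CI-gate case, a small script that runs a checked-in Promptfoo suite and fails the build on any failed assertion. `promptfooconfig.yaml` is Promptfoo's conventional config name; exact CLI flags and exit codes vary by version, so treat this as a sketch.

```typescript
import { execFileSync } from "node:child_process";

// Run the promptfoo CLI against the repo's eval suite. With stdio
// inherited, the eval report streams straight into the CI log.
try {
  execFileSync("npx", ["promptfoo", "eval", "--config", "promptfooconfig.yaml"], {
    stdio: "inherit",
  });
} catch {
  // promptfoo exits non-zero when assertions fail, so execFileSync throws
  // and we fail the CI step before the prompt/tool change is promoted.
  process.exit(1);
}
```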
## Use OAP / APort when
- You need fail-closed enforcement in customer-facing agents
- You cannot rely on exhaustive tests for infinite attack variants
- You need cryptographically verifiable audit evidence per call (sketched after this list)
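The signed decision in the table above is what makes per-call audit evidence checkable later. A sketch of verifying one with Node's built-in Ed25519 support; the payload layout and key distribution are assumptions, not the OAP wire format.

```typescript
import { verify, type KeyObject } from "node:crypto";

// Hypothetical audit record; the actual OAP decision payload may differ.
interface SignedDecision {
  payload: string;   // canonical JSON: tool, args hash, allow/deny, reasons, timestamp
  signature: Buffer; // Ed25519 signature over `payload`
}

// Ed25519 verification: Node's crypto.verify takes `null` as the algorithm
// for Ed25519 keys and returns a boolean.
function isAuthentic(decision: SignedDecision, publicKey: KeyObject): boolean {
  return verify(null, Buffer.from(decision.payload), publicKey, decision.signature);
}
```

Each stored record then lets an auditor independently re-check which policy allowed or denied a given call.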
## Why teams choose OAP / APort
### Production path coverage
OAP rides the same path as real user sessions, not only synthetic eval traffic.
### No sampling in the policy core
Policy evaluation is deterministic: the safety decision never depends on a stochastic model output.
### Complements Promptfoo
Use Promptfoo to discover what your policies should forbid; OAP enforces those policies at runtime.
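A hypothetical illustration of that loop (none of these identifiers come from either project): a failure surfaced by a Promptfoo red-team run is codified as a standing deny rule for the runtime gate.

```typescript
// 1. A Promptfoo red-team run showed the model could be talked into an
//    unbounded data export (hypothetical finding, for illustration).
const finding = {
  tool: "export_table",
  exploitArgs: { table: "users", limit: null },
};

// 2. Encode the lesson as a deterministic rule. The runtime gate now blocks
//    every variant of this attack, whether or not tests ever replay it.
const learnedRule = {
  tool: finding.tool,
  effect: "deny" as const,
  when: (args: Record<string, unknown>) => args.limit == null,
  reason: "Unbounded table exports blocked after red-team finding",
};
```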