Patronus AI in Bifrost for Hallucination Detection
Large language models generate fluent, confident text that is sometimes factually wrong, and those errors pass through to users without raising any failure in application logs. For teams running AI in production, hallucination detection and LLM safety evaluation have become required controls rather than optional add-ons. Bifrost, the open-source AI gateway built in Go by Maxim AI, runs these checks at the gateway layer by integrating Patronus AI as a third-party guardrail provider. This post explains how to set up Patronus AI in Bifrost to evaluate prompts and completions for hallucinations, PII, toxicity, and prompt injection before responses reach users.
Understanding the Hallucination and LLM Safety Challenge
A hallucination occurs when a model produces output that reads as plausible but is factually incorrect, unsupported by evidence, or logically inconsistent. The problem is structural, not a transient bug. A September 2025 OpenAI research paper, Why Language Models Hallucinate, argues that standard training and evaluation procedures reward confident guessing over acknowledging uncertainty, which is why even frontier models continue to produce confident errors.
The operational risk compounds at scale. Gartner predicts that over 40% of agentic AI projects will be canceled by the end of 2027, citing escalating costs, unclear business value, and inadequate risk controls. Hallucination detection and LLM safety evaluation directly address the last of these, inadequate risk controls. Without automated checks, unsafe or fabricated output is caught only when a user reports it, which is too late for regulated workflows in finance, healthcare, and customer support.
LLM safety evaluation covers a broader surface than hallucinations alone:
- Factual accuracy: whether claims in a response are grounded and verifiable
- PII exposure: whether prompts or completions leak personally identifiable information
- Toxicity: whether content is harmful, abusive, or unsafe
- Prompt injection: whether user input attempts to override system instructions
- Bias: whether output reflects age, gender, or racial bias
- Response quality: whether output meets criteria such as conciseness, helpfulness, or valid structured formats
Running these checks inside application code means rebuilding the same logic in every service. Running them at the gateway centralizes evaluation across every model and provider.
How Guardrails Work in Bifrost
Bifrost evaluates traffic through a guardrail system that separates two concerns: the provider that performs the evaluation and the rule that decides when to run it. Guardrails are a Bifrost Enterprise capability, built for teams that need policy enforcement across all AI traffic rather than per-application checks.
A guardrail provider is a configured connection to an evaluation backend, such as Patronus AI. A guardrail rule binds that provider to a set of conditions: which requests to evaluate, whether to inspect the input, the output, or both, and what fraction of traffic to sample. Rules use CEL expressions to scope evaluation, so a team can apply Patronus checks only to a specific provider, model, or route.
When a request matches a rule, Bifrost sends the selected text to the provider for evaluation. If any evaluator returns a failing result, Bifrost returns a GUARDRAIL_INTERVENED response and stops the unsafe output from reaching the caller. This gives platform teams a single enforcement point for AI governance, sitting in front of every connected provider rather than scattered across application services.
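The intervention rule is simple: one failing evaluator is enough to block the response. The Go sketch below models only that decision. The EvalResult and Decision types and the decide function are hypothetical illustrations, not Bifrost's internal API. The only value taken from the text is the GUARDRAIL_INTERVENED code.

```go
package main

import "fmt"

// EvalResult is a hypothetical stand-in for one evaluator's verdict.
// Field names are illustrative, not Bifrost's or Patronus's actual types.
type EvalResult struct {
	Evaluator string
	Pass      bool
	Explain   string
}

// Decision records what the gateway does with the response.
type Decision struct {
	Intervened bool
	Code       string   // "GUARDRAIL_INTERVENED" when the response is blocked
	FailedBy   []string // evaluators that returned a failing result
}

// decide blocks the response if any evaluator returned a failing result.
func decide(results []EvalResult) Decision {
	var failed []string
	for _, r := range results {
		if !r.Pass {
			failed = append(failed, r.Evaluator)
		}
	}
	if len(failed) > 0 {
		return Decision{Intervened: true, Code: "GUARDRAIL_INTERVENED", FailedBy: failed}
	}
	return Decision{} // every evaluator passed; the response goes to the caller
}

func main() {
	results := []EvalResult{
		{Evaluator: "pii", Pass: false, Explain: "email address detected"},
		{Evaluator: "judge:patronus:is-concise", Pass: true},
	}
	d := decide(results)
	fmt.Println(d.Intervened, d.Code, d.FailedBy)
}
```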
Setting Up Patronus AI in Bifrost
Setting up Patronus AI in the Bifrost AI gateway takes three steps: create the provider, configure its evaluators, and attach the provider to a rule that decides when checks run. You will need a Patronus API key from the Patronus AI dashboard to authenticate with the Patronus Evaluate API.
Step 1: Create the Patronus guardrail provider
Create a provider with provider_name: "patronus-ai" and supply your API key. Bifrost reads the key directly or from an environment variable such as env.PATRONUS_API_KEY, which keeps credentials out of configuration files. The base_url defaults to https://api.patronus.ai, and Bifrost appends the /v1/evaluate path when calling the Patronus Evaluate API.
Step 2: Configure evaluators
Each provider runs one or more evaluators, and at least one is required. An evaluator entry names the Patronus evaluator (for example pii, toxicity-perspective-api, or judge), an optional criteria profile such as patronus:is-concise, and an explain_strategy that controls when evaluator explanations are returned (never, on-fail, on-success, or always).
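Two of these constraints can be checked before the config ever reaches the gateway: the list must not be empty, and explain_strategy must be one of the four listed values. The following Go sketch encodes only those two rules. The Evaluator type and validateEvaluators function are hypothetical. The sketch also assumes an omitted explain_strategy is acceptable, which the text does not confirm.

```go
package main

import (
	"errors"
	"fmt"
)

// Evaluator mirrors one entry of the "evaluators" array (hypothetical type).
type Evaluator struct {
	Evaluator       string `json:"evaluator"`
	Criteria        string `json:"criteria,omitempty"`
	ExplainStrategy string `json:"explain_strategy,omitempty"`
}

// The four explain_strategy values listed in the text.
var validExplain = map[string]bool{"never": true, "on-fail": true, "on-success": true, "always": true}

// validateEvaluators requires at least one evaluator and a known
// explain_strategy. An empty strategy is allowed here (an assumption).
func validateEvaluators(evs []Evaluator) error {
	if len(evs) == 0 {
		return errors.New("at least one evaluator is required")
	}
	for i, e := range evs {
		if e.Evaluator == "" {
			return fmt.Errorf("evaluator %d: name is empty", i)
		}
		if e.ExplainStrategy != "" && !validExplain[e.ExplainStrategy] {
			return fmt.Errorf("evaluator %d: invalid explain_strategy %q", i, e.ExplainStrategy)
		}
	}
	return nil
}

func main() {
	evs := []Evaluator{
		{Evaluator: "pii", ExplainStrategy: "on-fail"},
		{Evaluator: "judge", Criteria: "patronus:is-concise", ExplainStrategy: "on-fail"},
	}
	fmt.Println(validateEvaluators(evs))
	fmt.Println(validateEvaluators(nil))
	fmt.Println(validateEvaluators([]Evaluator{{Evaluator: "pii", ExplainStrategy: "sometimes"}}))
}
```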
The following configuration registers a provider with two evaluators, one for PII and one for response conciseness:
```json
{
  "guardrails_config": {
    "guardrail_providers": [
      {
        "id": 40,
        "provider_name": "patronus-ai",
        "policy_name": "patronus-quality-checks",
        "enabled": true,
        "timeout": 30,
        "config": {
          "api_key": "env.PATRONUS_API_KEY",
          "base_url": "https://api.patronus.ai",
          "evaluators": [
            { "evaluator": "pii", "explain_strategy": "on-fail" },
            {
              "evaluator": "judge",
              "criteria": "patronus:is-concise",
              "explain_strategy": "on-fail"
            }
          ],
          "capture": "none"
        }
      }
    ]
  }
}
```
Step 3: Attach the provider to a rule
A rule references the provider by its config ID and defines the matching conditions. The example below runs the Patronus checks on the output of any request routed to OpenAI, sampling 100% of traffic:
```json
{
  "guardrail_rules": [
    {
      "id": 401,
      "name": "patronus-openai-output",
      "description": "Run Patronus checks on OpenAI responses",
      "enabled": true,
      "cel_expression": "provider == 'openai'",
      "apply_to": "output",
      "sampling_rate": 100,
      "timeout": 30,
      "provider_config_ids": [40]
    }
  ]
}
```
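To see how a rule like patronus-openai-output selects which providers run, consider the Go sketch below. It models only the matching logic. A plain Go function stands in for the CEL expression, because the sketch does not evaluate real CEL the way Bifrost does. The Request and Rule types, the providersFor helper, and the "both" value for apply_to are all assumptions for illustration. Sampling is left out here.

```go
package main

import "fmt"

// Request holds the attributes a rule condition can inspect. "Model" is an
// assumed extra field; the example rule only checks the provider.
type Request struct {
	Provider string
	Model    string
}

// Rule is a hypothetical Go mirror of the rule JSON. Match stands in for
// cel_expression; this sketch does not evaluate real CEL.
type Rule struct {
	Name              string
	Enabled           bool
	Match             func(Request) bool
	ApplyTo           string // "input", "output", or (assumed) "both"
	ProviderConfigIDs []int
}

// providersFor returns the guardrail provider IDs to run for a request at
// the given phase ("input" or "output").
func providersFor(rules []Rule, req Request, phase string) []int {
	var ids []int
	for _, r := range rules {
		if !r.Enabled || !r.Match(req) {
			continue
		}
		if r.ApplyTo == phase || r.ApplyTo == "both" {
			ids = append(ids, r.ProviderConfigIDs...)
		}
	}
	return ids
}

func main() {
	rules := []Rule{{
		Name:              "patronus-openai-output",
		Enabled:           true,
		Match:             func(r Request) bool { return r.Provider == "openai" }, // provider == 'openai'
		ApplyTo:           "output",
		ProviderConfigIDs: []int{40},
	}}
	fmt.Println(providersFor(rules, Request{Provider: "openai", Model: "gpt-4o"}, "output"))
	fmt.Println(providersFor(rules, Request{Provider: "openai", Model: "gpt-4o"}, "input"))
	fmt.Println(providersFor(rules, Request{Provider: "anthropic"}, "output"))
}
```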
Teams that prefer the dashboard can configure the same provider under Guardrails > Providers, select Patronus AI, add evaluators, and attach the configuration to a rule under Guardrails > Configuration. The management API exposes the same operations at /api/guardrails/patronus-ai for teams that automate configuration as code.
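For the configuration-as-code path, the Go sketch below builds a request to the management endpoint without sending it. Only the path /api/guardrails/patronus-ai comes from the text. The host http://localhost:8080, the POST method, the JSON body shape, and the helper name newCreateProviderRequest are assumptions, so check Bifrost's API reference before relying on any of them.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// newCreateProviderRequest builds (but does not send) a request to the
// management endpoint. Method, host, and body shape are assumptions.
func newCreateProviderRequest(baseURL string, body map[string]any) (*http.Request, error) {
	payload, err := json.Marshal(body)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/guardrails/patronus-ai", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// Hypothetical body reusing field names from the config example.
	body := map[string]any{
		"policy_name": "patronus-quality-checks",
		"enabled":     true,
		"config": map[string]any{
			"api_key":    "env.PATRONUS_API_KEY",
			"evaluators": []map[string]any{{"evaluator": "pii", "explain_strategy": "on-fail"}},
		},
	}
	req, err := newCreateProviderRequest("http://localhost:8080", body)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	// To send it: resp, err := http.DefaultClient.Do(req)
}
```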
Patronus Evaluators for Hallucination Detection and Safety
Patronus evaluators cover each of the LLM safety checks listed earlier, and Bifrost exposes the common ones as built-in presets in the dashboard. Hallucination detection itself is configured as a Patronus evaluator, using an evaluator ID or judge criteria from your Patronus account, alongside these presets:
- Detect PII: the pii evaluator flags personally identifiable information in prompts or completions
- Detect Toxicity: the toxicity-perspective-api evaluator screens for harmful content
- Prompt Injection: the judge evaluator with patronus:prompt-injection criteria catches instruction-override attempts
- Answer Refusal: the judge evaluator with patronus:answer-refusal criteria
- Bias checks: patronus:no-age-bias, patronus:no-gender-bias, and patronus:no-racial-bias criteria
- Response quality: patronus:is-concise, patronus:is-helpful, and patronus:is-polite criteria
- Structured output validity: patronus:is-json, patronus:is-code, and patronus:is-csv criteria
For checks that are specific to your domain, select a custom evaluator and supply your own evaluator ID and optional criteria. This is the path for hallucination and groundedness evaluators tuned to your retrieval context, since those criteria live in your Patronus account rather than in a fixed preset. The combination lets a single rule run a general safety screen and a domain-specific factuality check in one pass.
Best Practices for LLM Safety Evaluation at Scale
Running evaluation at the gateway introduces latency and cost, so configure rules to match the risk of each route. A few practices keep Patronus checks effective without slowing every request.
Use the sampling_rate field to evaluate a representative fraction of low-risk traffic while screening 100% of high-risk routes such as customer-facing responses. Scope rules with CEL expressions so that expensive evaluators run only where they matter, for example applying factuality checks to a retrieval-augmented endpoint and skipping them on internal tooling.
Control what Patronus stores with the capture mode. The default none keeps evaluation results out of the Patronus dashboard, fails-only captures failed results for audit and debugging, and all records every evaluation. Pair fails-only capture with Bifrost audit logs to build an immutable record of guardrail interventions for SOC 2, GDPR, and HIPAA reviews. Set the timeout to a value that bounds evaluator runtime so a slow provider call does not stall the request pipeline.
For regulated and high-volume environments, layer Patronus evaluation with Bifrost's other governance controls. Secrets detection catches API keys and credentials in prompts, and custom regex guardrails enforce organization-specific redaction patterns.
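To illustrate what an organization-specific redaction pattern does, here is a small Go sketch built on the standard regexp package. The two patterns, an employee ID format EMP-123456 and an AWS-style access key, are hypothetical examples. They are not Bifrost's built-in secrets detection rules, and the sketch does not use its regex guardrail configuration syntax.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical organization-specific patterns, for illustration only.
var redactions = []struct {
	label   string
	pattern *regexp.Regexp
}{
	{"EMPLOYEE_ID", regexp.MustCompile(`\bEMP-\d{6}\b`)},
	{"AWS_ACCESS_KEY", regexp.MustCompile(`\bAKIA[0-9A-Z]{16}\b`)},
}

// redact replaces every match with a [LABEL] placeholder and reports whether
// anything matched, which a guardrail could use to block or rewrite a prompt.
func redact(text string) (string, bool) {
	found := false
	for _, r := range redactions {
		if r.pattern.MatchString(text) {
			found = true
			text = r.pattern.ReplaceAllString(text, "["+r.label+"]")
		}
	}
	return text, found
}

func main() {
	out, hit := redact("Ticket from EMP-123456 using key AKIAABCDEFGHIJKLMNOP")
	fmt.Println(hit, out)
}
```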
Because Bifrost supports in-VPC and air-gapped deployments, evaluation traffic and audit data can stay inside controlled infrastructure, and data access control governs which teams reach which providers.
Monitoring guardrail outcomes through Bifrost observability surfaces which rules intervene most often and where unsafe output concentrates across supported providers.
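If intervention events are exported from the gateway's logs, ranking rules by how often they intervene is a short aggregation. The Go sketch below assumes a hypothetical Intervention record with rule and provider fields. It does not reflect the schema of Bifrost's observability data.

```go
package main

import (
	"fmt"
	"sort"
)

// Intervention is a hypothetical record of one GUARDRAIL_INTERVENED outcome.
type Intervention struct {
	Rule     string
	Provider string
}

// RuleCount pairs a rule name with its intervention count.
type RuleCount struct {
	Rule  string
	Count int
}

// rankRules counts interventions per rule, most frequent first, with ties
// broken alphabetically so the output order is stable.
func rankRules(events []Intervention) []RuleCount {
	counts := map[string]int{}
	for _, e := range events {
		counts[e.Rule]++
	}
	ranked := make([]RuleCount, 0, len(counts))
	for rule, n := range counts {
		ranked = append(ranked, RuleCount{rule, n})
	}
	sort.Slice(ranked, func(i, j int) bool {
		if ranked[i].Count != ranked[j].Count {
			return ranked[i].Count > ranked[j].Count
		}
		return ranked[i].Rule < ranked[j].Rule
	})
	return ranked
}

func main() {
	events := []Intervention{
		{"patronus-openai-output", "openai"},
		{"pii-input-screen", "anthropic"},
		{"patronus-openai-output", "openai"},
	}
	for _, rc := range rankRules(events) {
		fmt.Printf("%s: %d\n", rc.Rule, rc.Count)
	}
}
```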
Getting Started with Bifrost
Hallucination detection and LLM safety evaluation work best when they run consistently across every model your applications call, not inside one service at a time. Setting up Patronus AI through Bifrost Enterprise guardrails gives platform teams a centralized enforcement point for PII, toxicity, prompt injection, and factuality checks, with rule-level control over what runs and when. To see how Bifrost handles guardrails and enterprise governance for your AI workloads, book a demo with the Bifrost team, or explore the Bifrost resources hub for deeper implementation guides.