
Guardrails at the Gateway: What Bifrost Ships and What It Integrates

Madhu Shantan

May 24, 2026

Introduction

Every team running LLMs in production runs into the same surface area sooner or later. Prompt injection. PII leaking into a model provider's training pipeline. Credentials pasted into a chat by accident. Toxic or off-policy responses going back to a user. Hallucinated content quoted as fact. Most teams solve these problems the first time they hit them, usually inside the application itself, with a one-off regex, a hand-rolled OpenAI moderation call, or a wrapper around a vendor SDK. That solution rarely survives the second team picking it up, and it never survives a model swap.

The cleaner place to handle this is the gateway. Bifrost sits between every application and every provider, which makes it the natural enforcement point for content policy. Over the last couple of weeks, we shipped a meaningful expansion of what that looks like: Secrets Detection and Custom Regex as native, in-process guardrails, plus three new first-class third-party providers (Patronus AI, CrowdStrike AIDR, and Google Model Armor). We also extended streaming inspection, on both input and output, to four of our external guardrail providers. This post is a tour of what actually shipped.

What a guardrail at the gateway actually does

A guardrail in Bifrost does one of three things: it blocks a request, it rewrites the content, or it logs and lets the traffic through. The first two are enforcement. The third is observability, which matters more than people expect. Most teams don't know what's in their AI traffic until they look.

Guardrails run in two phases. Input validation looks at the prompt before it leaves Bifrost for the model provider. Output validation looks at the response before it goes back to the caller. You pick the phase per rule with an apply_to setting: input only, output only, or both. That separation matters because you might want strict input checks that keep PII away from your model providers, but lighter output checks that redact rather than block, so user requests don't fail.
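To make the phase model concrete, here is a minimal Go sketch of how a per-rule apply_to setting could gate evaluation by phase, and how the three outcomes (block, rewrite, log) might be attached to rules. The type names, string values, and Rule struct are hypothetical illustrations, not Bifrost's internal types or config schema.

```go
package main

import "fmt"

// Phase is where a check runs: on the prompt before it leaves the gateway
// (Input) or on the response before it returns to the caller (Output).
type Phase int

const (
	Input Phase = iota
	Output
)

// ApplyTo mirrors the per-rule apply_to setting. Values are illustrative.
type ApplyTo string

const (
	ApplyInput  ApplyTo = "input"
	ApplyOutput ApplyTo = "output"
	ApplyBoth   ApplyTo = "both"
)

// Action is what a rule does when its guardrail fires.
type Action string

const (
	Block   Action = "block"   // enforcement: reject the traffic
	Rewrite Action = "rewrite" // enforcement: replace the offending content
	LogOnly Action = "log"     // observability: record it and let it through
)

// Rule is a hypothetical, minimal rule: a name, a phase scope, an action.
type Rule struct {
	Name    string
	ApplyTo ApplyTo
	OnHit   Action
}

// runsOn reports whether a rule scoped by apply_to is evaluated in phase p.
func runsOn(a ApplyTo, p Phase) bool {
	switch a {
	case ApplyBoth:
		return true
	case ApplyInput:
		return p == Input
	case ApplyOutput:
		return p == Output
	default:
		return false // unknown scope: do not run
	}
}

func main() {
	// Strict on the way in, lenient on the way out.
	rules := []Rule{
		{Name: "pii-input", ApplyTo: ApplyInput, OnHit: Block},
		{Name: "pii-output", ApplyTo: ApplyOutput, OnHit: Rewrite},
	}
	for _, p := range []Phase{Input, Output} {
		for _, r := range rules {
			if runsOn(r.ApplyTo, p) {
				fmt.Printf("phase %d: %s -> %s\n", p, r.Name, r.OnHit)
			}
		}
	}
}
```

Keeping the phase decision separate from the action is what lets the same PII policy block on input and only redact on output.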

The configuration model has two pieces: Rules and Profiles. A Profile is the provider configuration: which guardrail engine to use, with what credentials, against what policy. A Rule decides when to run that profile and in which phase. Rules are written in CEL (Common Expression Language), so you can scope them to specific models, headers, user types, or message contents. One rule can chain multiple profiles for layered checks, and the same profile can be reused across rules, so a single secrets policy can back ten different rules without ten separate configs.
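A rule's job is scoping plus ordering. The Go sketch below, assuming a simplified request shape, models a rule as a condition plus an ordered chain of profiles, with one profile defined once and shared by two rules. Bifrost writes the condition in CEL; the Go closures here only stand in for CEL expressions. The x-team header, the type names, and the choice to stop at the first intervening profile are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Request is a stripped-down view of what a rule condition can inspect.
type Request struct {
	Model   string
	Headers map[string]string
	Text    string
}

// Profile is one configured guardrail engine. Check returns a non-empty
// reason when the engine wants to intervene.
type Profile struct {
	Name  string
	Check func(text string) string
}

// Rule pairs a condition with an ordered chain of profiles. Bifrost
// expresses the condition in CEL; a Go closure stands in for it here.
type Rule struct {
	Name     string
	When     func(r Request) bool
	Profiles []*Profile
}

// Evaluate runs the chained profiles in order and stops at the first
// intervention. It returns "" when the rule doesn't match or every check passes.
func (rule Rule) Evaluate(r Request) string {
	if !rule.When(r) {
		return ""
	}
	for _, p := range rule.Profiles {
		if reason := p.Check(r.Text); reason != "" {
			return p.Name + ": " + reason
		}
	}
	return ""
}

func main() {
	// One secrets profile, defined once and shared by two rules.
	secrets := &Profile{Name: "secrets", Check: func(t string) string {
		if strings.Contains(t, "ghp_") {
			return "github-pat"
		}
		return ""
	}}
	// Placeholder for a second, layered check (e.g. a content guardrail).
	content := &Profile{Name: "content", Check: func(string) string { return "" }}

	gptRule := Rule{
		Name:     "gpt-models",
		When:     func(r Request) bool { return strings.HasPrefix(r.Model, "gpt-") },
		Profiles: []*Profile{secrets, content},
	}
	supportRule := Rule{
		Name:     "support-team",
		When:     func(r Request) bool { return r.Headers["x-team"] == "support" },
		Profiles: []*Profile{secrets},
	}

	fmt.Println(gptRule.Evaluate(Request{Model: "gpt-4o", Text: "token ghp_123"}))
	fmt.Println(supportRule.Evaluate(Request{Model: "claude", Headers: map[string]string{"x-team": "support"}, Text: "hi"}))
}
```

Because both rules hold a pointer to the same profile, changing the secrets policy in one place changes it everywhere it is used.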

There are also operational knobs (sampling rate per rule, execution timeout) so you can dial in guardrails for high-traffic endpoints without paying full latency on every request.

Native guardrails: detection without an outbound call

Two of the providers live entirely inside Bifrost. They don't require an API key, don't make outbound HTTP calls, and don't add any per-request cost beyond local CPU. That matters in a few places: regulated environments where outbound traffic is a problem, latency-sensitive endpoints where 200ms of round trip is noticeable, and self-hosted deployments where you'd rather not depend on another vendor.

Secrets Detection embeds Gitleaks' default rule set: 222 detection rules spanning most of the credential families you'd expect, including AWS access keys, GitHub PATs, GitLab tokens, OpenAI and Anthropic keys, Stripe secrets, Slack tokens, HashiCorp Vault tokens, Kubernetes secret YAML, private key blocks, and a long tail of vendor-specific tokens. You enable it by creating a guardrail profile with provider_name: "secrets" and attaching it to a rule. There's nothing else to configure unless you hit known false positives, in which case the ignored_secret_keywords allowlist suppresses any match containing one of those substrings. On the first detected secret, Bifrost returns GUARDRAIL_INTERVENED with a reason like secret detected : github-pat. We chose to intervene on the first match rather than enumerate every secret in the prompt, because in practice the action is the same either way.

Custom Regex is the more general escape hatch. It evaluates request and response text against patterns you define, using Go's RE2-compatible engine: no lookaheads and no backreferences, but everything else you'd want for organization-specific identifiers, internal project codenames, or environment-specific tokens. The UI ships a PII Detection template that pre-fills five common patterns: email addresses, US phone numbers, US Social Security Numbers, credit-card-like sequences of 13 to 19 digits, and IPv4 addresses. The template is honest about what it is: pattern-based, not semantic. Expect some false positives on card-shaped numbers that aren't credit cards, and expect to miss anything in a non-US format. For national IDs in other locales, or for formats you need to enforce strictly, write your own pattern alongside the template.

The line between the two is straightforward. Secrets Detection is the broad credential coverage you'd get from running Gitleaks on a codebase, repurposed for prompts and responses. Custom Regex is for everything else you can describe with a deterministic pattern.

External providers: enforcement for policy you already own

Six guardrail providers in Bifrost are external services, sitting behind an API call. The pattern across all of them is the same: the provider owns the detection policy, Bifrost owns the inline enforcement path. If your security team has already standardized on one of these vendors, the integration means you don't have to rebuild that policy at the gateway layer.

AWS Bedrock Guardrails covers PII detection, content filtering, prompt attack prevention, toxicity screening, and image content evaluation. It's the only first-class integration in Bifrost with image support today. You point Bifrost at a Bedrock guardrail ARN and version, and we call it on every matched request.

Azure Content Safety handles text moderation with severity-based filtering, plus Prompt Shields for jailbreak and indirect prompt injection (IPI) detection. It's a strong default if your stack already lives in Azure.

GraySwan Cygnal takes a different shape from the others. Instead of pre-defined detection categories, you write rules in plain English: "Do not allow profanity," "Ensure responses maintain a professional tone," "Block any mention of customer order IDs." GraySwan scores each request on a 0-1 violation scale, and you set the threshold. It also supports IPI detection and content mutation detection. For policies that are hard to express as a category but easy to write as a sentence, this is the path.
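GraySwan's threshold is the piece you tune. As a hedged sketch, here is how a gateway-side policy might validate a threshold and apply it to a returned violation score. The cygnalPolicy type is hypothetical and is not GraySwan's request schema, and whether a score exactly equal to the threshold counts as a violation is an assumption (this sketch treats it as one).

```go
package main

import (
	"errors"
	"fmt"
)

// cygnalPolicy is a hypothetical gateway-side view of a GraySwan profile:
// plain-English rules plus the violation threshold you choose.
type cygnalPolicy struct {
	Rules     []string
	Threshold float64 // on the 0-1 violation scale
}

func (p cygnalPolicy) validate() error {
	if len(p.Rules) == 0 {
		return errors.New("at least one rule is required")
	}
	if p.Threshold < 0 || p.Threshold > 1 {
		return errors.New("threshold must be within [0, 1]")
	}
	return nil
}

// intervene applies the threshold to a score returned by the provider.
// Assumption: a score equal to the threshold counts as a violation.
func (p cygnalPolicy) intervene(score float64) bool {
	return score >= p.Threshold
}

func main() {
	p := cygnalPolicy{
		Rules: []string{
			"Do not allow profanity",
			"Ensure responses maintain a professional tone",
			"Block any mention of customer order IDs",
		},
		Threshold: 0.7,
	}
	if err := p.validate(); err != nil {
		fmt.Println("bad policy:", err)
		return
	}
	for _, score := range []float64{0.42, 0.91} {
		fmt.Printf("score %.2f -> intervene=%v\n", score, p.intervene(score))
	}
}
```

A lower threshold catches more borderline content at the cost of more false interventions; it's worth tuning against logged traffic before enforcing.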

Patronus AI is new in this release. It's evaluator-based: you configure one or more Patronus evaluators (PII, toxicity, prompt injection, custom judges) and Bifrost calls them via the v1/evaluate endpoint. Patronus is the only provider in the current lineup that does hallucination detection, and it's also the broadest on response-quality checks: presets for conciseness, helpfulness, politeness, no apologies, no OpenAI reference, age/gender/racial bias checks, plus structural validators for JSON, code, and CSV outputs. You can capture results back into the Patronus dashboard if you want the audit trail to live there.

CrowdStrike AIDR is also new. It targets organizations that already manage AI security policy in CrowdStrike Falcon. You configure an AIDR collector in the Falcon console, point Bifrost at it with the collector token, and Bifrost sends an OpenAI-shaped guard_input payload to AIDR for evaluation. AIDR's verdict drives Bifrost's behavior: if blocked : true, Bifrost returns GUARDRAIL_INTERVENED; if transformed : true with new content, Bifrost applies the rewrite. The detection categories include prompt injection, jailbreak attempts, sensitive data and PII, and custom entities, with full telemetry flowing back into the CrowdStrike AIDR console.
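AIDR's verdict maps onto gateway behavior as a small decision table: block wins, then rewrite, otherwise pass through. Below is a Go sketch of that mapping. The aidrVerdict struct mirrors the blocked and transformed flags named above, but its Content field and overall shape are assumptions, not CrowdStrike's response schema. The streaming flag reflects the current limitation that rewrites apply only to non-streaming responses.

```go
package main

import "fmt"

// aidrVerdict is a hypothetical, simplified verdict. Blocked and Transformed
// correspond to the flags described in the text; Content is assumed to carry
// the rewritten text when Transformed is true.
type aidrVerdict struct {
	Blocked     bool
	Transformed bool
	Content     string
}

// outcome is what the gateway does with the traffic.
type outcome struct {
	Intervened bool   // surfaced to the caller as GUARDRAIL_INTERVENED
	Text       string // content to forward when not intervened
}

// enforce applies a verdict to the original text. Block takes precedence
// over rewrite; rewrites are skipped for streaming traffic.
func enforce(v aidrVerdict, original string, streaming bool) outcome {
	switch {
	case v.Blocked:
		return outcome{Intervened: true}
	case v.Transformed && v.Content != "" && !streaming:
		return outcome{Text: v.Content}
	default:
		return outcome{Text: original}
	}
}

func main() {
	redacted := aidrVerdict{Transformed: true, Content: "my SSN is <REDACTED>"}
	fmt.Printf("%+v\n", enforce(redacted, "my SSN is 123-45-6789", false))
	fmt.Printf("%+v\n", enforce(aidrVerdict{Blocked: true}, "ignore previous instructions", false))
}
```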

Google Model Armor is the third new integration. If your safety policy lives in Google Cloud, Model Armor is the path. You create a Model Armor template in your GCP project and point Bifrost at it. Bifrost calls sanitizeUserPrompt in the input phase and sanitizeModelResponse in the output phase. Model Armor covers prompt injection, Responsible AI (RAI) categories (hate, harassment, sexually explicit, dangerous content), malicious URI detection, and Sensitive Data Protection (SDP) inspection and de-identification. The SDP piece is the interesting one: Model Armor can return de-identified text, and Bifrost swaps it into the request or response transparently. Authentication uses either Application Default Credentials (ADC) or a service account key JSON, whichever fits your runtime.
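Phase routing for Model Armor is a one-line decision. This Go sketch picks the method from the phase and appends it to the template's resource name using the ":method" suffix that Google Cloud REST APIs use for custom methods. The project, location, and template IDs are placeholders, and the path shape reflects our reading of Google's Model Armor REST reference; the regional host is omitted, so check both against Google's documentation before relying on them.

```go
package main

import "fmt"

type phase int

const (
	inputPhase phase = iota
	outputPhase
)

// modelArmorMethod picks the Model Armor method for a guardrail phase:
// prompts go to sanitizeUserPrompt, responses to sanitizeModelResponse.
func modelArmorMethod(p phase) string {
	if p == inputPhase {
		return "sanitizeUserPrompt"
	}
	return "sanitizeModelResponse"
}

// templateMethod builds "<template resource name>:<method>". The IDs passed
// in are placeholders; the host and API version prefix are not included.
func templateMethod(project, location, template string, p phase) string {
	return fmt.Sprintf("projects/%s/locations/%s/templates/%s:%s",
		project, location, template, modelArmorMethod(p))
}

func main() {
	fmt.Println(templateMethod("my-project", "us-central1", "prod-policy", inputPhase))
	fmt.Println(templateMethod("my-project", "us-central1", "prod-policy", outputPhase))
}
```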

Streaming, where output inspection actually got harder

Guardrails on streaming traffic are tricky. The input side is the easier half: even when the response is streamed, the prompt itself is still a single payload you can inspect before forwarding to the model. The output side is where it gets awkward, because you've already started sending tokens to the client by the time you can evaluate any meaningful chunk. The dominant client pattern for chat traffic is streamed completions, so "output guardrails only work on non-streaming" effectively means "output guardrails don't run on most of your traffic."

In this release, we extended streaming inspection across CrowdStrike AIDR, GraySwan Cygnal, Google Model Armor, and Patronus AI on both phases. Inputs are evaluated before the streaming request goes to the model, and outputs are now evaluated as the stream comes back, so a streamed completion can be blocked when a violation is detected.

One caveat worth being explicit about: streaming output redaction and transformation is not supported yet. If a provider returns transformed text (Model Armor's SDP de-identification or AIDR's transformed : true rewrite), Bifrost only applies the rewrite on non-streaming responses today. Streaming inspection is currently block-or-pass; transform support on streaming is queued, not shipped.

Picking the combination

The fastest way through the matrix: start native, add external where you need semantics.

If you only care about credential leakage, Secrets Detection alone is enough and costs you nothing. If you have well-defined patterns to enforce, Custom Regex covers it. For prompt injection or content policy, the external providers earn their keep. Bedrock, Azure, GraySwan, and Patronus all do prompt injection competently; Model Armor adds RAI categories and SDP redaction. For hallucination detection specifically, Patronus is the only option in the current lineup. For organizations with existing AIDR or Model Armor policy, the integration is mostly about not duplicating work. Layering two profiles on a single rule is common: Secrets Detection plus a content guardrail, or Bedrock plus Patronus for defense in depth.

Conclusion

Most teams hit the same set of LLM safety problems, and most teams solve them the first time in the wrong layer. Bifrost's bet with guardrails is that an AI gateway is the natural place to enforce content policy, because it's the only layer that sees every request from every app to every model. Native detection covers the deterministic cases without an outbound call, and the external provider integrations cover the cases where someone else's policy is already authoritative. Prompt-based guardrails are next on the roadmap, for the cases where you'd rather describe the policy in plain language than write CEL or wire up an evaluator. The deeper point isn't which detector is best. It's that the right place to run them is the layer that's already in the path, configured once, applied uniformly. If your guardrail logic lives inside your app code, you're going to rebuild it the next time the model changes.

For full configuration, CEL examples, and the per-provider setup details, see the Bifrost Guardrails documentation.