How to Stop Your AI From Leaking PII or Unsafe Output
Customer-facing AI applications fail in two costly ways: they return personally identifiable information (PII) that should never leave the system, and they produce unsafe content that breaches policy or damages trust. In the 2025 OWASP Top 10 for LLM Applications, sensitive information disclosure rose from sixth to second place, reflecting how often production systems expose PII through model outputs. The reliable way to stop your AI from leaking PII or unsafe output is to validate every prompt and response at a single control point instead of patching each application separately. Bifrost, the open-source AI gateway built in Go by Maxim AI, enforces PII detection, content safety, and policy controls inline for every model call. This post covers how those guardrails work and where to place them for customer-facing systems.
Why AI Applications Leak PII and Produce Unsafe Output
PII leaks and unsafe responses come from two distinct points in the request lifecycle:
- Input side: A user, an upstream system, or a retrieved document places sensitive data into the prompt. That data is then sent to an external provider, logged, and sometimes echoed back. Prompt injection can also coax a model into revealing system instructions or other users' data.
- Output side: The model returns content that violates policy. This includes regurgitated training data containing real names, emails, or phone numbers, as well as content in harmful categories such as hate speech, sexual content, self-harm, and profanity that reaches a customer.
Language models reproduce sensitive data through both verbatim memorization, where they emit strings directly from training data, and semantic memorization, where they restate the same private information in different words. Because output is sampled rather than deterministic, and models do not follow instructions with certainty, no system prompt instruction fully prevents this. The controls have to inspect the actual text moving in and out of the model.
Most teams first try to solve this inside each service with a filtering library. Within a few months, predictable problems appear: enforcement fragments as new microservices ship different filter versions, guardrail credentials sprawl across services, and audit evidence becomes inconsistent because compliance reviews require pulling traces from every service individually. A control that lives in application code is only as strong as the least-updated service that uses it.
Application-Level Filters vs Gateway-Level Guardrails
There are two places to enforce safety: inside each application, or at the gateway every request already passes through.
Application-level filtering keeps logic close to the code but multiplies the work. Each team integrates, versions, and tests its own checks, and a single missed integration becomes a gap. Gateway-level enforcement inverts this. The gateway sits between your applications and every LLM provider, so a policy configured once applies to all traffic regardless of which team or service originated the request.
Bifrost runs as a drop-in replacement for the OpenAI, Anthropic, and other major provider APIs, so applications built on those SDKs inherit guardrails by changing only the base URL. Validation happens inline as part of the request and response pipeline, with no separate network hop to a standalone filtering service. This is the practical mechanism that lets a small platform team enforce one safety standard across dozens of applications.
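The control-point idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Bifrost's code: `gateway_call`, `check`, and the two toy applications are hypothetical names, the provider is a stub function, and a single SSN regex stands in for a full policy. The point is structural: both applications route through one function, so the input and output checks exist in exactly one place.

```python
import re
from typing import Callable

# Hypothetical one-pattern policy standing in for a full guardrail configuration.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


class GuardrailViolation(Exception):
    """Raised when a prompt or a response fails the shared policy."""


def check(text: str, direction: str) -> None:
    """Reject text that contains an SSN-shaped string."""
    if SSN.search(text):
        raise GuardrailViolation(f"{direction}: SSN pattern detected")


def gateway_call(prompt: str, provider: Callable[[str], str]) -> str:
    """Single control point: validate input, call the provider, validate output."""
    check(prompt, "input")       # runs before the provider ever sees the prompt
    response = provider(prompt)  # stub for the real upstream model call
    check(response, "output")    # runs before the response reaches the user
    return response


# Two hypothetical applications; neither carries its own filtering code.
def support_bot(prompt: str) -> str:
    return gateway_call(prompt, lambda p: "Happy to help.")


def billing_bot(prompt: str) -> str:
    return gateway_call(prompt, lambda p: "The SSN on file is 123-45-6789.")
```

Here `support_bot("Hi")` returns the stub answer, while `billing_bot` raises `GuardrailViolation` because its stub provider leaks an SSN-shaped string on the output side. Changing `check` changes enforcement for both applications at once, which is the property that per-service filters lose.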
How Bifrost Stops PII Leaks and Unsafe Output at the Gateway
Bifrost validates inputs and outputs in real time against your policies, and blocks, redacts, or modifies content before a request reaches a provider or a response returns to a user. The guardrails system is built on two reusable concepts:
- Rules define when and what to check. They use Common Expression Language (CEL) and apply to inputs, outputs, or both.
- Profiles define how content is checked and which provider runs the check. A single rule can link to multiple profiles, which is the basis for layering checks on one request.
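To make the split between rules and profiles concrete, here is a minimal Python model of it. It is an assumption-laden sketch rather than Bifrost's internals: a plain Python callable stands in for the CEL expression, each profile is a local check function instead of a provider call, and the `Rule`, `Profile`, and `evaluate` names are hypothetical. The rule's condition mirrors the CEL expression used in the configuration example in this post, `request.messages.exists(m, m.role == "user")`.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Profile:
    """How content is checked. A local function stands in for a provider call."""
    name: str
    passes: Callable[[str], bool]  # True when the text is acceptable


@dataclass
class Rule:
    """When and what to check. A Python callable stands in for a CEL expression."""
    name: str
    condition: Callable[[dict], bool]
    apply_to: str  # "input", "output", or "both"
    profiles: list = field(default_factory=list)


def evaluate(rules: list, request: dict, text: str, direction: str) -> list:
    """Return the names of the profiles that rejected `text` in this direction."""
    failures = []
    for rule in rules:
        if rule.apply_to not in (direction, "both"):
            continue  # rule targets the other side of the call
        if not rule.condition(request):
            continue  # CEL-style condition did not match this request
        for profile in rule.profiles:  # several profiles can hang off one rule
            if not profile.passes(text):
                failures.append(profile.name)
    return failures


no_email = Profile("no-email", lambda text: "@" not in text)
no_api_key = Profile("no-api-key", lambda text: "sk-" not in text)

# Mirrors request.messages.exists(m, m.role == "user") from the config example.
user_prompts = Rule(
    name="Block PII in Prompts",
    condition=lambda req: any(m["role"] == "user" for m in req["messages"]),
    apply_to="input",
    profiles=[no_email, no_api_key],
)
```

Because profiles are plain values referenced from rules, the same `no_email` profile could be attached to a second rule without redefining it, which is the reuse the two-concept design is meant to enable.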
Two checks run natively inside the gateway and require no external service. Custom Regex includes a built-in PII Detection template for deterministic patterns such as email addresses and Social Security numbers, and Secrets Detection catches leaked API keys, tokens, and credentials before they reach a model.
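Deterministic detection of this kind reduces to pattern matching over the text. The sketch below uses Python's standard `re` module; the patterns are illustrative assumptions written for this example, not the contents of Bifrost's PII Detection template or its Secrets Detection rules, and production detectors typically add validation to cut false positives.

```python
import re

# Illustrative patterns only; these are NOT Bifrost's built-in templates.
PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # AWS access key IDs start with "AKIA" followed by 16 uppercase letters/digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "bearer_token": re.compile(r"\bBearer\s+[A-Za-z0-9._~+/-]+=*"),
}


def detect(text: str) -> list[tuple[str, str]]:
    """Return (pattern_name, matched_text) for every match, in pattern order."""
    findings = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((name, match.group()))
    return findings
```

Regex checks are cheap and predictable, which suits inline enforcement, but they only catch formats that can be written down. Names, street addresses, and free-text medical details are where provider-backed profiles earn their place.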
For deeper coverage, the gateway guardrails layer connects to external providers behind one configuration interface, including AWS Bedrock Guardrails, Azure Content Safety, Google Model Armor, CrowdStrike AIDR, GraySwan Cygnal, and Patronus AI. The AWS Bedrock profile detects and redacts more than 50 types of sensitive information, including SSNs, credit cards, addresses, medical records, and device identifiers. Content moderation, prompt injection defense, and toxicity screening are available across these providers, so a customer-facing endpoint can enforce PII redaction and content filtering on the same call.
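Redaction differs from blocking in that the response still goes out, with the sensitive spans replaced. A minimal Python sketch of that behavior, assuming three hypothetical entity types and bracketed placeholder labels (managed services such as Bedrock Guardrails use their own entity names and masking format):

```python
import re

# Hypothetical entity patterns; a managed provider detects far more types.
ENTITIES = {
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\(?\b\d{3}\)?[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def redact(text: str) -> tuple[str, list[str]]:
    """Replace each detected entity with a typed placeholder and report what was found."""
    found = []
    for label, pattern in ENTITIES.items():
        # subn returns the rewritten text and the number of replacements made.
        text, count = pattern.subn(f"[{label}]", text)
        found.extend([label] * count)
    return text, found
```

For example, `redact("Email jane@example.com or call 555-123-4567.")` yields the text `"Email [EMAIL] or call [PHONE]."` together with the list `["EMAIL", "PHONE"]`, so the caller can both deliver the answer and record what was removed.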
Building a PII and Safety Guardrail Step by Step
A practical configuration for customer-facing AI uses two rules: one that validates user prompts and one that validates model responses. The first prevents sensitive data and injection attempts from reaching the provider; the second scans every response before it returns to the customer.
```json
{
  "guardrails_config": {
    "guardrail_providers": [
      {
        "id": 1,
        "provider_name": "regex",
        "policy_name": "PII Detection",
        "enabled": true,
        "config": {
          "patterns": [
            { "pattern": "\\b\\d{3}-\\d{2}-\\d{4}\\b", "description": "US Social Security Number" }
          ]
        }
      },
      {
        "id": 2,
        "provider_name": "bedrock",
        "policy_name": "PII and Content Filtering",
        "enabled": true,
        "config": { "region": "us-east-1" }
      }
    ],
    "guardrail_rules": [
      {
        "id": 1,
        "name": "Block PII in Prompts",
        "enabled": true,
        "cel_expression": "request.messages.exists(m, m.role == \"user\")",
        "apply_to": "input",
        "provider_config_ids": [1, 2]
      },
      {
        "id": 2,
        "name": "Filter Responses to Customers",
        "enabled": true,
        "cel_expression": "true",
        "apply_to": "output",
        "provider_config_ids": [2]
      }
    ]
  }
}
```
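Because rules reference profiles by numeric id, it is worth confirming that every id resolves to an enabled profile before deploying a change. The helper below is a hypothetical pre-deployment check, not part of Bifrost: it reads a trimmed copy of the configuration above (regex patterns, CEL expressions, and Bedrock settings omitted), treats `"both"` as a valid `apply_to` value by assumption, and returns, for one direction, which policy names each enabled rule would invoke.

```python
import json

# Trimmed copy of the example configuration; only fields the check needs.
CONFIG = json.loads("""
{
  "guardrails_config": {
    "guardrail_providers": [
      {"id": 1, "provider_name": "regex", "policy_name": "PII Detection", "enabled": true},
      {"id": 2, "provider_name": "bedrock", "policy_name": "PII and Content Filtering", "enabled": true}
    ],
    "guardrail_rules": [
      {"id": 1, "name": "Block PII in Prompts", "enabled": true,
       "apply_to": "input", "provider_config_ids": [1, 2]},
      {"id": 2, "name": "Filter Responses to Customers", "enabled": true,
       "apply_to": "output", "provider_config_ids": [2]}
    ]
  }
}
""")


def resolve(config: dict, direction: str) -> dict[str, list[str]]:
    """Map each enabled rule for `direction` to the policy names of its enabled profiles."""
    cfg = config["guardrails_config"]
    providers = {p["id"]: p for p in cfg["guardrail_providers"]}
    plan = {}
    for rule in cfg["guardrail_rules"]:
        if not rule["enabled"] or rule["apply_to"] not in (direction, "both"):
            continue
        names = []
        for pid in rule["provider_config_ids"]:
            if pid not in providers:
                # A dangling id is a configuration error worth failing loudly on.
                raise ValueError(f"rule {rule['name']!r} references unknown profile id {pid}")
            if providers[pid]["enabled"]:
                names.append(providers[pid]["policy_name"])
        plan[rule["name"]] = names
    return plan
```

Running `resolve(CONFIG, "input")` shows that user prompts pass through both the regex and Bedrock profiles, while `resolve(CONFIG, "output")` shows that responses pass through Bedrock only, matching the intent described for the two rules.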
When a check fails, the response makes the decision explicit. A blocked request returns a 446 status with the violation type, category, and severity, so the calling application can handle it without guessing. A response that is let through with a logged violation or a redaction returns a 246 status; when redaction applies, unsafe spans are removed and the rest of the answer still reaches the customer. This separation lets teams hard-block high-severity categories such as exposed SSNs while redacting lower-severity issues such as profanity.
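On the application side, the two statuses map naturally onto two code paths. The Python sketch below assumes the 446 and 246 codes described above; the `handle_gateway_response` name, the body key `content`, and the fallback message are hypothetical placeholders, so check the gateway's actual response schema before relying on any field name.

```python
from typing import Optional

# Hypothetical customer-facing message shown when a response is hard-blocked.
FALLBACK = "Sorry, I can't help with that request."


def handle_gateway_response(status: int, body: dict) -> tuple[Optional[str], str]:
    """Turn a gateway status into (text_to_show, outcome)."""
    if status == 446:
        # Hard block: never show model output; surface a safe fallback instead.
        return FALLBACK, "blocked"
    if status == 246:
        # Checked before the generic 2xx branch: the text may have been redacted.
        return body.get("content"), "modified"
    if 200 <= status < 300:
        return body.get("content"), "ok"
    raise RuntimeError(f"unexpected gateway status {status}")
```

Keeping the `modified` outcome distinct from `ok` lets the application count redactions separately, which is useful when tuning which categories should block and which should only redact.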
To layer protection on a high-risk endpoint, link several guardrail profiles to a single rule. A common pattern runs Bedrock and Patronus AI together for PII, and Azure plus GraySwan for content and jailbreak protection. Because profiles are reusable, you configure each provider's credentials once and reference them from any rule that needs them, which keeps enforcement consistent as new applications come online.
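When several profiles evaluate the same request, their verdicts have to be merged into one outcome. One common convention, assumed here rather than taken from Bifrost's documentation, is that the strictest verdict wins: any block beats any redaction, which beats a logged warning. A Python sketch with hypothetical profile names:

```python
from enum import IntEnum


class Action(IntEnum):
    """Ordered so that a larger value is a stricter outcome (an assumed ordering)."""
    PASS = 0
    LOG = 1
    REDACT = 2
    BLOCK = 3


def combine(decisions: dict[str, Action]) -> tuple[Action, list[str]]:
    """Return the strictest action across profiles and which profiles produced it."""
    if not decisions:
        return Action.PASS, []
    strictest = max(decisions.values())
    return strictest, sorted(name for name, a in decisions.items() if a == strictest)
```

With `{"bedrock": Action.REDACT, "patronus": Action.BLOCK}`, the merged result is `Action.BLOCK` attributed to `patronus`, so a single stricter profile is enough to stop the request.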
Compliance and Defense-in-Depth for Customer-Facing AI
Guardrails are only credible to auditors if they cannot be bypassed by routing around the control or losing the evidence. Frameworks such as the NIST AI Risk Management Framework expect organizations to govern, map, measure, and manage AI risk continuously, not to rely on a one-time configuration. Centralizing enforcement at the gateway supports this because every model call inherits the same policy and produces the same record.
The Bifrost gateway writes every guardrail evaluation, blocked request, and redaction to immutable audit logs suitable for SOC 2, GDPR, HIPAA, and ISO 27001 evidence. For teams that cannot let request bodies leave their network, in-VPC deployment runs the gateway, guardrail profiles, and logs entirely inside a private cloud or Kubernetes cluster, so detection events stay within the customer perimeter. These controls are part of the Bifrost Enterprise capability set built for regulated and high-scale environments.
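Immutability is usually enforced by storage controls, but the underlying idea of tamper-evidence can be shown with a hash chain, where each entry's hash covers the previous entry's hash. The sketch below is a generic illustration of that technique using Python's `hashlib`, not a description of how Bifrost stores its audit logs; the `append_event` and `verify` names and the event fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def append_event(log: list, event: dict) -> None:
    """Append an event whose hash covers the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    payload = json.dumps(event, sort_keys=True)  # stable serialization
    digest = hashlib.sha256((prev + payload).encode()).hexdigest()
    log.append({"event": event, "prev": prev, "hash": digest})


def verify(log: list) -> bool:
    """Recompute every hash; editing any entry, or deleting a non-final one, breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev:
            return False
        payload = json.dumps(entry["event"], sort_keys=True)
        if hashlib.sha256((prev + payload).encode()).hexdigest() != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

Editing any recorded decision after the fact, such as changing a `block` to `pass`, makes `verify` return `False`. Deleting the most recent entry is not detected by the chain alone, which is one reason production systems pair it with write-once storage.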
The combination matters most in verticals where a single leaked record carries regulatory weight. Healthcare teams handling protected health information can review Bifrost's approach to healthcare AI infrastructure for compliance-specific deployment patterns. Pairing PII redaction, content moderation, and a unified governance layer gives a customer-facing system defense-in-depth rather than a single point of failure.
Getting Started With Bifrost
To stop your AI from leaking PII or unsafe output, route traffic through a gateway that validates every prompt and response, then configure input and output rules backed by the providers that fit your threat model. Bifrost provides this enforcement inline, with native PII and secrets detection, multi-provider guardrails, and immutable audit trails, all behind a single OpenAI-compatible API. Explore the full set of safety and governance capabilities in the Bifrost resources hub, or book a demo with the Bifrost team to map guardrails to your specific compliance requirements.