
Configuring Azure AI Content Safety in Bifrost

Configure Azure AI Content Safety in Bifrost to enable Jailbreak Shield, Indirect Prompt Injection Shield, and severity-based content filtering at the gateway.

Prompt injection ranks as the highest-priority security risk for LLM applications in the OWASP Top 10 for 2025, covering both direct attacks in user input and indirect attacks hidden in retrieved content. Enforcing content safety inside each application leaves gaps every time a new service ships with a different filter version, and policy coverage drifts across teams. Bifrost, the open-source AI gateway built in Go by Maxim AI, moves content safety into the gateway, where every model call across every service inherits the same policy. This guide covers how to configure Azure AI Content Safety in Bifrost, including Jailbreak Shield, the Indirect Prompt Injection Shield, and severity filtering.

What Azure AI Content Safety Detects

Azure AI Content Safety is Microsoft's content moderation service that classifies text across four harm categories (hate, sexual, violence, and self-harm) and detects prompt injection attacks. In Bifrost, it runs as a guardrail profile that validates prompts and responses before they reach a model or return to a user.

As a guardrail profile, Azure Content Safety provides:

Severity-based content moderation across hate, sexual, violence, and self-harm, with a configurable threshold of low, medium, or high.
Jailbreak Shield, Microsoft's Prompt Shield for detecting direct prompt injection attempts in user input.
Indirect Prompt Injection Shield, which detects malicious instructions embedded in external content such as documents, tool outputs, and retrieved passages.
Custom blocklists for organization-specific terms, plus optional copyright detection on outputs.

Azure Content Safety focuses on content moderation and prompt-attack defense. It does not perform PII detection or output redaction, so teams that need those controls pair it with AWS Bedrock Guardrails or a regex profile on the same request. Detailed capability breakdowns are available on the Bifrost guardrails resource page.

How Bifrost Structures Guardrails: Rules and Profiles

Bifrost builds guardrails on two primitives: profiles and rules. A profile defines how content is evaluated, and a rule defines when and what gets evaluated. This separation lets a platform team configure credentials once and reference them from any number of downstream policies.

Profiles are provider configurations. An Azure Content Safety profile holds the endpoint, authentication, and detection settings. Profiles are reusable across rules.
Rules are CEL (Common Expression Language) expressions that determine when a check fires. A rule sets apply_to to input, output, or both, and links one or more profiles. When a rule links several profiles, they run in parallel.

This composability is the basis of defense in depth at the gateway. The full schema for guardrail configuration in config.json documents every provider and rule field. Guardrails are an enterprise capability and require the enterprise Bifrost image.

Configuring an Azure AI Content Safety Profile

To use Azure Content Safety in Bifrost, create a guardrail provider profile with provider_name set to azure. The profile points at your Azure Content Safety resource endpoint and enables the specific shields and thresholds you want enforced.

{
  "guardrails_config": {
    "guardrail_providers": [
      {
        "id": 3,
        "provider_name": "azure",
        "policy_name": "azure-content-safety",
        "enabled": true,
        "timeout": 10,
        "config": {
          "endpoint": "env.AZURE_CONTENT_SAFETY_ENDPOINT",
          "auth_type": "api_key",
          "api_key": "env.AZURE_CONTENT_SAFETY_KEY",
          "analyze_enabled": true,
          "analyze_severity_threshold": "medium",
          "jailbreak_shield_enabled": true,
          "indirect_attack_shield_enabled": true,
          "copyright_enabled": false,
          "text_blocklist_enabled": false,
          "sampling_rate": 100
        }
      }
    ]
  }
}

The profile supports three authentication modes: api_key (the default); entra_id, which authenticates a service principal using client_id, client_secret, and tenant_id; and default_credential, which uses managed identity or the Azure CLI and keeps credentials out of the config entirely. Managed identity is the cleaner option for Azure-hosted or in-VPC deployments, since no secret has to be stored or rotated. Credential and endpoint fields accept env.VAR_NAME references that Bifrost resolves from the process environment at startup.
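As a rough sketch of the two non-default authentication modes, the profiles below reuse the structure of the api_key example. Several things here are assumptions rather than confirmed schema: that client_id, client_secret, and tenant_id sit directly inside config next to auth_type, that all three accept env.VAR_NAME references, the profile ids (4 and 5), the policy names, and the environment variable names. The remaining detection fields are left out for brevity and would be set as in the api_key profile.

```json
{
  "guardrails_config": {
    "guardrail_providers": [
      {
        "id": 4,
        "provider_name": "azure",
        "policy_name": "azure-content-safety-entra",
        "enabled": true,
        "timeout": 10,
        "config": {
          "endpoint": "env.AZURE_CONTENT_SAFETY_ENDPOINT",
          "auth_type": "entra_id",
          "client_id": "env.AZURE_CLIENT_ID",
          "client_secret": "env.AZURE_CLIENT_SECRET",
          "tenant_id": "env.AZURE_TENANT_ID",
          "analyze_enabled": true,
          "analyze_severity_threshold": "medium"
        }
      },
      {
        "id": 5,
        "provider_name": "azure",
        "policy_name": "azure-content-safety-managed-identity",
        "enabled": true,
        "timeout": 10,
        "config": {
          "endpoint": "env.AZURE_CONTENT_SAFETY_ENDPOINT",
          "auth_type": "default_credential",
          "analyze_enabled": true,
          "analyze_severity_threshold": "medium"
        }
      }
    ]
  }
}
```

The second profile carries no secret at all, which is the practical difference that makes default_credential attractive for Azure-hosted deployments. Check the config.json schema for the exact field names before relying on this layout.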

Three settings control detection behavior. analyze_enabled turns on severity-based text analysis, jailbreak_shield_enabled turns on direct prompt injection detection, and indirect_attack_shield_enabled turns on detection of hidden instructions in external content. By default, analyze_enabled is on and both shields are off, so each shield must be enabled explicitly.

Jailbreak Shield: Blocking Direct Prompt Injection

Jailbreak Shield detects direct prompt injection, where a user attempts to override the system instructions with input such as "ignore your previous instructions." Setting jailbreak_shield_enabled to true enables Microsoft's Prompt Shield on the prompt, and a rule scoped to input runs the check before the request is forwarded to a provider.

A rule that applies the Azure profile to incoming prompts looks like this:

{
  "guardrails_config": {
    "guardrail_rules": [
      {
        "id": 101,
        "name": "jailbreak-shield-input",
        "description": "Block direct prompt injection in user prompts",
        "enabled": true,
        "cel_expression": "true",
        "apply_to": "input",
        "sampling_rate": 100,
        "timeout": 10,
        "provider_config_ids": [3]
      }
    ]
  }
}

Direct prompt injection is the form of the OWASP LLM01 risk that arrives in the user's own input, and Microsoft documents the detection model behind it in its overview of Azure Prompt Shields. Enforcing the check at the gateway means every application pointed at Bifrost inherits jailbreak detection without changing application code.

Indirect Prompt Injection Shield: Catching Hidden Instructions

The Indirect Prompt Injection Shield detects malicious instructions embedded in content the model processes but the user did not type directly, such as a retrieved document, a web page, or a tool result. Setting indirect_attack_shield_enabled to true enables this detection on input validation, which is the relevant defense for RAG pipelines and agentic workflows.

Indirect injection is harder to catch than direct injection because the payload arrives through trusted-looking data rather than the user message. An attacker plants instructions in a document that a retrieval step later pulls into context, and the model may follow them as if they were legitimate instructions. This maps to OWASP LLM08 (Vector and Embedding Weaknesses) as well as the indirect-injection half of LLM01.

For agent stacks that call external tools, this matters most at the point where tool output and retrieved passages re-enter the model as input. Used as an MCP gateway, Bifrost centralizes tool connections and can apply the same input content-safety rules to that context. Because the shield validates inputs, it inspects retrieved and tool-derived content as it flows into the request, giving agents a consistent boundary against hidden instructions regardless of which service makes the call.

Severity Filtering for Content Moderation

Severity filtering controls how aggressively Azure Content Safety blocks moderated content. The analyze_severity_threshold field accepts three values, and a request is flagged when the detected severity in any of the four harm categories meets or exceeds the threshold.

low (blocks severity 2 and above): the most aggressive setting, flagging content from low severity upward.
medium (blocks severity 4 and above): the default, balancing precision and coverage for most production traffic.
high (blocks severity 6 only): the most permissive setting, flagging only the most severe content.

A single threshold rarely fits every workload. Customer-facing chat usually warrants a lower threshold than an internal engineering assistant. Because profiles are reusable and rules carry CEL expressions, Bifrost supports separate profiles at different thresholds, each linked to a rule that matches by model, team, or another request attribute. A rule with cel_expression set to model == 'gpt-4o' applies one profile to that model, while a broader rule applies a different threshold elsewhere.
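The per-workload pattern described above might look like the sketch below: two Azure profiles that differ only in threshold, each linked to its own rule. The ids, names, and the idea that gpt-4o serves the customer-facing workload are hypothetical. The example also assumes that guardrail_providers and guardrail_rules can sit together under one guardrails_config object, and that the CEL model variable supports the != operator, which is used here so that the two rules do not both match the same request.

```json
{
  "guardrails_config": {
    "guardrail_providers": [
      {
        "id": 10,
        "provider_name": "azure",
        "policy_name": "azure-strict",
        "enabled": true,
        "timeout": 10,
        "config": {
          "endpoint": "env.AZURE_CONTENT_SAFETY_ENDPOINT",
          "auth_type": "api_key",
          "api_key": "env.AZURE_CONTENT_SAFETY_KEY",
          "analyze_enabled": true,
          "analyze_severity_threshold": "low"
        }
      },
      {
        "id": 11,
        "provider_name": "azure",
        "policy_name": "azure-standard",
        "enabled": true,
        "timeout": 10,
        "config": {
          "endpoint": "env.AZURE_CONTENT_SAFETY_ENDPOINT",
          "auth_type": "api_key",
          "api_key": "env.AZURE_CONTENT_SAFETY_KEY",
          "analyze_enabled": true,
          "analyze_severity_threshold": "medium"
        }
      }
    ],
    "guardrail_rules": [
      {
        "id": 201,
        "name": "strict-moderation-gpt-4o",
        "description": "Hypothetical: low threshold for the customer-facing model",
        "enabled": true,
        "cel_expression": "model == 'gpt-4o'",
        "apply_to": "input",
        "sampling_rate": 100,
        "timeout": 10,
        "provider_config_ids": [10]
      },
      {
        "id": 202,
        "name": "standard-moderation-other-models",
        "description": "Hypothetical: medium threshold for all other models",
        "enabled": true,
        "cel_expression": "model != 'gpt-4o'",
        "apply_to": "input",
        "sampling_rate": 100,
        "timeout": 10,
        "provider_config_ids": [11]
      }
    ]
  }
}
```

Keeping the threshold in the profile and the targeting in the rule means a threshold change touches one profile and leaves every rule that references it unchanged.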

Layering Azure Content Safety for Defense in Depth

No single guardrail provider catches every failure mode, which is why a Bifrost rule can link multiple profiles that run in parallel on the same request. A common pattern pairs Azure Content Safety with GraySwan for content and jailbreak coverage, and AWS Bedrock with Patronus AI for PII and hallucination defense on regulated workflows.

Because Azure Content Safety does not detect PII, teams that need both moderation and PII redaction layer it with a Bedrock profile or a regex profile. Bifrost applies the same guardrail policy regardless of the model provider behind a request, so the controls hold whether traffic routes to OpenAI, Anthropic, or Azure OpenAI.
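A layered rule could be sketched as follows, assuming the Azure profile with id 3 from earlier and a second profile with the hypothetical id 6 that stands in for a Bedrock or regex PII profile defined elsewhere in guardrail_providers. That second profile's own config is not shown because its fields are provider-specific. The rule id and name are illustrative.

```json
{
  "guardrails_config": {
    "guardrail_rules": [
      {
        "id": 301,
        "name": "moderation-plus-pii-input",
        "description": "Hypothetical: Azure moderation and a PII profile on the same prompt",
        "enabled": true,
        "cel_expression": "true",
        "apply_to": "input",
        "sampling_rate": 100,
        "timeout": 10,
        "provider_config_ids": [3, 6]
      }
    ]
  }
}
```

Because linked profiles run in parallel, adding the second id extends coverage on the same request without a second rule.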

For regulated industries, every guardrail decision is logged with the violation type, severity, action, and processing latency, producing the audit trail that compliance reviews expect.

Healthcare and other regulated teams can review the healthcare and life sciences deployment patterns and the governance resource page for centralized policy and access control. Layering guardrail providers at the gateway is the pattern Bifrost is designed to support.

Common Questions About Azure AI Content Safety in Bifrost

Does Azure AI Content Safety detect PII in Bifrost?

No. Azure Content Safety in Bifrost covers content moderation, jailbreak detection, and indirect prompt injection detection, but not PII detection or output redaction. Pair it with an AWS Bedrock profile, a Patronus AI profile, or a regex profile for PII coverage on the same request.

Can I run Azure AI Content Safety on both inputs and outputs?

Yes, with one distinction. Severity-based content analysis runs on both inputs and outputs, so set the rule's apply_to field to both to moderate prompts and completions. Jailbreak Shield and the Indirect Prompt Injection Shield apply to input validation only, and copyright detection applies to outputs only. Bifrost runs input rules before forwarding the request to the provider and output rules after the provider responds.
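A minimal sketch of such a rule, reusing the Azure profile with id 3 from the configuration section, sets apply_to to both. The rule id, name, and description are hypothetical. Under the behavior described above, severity analysis would cover prompts and completions, while the two shields in the same profile would still act on input only.

```json
{
  "guardrails_config": {
    "guardrail_rules": [
      {
        "id": 401,
        "name": "azure-moderation-input-and-output",
        "description": "Hypothetical: severity analysis on prompts and completions",
        "enabled": true,
        "cel_expression": "true",
        "apply_to": "both",
        "sampling_rate": 100,
        "timeout": 10,
        "provider_config_ids": [3]
      }
    ]
  }
}
```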

What severity levels does Bifrost support for Azure content filtering?

Bifrost supports three severity thresholds through the analyze_severity_threshold field: low, medium, and high. Medium is the default. Lower thresholds block more content; higher thresholds block only the most severe.

Is Azure AI Content Safety available in the open-source gateway?

No. Guardrails, including Azure Content Safety, are an enterprise feature and require the enterprise Bifrost image. The core gateway, routing, and provider access remain open source.

Getting Started with Bifrost

Configuring Azure AI Content Safety in Bifrost gives every application behind the gateway consistent Jailbreak Shield, Indirect Prompt Injection Shield, and severity filtering without per-service integration work. Define the profile once, link it to rules scoped by input, output, model, or team, and layer it with other providers for defense in depth. To see how Bifrost can centralize content safety and governance across your AI infrastructure, book a demo with the Bifrost team.