Google Model Armor Guardrails with Bifrost

Bifrost lets enterprise teams enforce Google Model Armor guardrails inline, blocking prompt injection and redacting sensitive data on every LLM request.

Prompt injection is the top security risk for LLM applications in the OWASP Top 10 for LLM Applications, and sensitive information disclosure now ranks second. Bifrost, the open-source AI gateway built in Go by Maxim AI, lets enterprise teams enforce Google Model Armor guardrails inline, blocking policy violations before a prompt reaches a model and redacting sensitive data before a response returns to a user. This guide covers how Google Model Armor guardrails work in Bifrost Enterprise, how to configure them, and how Bifrost handles each policy outcome.

What Is Google Model Armor?

Google Model Armor is a fully managed Google Cloud service that screens LLM prompts and responses for security and safety risks. It detects prompt injection and jailbreak attempts, filters harmful content, flags malicious URLs, and applies Sensitive Data Protection to detect or de-identify sensitive information across any model on any cloud.

Model Armor policies are defined in templates that you manage in Google Cloud. Each template combines filters and confidence thresholds for the categories you care about:

  • Prompt injection and jailbreak detection: catches attempts to override system instructions or bypass safety controls.
  • Responsible AI safety filters: screens for hate speech, harassment, sexually explicit content, and dangerous content, with child sexual abuse material applied by default.
  • Malicious URL detection: flags phishing and malware links embedded in prompts or responses.
  • Sensitive Data Protection (SDP): inspects for PII and credentials, and can de-identify text by redacting or replacing matched values.

Because Model Armor is model-agnostic, the same template protects traffic regardless of which provider serves the request. Bifrost extends that reach by routing 1000+ models through a single API while applying the same guardrail policy to all of them.

Why LLM Guardrails Matter for Production AI

LLM guardrails are runtime controls that validate inputs and outputs against safety and security policies before content moves through your application. They address the failure modes that make production AI risky: manipulated prompts, leaked data, and harmful generated content.

The OWASP Top 10 for LLM Applications ranks prompt injection as the number one risk for the second consecutive edition, and notes that sensitive information disclosure climbed from sixth to second place in the 2025 list as agentic systems gained access to more data. These risks carry direct business consequences. A successful prompt injection can trigger unauthorized actions, and a single leaked response can expose PII, credentials, or proprietary data that creates regulatory and financial liability.

Guardrails reduce this exposure by inspecting traffic at runtime rather than relying on model behavior alone. Enforcing them at the gateway, rather than inside each application, gives platform teams one consistent policy boundary across every model, team, and environment.

How Bifrost Enforces Google Model Armor Guardrails

Bifrost is the gateway enforcement path for Google Model Armor. Google owns the Model Armor template; the Bifrost AI gateway decides when to call Model Armor, sends the relevant text to the template, and then blocks or rewrites the request or response based on the result. This keeps policy configuration in Google Cloud while enforcement happens inline at the gateway.

Guardrails in Bifrost are built around two reusable concepts:

  • Profiles define how content is evaluated. A Model Armor profile holds the Google Cloud project, location, template, and authentication settings. Profiles can be shared across many rules.
  • Rules define when and what content is evaluated. Rules use Common Expression Language (CEL) expressions and apply to inputs, outputs, or both, with a configurable sampling rate for high-traffic endpoints.

When an input rule matches, Bifrost sends the prompt text to Model Armor's sanitizeUserPrompt operation. When an output rule matches, Bifrost sends the response text to sanitizeModelResponse. This dual-stage validation pairs naturally with the rest of the governance layer in Bifrost, where virtual keys already scope budgets, rate limits, and access per team or customer. A single rule can link multiple profiles, so Model Armor can run alongside Bifrost-native secrets detection and custom regex checks for defense in depth.
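
As a rough illustration of the two calls, the sketch below builds the request URL and JSON body for each stage without sending anything. The regional host pattern and the userPromptData and modelResponseData field names reflect the Model Armor REST interface as commonly documented; treat them as assumptions to verify against the current API reference. The helper name build_sanitize_request is hypothetical and is not Bifrost's internal code.

```python
import json

# Hypothetical mapping from guardrail stage to the Model Armor operation
# named in the text, plus the request body field assumed for that operation.
_OPERATIONS = {
    "input": ("sanitizeUserPrompt", "userPromptData"),
    "output": ("sanitizeModelResponse", "modelResponseData"),
}


def build_sanitize_request(stage, project_id, location, template_id, text):
    """Return (url, body) for the given stage; no network call is made."""
    if stage not in _OPERATIONS:
        raise ValueError(f"stage must be 'input' or 'output', got {stage!r}")
    operation, field = _OPERATIONS[stage]
    # Assumed regional endpoint pattern; confirm against Google's docs.
    url = (
        f"https://modelarmor.{location}.rep.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}/"
        f"templates/{template_id}:{operation}"
    )
    body = json.dumps({field: {"text": text}})
    return url, body
```

Calling build_sanitize_request("input", ...) targets sanitizeUserPrompt, and the same call with "output" targets sanitizeModelResponse, which mirrors the input-rule and output-rule split described above.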

Setting Up Google Model Armor in Bifrost

Configuring Google Model Armor guardrails in Bifrost takes three stages: prepare Google Cloud, choose an authentication mode, then create the profile and attach it to a rule. The integration requires Bifrost Enterprise with the guardrails plugin enabled.

Before you start, confirm these prerequisites:

  • The Model Armor API is enabled in your Google Cloud project.
  • A Model Armor template exists in the project and location you plan to use.
  • Network egress is allowed from Bifrost to the Model Armor regional endpoint over HTTPS.
  • A Google principal holds roles/modelarmor.user or a higher Model Armor role.

In the Google Cloud console, enable the Model Armor API, create a template under Security, and note the three values Bifrost needs: project ID, location (for example us or us-central1), and template ID. Then grant the identity Bifrost runs as the Model Armor User role.

Choosing an Authentication Mode

Bifrost supports two OAuth-based Google authentication modes for the Google Model Armor integration:

  • Google ADC (default_credential): Bifrost uses Application Default Credentials resolved from the runtime environment, such as an attached service account, Workload Identity on GKE, or GOOGLE_APPLICATION_CREDENTIALS. No key JSON is stored in the profile.
  • Service Account Key JSON (service_account_json): the profile carries one specific service account key, pasted directly or referenced from an environment variable.

Use ADC when you want the guardrail to inherit credentials from the deployment environment, which fits in-VPC and Workload Identity setups well. Use the key JSON mode when a profile should authenticate as one explicit service account.

Creating the Profile and Rule

Create a profile with provider_name: "model-armor", then attach it to a rule by listing the profile's id in the rule's provider_config_ids. The example below uses ADC and runs Model Armor on input prompts for a specific model:

{
  "guardrails_config": {
    "guardrail_providers": [
      {
        "id": 80,
        "provider_name": "model-armor",
        "policy_name": "model-armor-prod",
        "enabled": true,
        "timeout": 30,
        "config": {
          "auth_type": "default_credential",
          "project_id": "env.GCP_PROJECT_ID",
          "location": "env.GCP_LOCATION",
          "template_id": "env.GMA_TEMPLATE_ID"
        }
      }
    ],
    "guardrail_rules": [
      {
        "id": 801,
        "name": "model-armor-gpt-input",
        "enabled": true,
        "cel_expression": "model == 'gpt-5.4'",
        "apply_to": "input",
        "sampling_rate": 100,
        "timeout": 60,
        "provider_config_ids": [80]
      }
    ]
  }
}

The same profile can be configured through the Bifrost dashboard under Guardrails, through the management API, or through Helm. CEL expressions let you scope rules precisely: apply to user messages only, to prompts above a length threshold, or to specific models. The required configuration fields are project_id, location, and template_id; auth_type, service_account_json, base_url, and timeout (default 30 seconds) are optional.
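
To make the required and optional fields concrete, here is a minimal sketch of a pre-deployment check on a profile's config object. It assumes that a value starting with "env." refers to an environment variable, as the example config suggests, and it treats timeout as part of the same dictionary for simplicity. The function names are hypothetical and this is not how Bifrost itself loads configuration.

```python
import os

REQUIRED_FIELDS = ("project_id", "location", "template_id")
DEFAULT_TIMEOUT_SECONDS = 30  # default stated in the text above


def resolve_value(value, environ=None):
    """Resolve 'env.NAME' references; return any other value unchanged.

    Assumption: an 'env.' prefix means "read this environment variable".
    """
    environ = os.environ if environ is None else environ
    if isinstance(value, str) and value.startswith("env."):
        name = value[len("env."):]
        if name not in environ:
            raise KeyError(f"environment variable {name} is not set")
        return environ[name]
    return value


def load_profile_config(config, environ=None):
    """Check the required fields, resolve env references, default the timeout."""
    missing = [f for f in REQUIRED_FIELDS if not config.get(f)]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    resolved = {k: resolve_value(v, environ) for k, v in config.items()}
    resolved.setdefault("timeout", DEFAULT_TIMEOUT_SECONDS)
    return resolved
```

Running a check like this in CI can catch a missing template_id or an unset environment variable before the profile reaches a live gateway.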

Policy Outcomes and Best Practices

Bifrost follows the Model Armor template result on every evaluated request:

  • No match found: Bifrost allows the original content unchanged.
  • Blocking match (responsible AI, prompt injection, malicious URI, or SDP inspect-only): Bifrost returns GUARDRAIL_INTERVENED and responds with HTTP 400 and a guardrail_intervention error that names the matched filter.
  • SDP de-identify match: Bifrost replaces the original text with the transformed, de-identified text and lets the request or response continue.
  • Provider failure (timeout, non-2xx, or malformed response): Bifrost treats the call as failed and records the error in logs.
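
The four outcomes can be read as a small decision function. The sketch below uses a simplified, hypothetical Evaluation record rather than the real Model Armor response shape, and it assumes a blocking match takes precedence over a de-identify match when both occur, which the text above does not specify. It leaves the handling of provider failures to the caller.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Evaluation:
    """Simplified, hypothetical view of one Model Armor evaluation."""
    blocking_filters: List[str] = field(default_factory=list)
    deidentified_text: Optional[str] = None  # set when SDP de-identify matched
    error: Optional[str] = None              # timeout, non-2xx, malformed body


@dataclass
class Decision:
    action: str                        # allow | block | rewrite | provider_error
    text: Optional[str] = None         # content to forward, when any
    http_status: Optional[int] = None  # set only for blocks
    detail: Optional[str] = None       # matched filters or error message


def decide(original_text: str, ev: Evaluation) -> Decision:
    # Provider failure: report it; what happens to the request is not decided here.
    if ev.error is not None:
        return Decision("provider_error", detail=ev.error)
    # Blocking match: stop the content with HTTP 400 and name the filters.
    if ev.blocking_filters:
        return Decision("block", http_status=400,
                        detail=", ".join(ev.blocking_filters))
    # SDP de-identify match: forward the transformed text instead.
    if ev.deidentified_text is not None:
        return Decision("rewrite", text=ev.deidentified_text)
    # No match: forward the original content unchanged.
    return Decision("allow", text=original_text)
```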

Model Armor output inspection and de-identification apply to non-streaming response bodies today; input guardrails still run before a streaming request is sent to the model. Bifrost records Model Armor usage metadata, including evaluated text count, matched filters, and transformed text count, which you can export through OpenTelemetry tracing and retain in immutable audit logs for SOC 2, GDPR, and HIPAA evidence.

For production deployments, a few practices keep guardrails reliable and cost-aware:

  • Apply least-privilege IAM: grant only roles/modelarmor.user to the Bifrost runtime identity.
  • Use sampling on high-traffic rules: evaluate a percentage of requests to balance coverage against latency.
  • Layer providers for defense in depth: combine Model Armor with secrets detection and custom regex on the same rule path.
  • Keep policy in Google Cloud: manage template thresholds centrally while Bifrost Enterprise enforces them at the edge, including in-VPC deployments with no public network egress.
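
The sampling trade-off can be pictured with a short sketch. It assumes sampling_rate is a percentage from 0 to 100, consistent with the "sampling_rate": 100 value in the example rule, and that each request is sampled independently. The function should_evaluate is hypothetical and only illustrates the idea; it is not Bifrost's sampler.

```python
import random


def should_evaluate(sampling_rate, rng=random):
    """Return True for roughly sampling_rate percent of calls."""
    if not 0 <= sampling_rate <= 100:
        raise ValueError("sampling_rate must be between 0 and 100")
    # rng.random() is in [0.0, 1.0), so 100 always passes and 0 never does.
    return rng.random() * 100 < sampling_rate
```

Requests that are not sampled skip the guardrail entirely, so any rate below 100 trades coverage for lower latency and cost. That makes partial sampling a better fit for monitoring-style rules than for rules that must block every violation.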

Centralizing guardrails this way is why Bifrost fits regulated industries and large platform teams: one governance and security boundary covers every model and provider behind the gateway.

Common Questions About Google Model Armor Guardrails

Does Google Model Armor work with non-Google models in Bifrost?

Yes. Model Armor is model-agnostic, and Bifrost routes 1000+ models through a single API. A Model Armor rule applies the same template policy regardless of whether the request is served by OpenAI, Anthropic, Gemini, or any other supported provider.

Can Bifrost redact sensitive data instead of blocking the request?

Yes. When a Model Armor template uses Sensitive Data Protection de-identification, Bifrost replaces the matched text with the transformed output and allows the request or response to continue, rather than blocking it outright.

Does the Model Armor guardrail support streaming responses?

Output inspection and de-identification currently apply to non-streaming response bodies. Input guardrails still run before a streaming request reaches the model, so prompts are screened in both modes.

Is Google Model Armor available in open-source Bifrost?

The Google Model Armor integration is part of Bifrost Enterprise and requires the guardrails plugin. The open-source gateway provides routing, failover, and provider access; enterprise guardrails add the inline content safety and policy enforcement layer.

Getting Started with Google Model Armor Guardrails in Bifrost

Google Model Armor guardrails give enterprise teams a way to enforce Google Cloud safety and data protection policies on every LLM request, without rewriting application code. With Bifrost as the enforcement path, the same prompt injection, content safety, and Sensitive Data Protection rules apply consistently across every model and environment, alongside the routing, governance, and observability already running through the gateway. Explore the full set of guardrail and governance resources to see how the pieces fit together.

To see how Bifrost can enforce Google Model Armor guardrails across your AI infrastructure, book a demo with the Bifrost team.