Natural-Language Safety Rules with GraySwan in Bifrost

Natural-Language Safety Rules with GraySwan in Bifrost
Bifrost integrates GraySwan Cygnal so teams can define natural-language safety rules in plain English, with violation scoring and prompt-injection detection.

Prompt injection ranks as the number one security risk for large language model applications in the OWASP Top 10 for LLM Applications 2025. Most teams respond by writing detection logic in code and then maintaining it as attack patterns shift. Bifrost, the open-source AI gateway built in Go by Maxim AI, integrates GraySwan Cygnal as part of its enterprise guardrails so teams can express natural-language safety rules in plain English instead. This post explains how to configure GraySwan Cygnal in Bifrost, write effective rules, and combine them with other guardrails for layered protection.

What Are Natural-Language Safety Rules?

Natural-language safety rules are content-safety policies written as plain-English descriptions rather than code or regular expressions. A rule like "Do not allow personally identifiable information" is evaluated directly by a safety model, which scores each request and response against the stated intent. This removes the gap between a policy that a compliance team writes and the detection logic that an engineering team has to translate and maintain.

GraySwan Cygnal is the safety provider in Bifrost that supports this pattern. Each rule is a key-value pair: the key is a short rule name, and the value is a description of the behavior to allow or block. Cygnal returns a violation score on a continuous 0 to 1 scale, and Bifrost acts on that score against a configurable threshold.

Why Plain-English Safety Policies Matter for AI Teams

Safety policies in most AI applications drift away from the rules they are supposed to enforce. A policy document describes intent in prose, but the deployed check is a regex or a hardcoded filter that only approximates that intent. As models, prompts, and attack techniques change, the two diverge.

Natural-language rules close that gap for three reasons:

  • Readability: Policy owners, security reviewers, and auditors can read the exact rule the system enforces without parsing code.
  • Coverage: A safety model interprets intent and context, catching paraphrased or obfuscated violations that a static pattern misses.
  • Maintainability: Updating a rule means editing one sentence, not rewriting and redeploying detection logic.

This matters most in regulated environments. The NIST AI Risk Management Framework emphasizes documented, auditable controls over model behavior, and plain-English rules map cleanly to that requirement. Bifrost, the AI gateway built for enterprises and regulated industries, treats guardrails as a first-class layer of its enterprise deployment model rather than an afterthought.

How GraySwan Cygnal Works in Bifrost

GraySwan Cygnal in Bifrost evaluates inputs, outputs, or both against your defined rules and returns a violation score that Bifrost uses to pass, log, or block the request. The integration is configured through the guardrails system and runs inline with every request that the rule applies to.

Cygnal supports several detection capabilities through Bifrost:

  • Violation scoring: Continuous 0 to 1 scoring with a configurable threshold that controls how strict enforcement is.
  • Custom natural-language rules: Define safety policies in plain English as key-value pairs.
  • Policy management: Use pre-built policies from the GraySwan platform by ID, or define rules inline.
  • Indirect prompt injection (IPI) detection: Identify hidden instructions embedded in user input or retrieved content.
  • Mutation detection: Detect attempts to manipulate or alter content to evade filters.
  • Reasoning modes: Choose off for the fastest analysis, hybrid for a balance of speed and depth, or thinking for the most thorough evaluation.

In Bifrost, guardrails are built around two concepts that work together. A profile configures a provider such as GraySwan, including its credentials and detection settings. A rule uses a Common Expression Language (CEL) expression to decide when and what to evaluate, and links to one or more profiles. This separation lets a single GraySwan profile be reused across many rules, and lets one rule combine GraySwan with other providers for defense-in-depth.

Configuring GraySwan Cygnal in Bifrost

Configuring GraySwan Cygnal in Bifrost takes two steps: create a GraySwan profile with your rules, then link that profile to a guardrail rule that controls when it runs. You can do both from the Bifrost dashboard, the HTTP API, or a configuration file. The open-source Bifrost gateway keeps the same configuration model across all three.

Step 1: Create a GraySwan profile

A profile holds the GraySwan API key, the violation threshold, the reasoning mode, and the natural-language rules. The api_key field is required; violation_threshold defaults to 0.5, where lower values are stricter; and reasoning_mode defaults to off.

curl -X POST <http://localhost:8080/api/guardrails/grayswan> \\
  -H "Content-Type: application/json" \\
  -d '{
    "name": "Custom Safety Rules",
    "enabled": true,
    "config": {
      "api_key": "env.GRAYSWAN_API_KEY",
      "violation_threshold": 0.5,
      "reasoning_mode": "hybrid",
      "rules": {
        "no_pii": "Do not allow personally identifiable information",
        "professional_tone": "Ensure responses maintain a professional tone"
      }
    }
  }'

The API assigns a configuration ID after creation. Store the API key in an environment variable or a secrets manager rather than inline, in line with the governance practices Bifrost recommends for production deployments.

A rule decides when the GraySwan profile runs and what it evaluates. The applyTo field accepts input, output, or both. Profiles are referenced as "<provider-type>:<config-id>", so a GraySwan profile with ID 7 is referenced as grayswan:7.

curl -X POST <http://localhost:8080/api/guardrails/rules> \\
  -H "Content-Type: application/json" \\
  -d '{
    "name": "Apply Safety Rules to User Traffic",
    "enabled": true,
    "celExpression": "request.messages.exists(m, m.role == \\"user\\")",
    "applyTo": "both",
    "samplingRate": 100,
    "selectedGuardrailProfiles": ["grayswan:7"]
  }'

The same configuration can live in a config.json or Helm values file using snake_case fields (cel_expression, apply_to, provider_config_ids), which makes guardrails reproducible across environments. The sampling_rate field lets you evaluate a percentage of requests on high-traffic endpoints to control latency.

Step 3: Attach and verify guardrails on requests

Once a rule is active, Bifrost enforces it automatically for matching requests. A passing request returns 200 with a guardrails block in extra_fields showing the validation status. A blocked request returns 446 with the violation details, and a logged-only warning returns 246. These status codes let downstream services distinguish a clean response, a blocked one, and one that passed with a recorded warning.

Writing Effective Natural-Language Safety Rules

Effective natural-language safety rules are specific, scoped to one behavior, and named clearly. GraySwan evaluates each rule independently, so a focused rule produces a clearer violation description than a broad one. Define rules as key-value pairs where the key is the rule name and the value is the policy:

{
  "rules": {
    "no_profanity": "Do not allow profanity or vulgar language",
    "no_pii": "Do not allow personally identifiable information",
    "professional_tone": "Ensure all responses maintain a professional tone"
  }
}

A few practices keep rules reliable as your application scales with the Bifrost platform:

  • One behavior per rule: Split compound policies into separate rules so violation attribution is precise.
  • Tune the threshold per rule set: Start at the default 0.5 and lower it for stricter enforcement on sensitive endpoints.
  • Match the reasoning mode to the workload: Use off for latency-sensitive paths, hybrid for general traffic, and thinking for high-risk flows where thorough analysis matters more than speed.
  • Apply rules to the right stage: Use input to screen prompts, output to screen model responses, and both for end-to-end coverage.

How does GraySwan handle prompt injection in Bifrost?

GraySwan Cygnal includes indirect prompt injection detection, which identifies hidden instructions embedded in user input or in content the model retrieves. Combined with mutation detection, it flags attempts to manipulate content and bypass filters. Because prompt injection is the top-ranked LLM risk, screening inputs with applyTo: input is a common first deployment.

Can natural-language rules and pre-built policies run together?

Yes. A GraySwan profile can include inline natural-language rules and reference pre-built policies from the GraySwan platform by policy_id or policy_ids for aggregated evaluation. This lets teams reuse standardized policies while adding application-specific rules in plain English.

Layering GraySwan with Other Guardrails

GraySwan Cygnal works alongside the other guardrail providers in Bifrost rather than replacing them. Because rules and profiles are decoupled, one rule can link multiple profiles, and Bifrost runs them in sequence for layered protection. This defense-in-depth approach is the recommended pattern for production AI traffic on the Bifrost AI gateway.

Common pairings include:

  • GraySwan plus secrets detection: Catch leaked API keys and credentials alongside natural-language policy checks.
  • GraySwan plus custom regex: Combine deterministic pattern matching, including the built-in PII template, with model-based interpretation.
  • GraySwan for content safety plus a PII-focused provider: Use each provider for the capability it handles best.

Every guardrail decision is captured for compliance. Bifrost maintains immutable audit logs for SOC 2, GDPR, HIPAA, and ISO 27001 reporting, and supports in-VPC deployment so guardrail evaluation and request data stay inside your own infrastructure. For teams standardizing access and policy enforcement across projects, the governance resource hub covers how virtual keys, budgets, and guardrails fit together.

Getting Started with Bifrost

Natural-language safety rules let teams move policy enforcement out of scattered code and into readable, auditable rules that a safety model evaluates on every request. Configuring GraySwan Cygnal in Bifrost takes two steps, runs inline with low overhead, and layers cleanly with the other guardrails in the platform. Detailed field references are available in the GraySwan integration docs, and broader patterns are covered across the Bifrost resources hub.

To see how Bifrost can enforce natural-language safety rules across your AI workloads, book a demo with the Bifrost team.