[ THE PROBLEM ]
Most AI gateways are SaaS-only or lack enterprise deployment flexibility. Scaling AI across regulated, multi-cloud, or restricted environments surfaces problems that managed proxies don't solve.
SaaS gateways route every prompt, completion, and API key through a third-party network. Sensitive data leaves your perimeter, and compliance teams block adoption before it starts.
Classified and regulated environments require zero external network access. SaaS gateways cannot operate offline, and most open-source proxies still phone home for updates or telemetry.
When teams deploy a separate gateway in each cloud, policies drift between environments, configuration is duplicated manually, and rate limits or budgets cannot be enforced across providers from a single control plane.
Single-instance gateways are single points of failure. Without native clustering, automatic failover, or zero-downtime rolling updates, any outage takes your entire AI stack offline.
[ DEPLOYMENT MODELS ]
Match your infrastructure, compliance, and security requirements. Bifrost deploys the same way regardless of where it runs.
Deploy entirely within your VPC on AWS, GCP, or Azure. Complete network isolation with native IAM integration, private endpoints, and no external dependencies.
Run on your own hardware via Kubernetes, Docker Compose, or a single binary on bare metal VMs. Full control over compute, storage, and networking with no cloud dependency.
For environments with zero internet access. Export the Bifrost image on a connected machine, transfer via tarball, load into your internal registry. No phone-home, no telemetry.
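The export/transfer/load workflow above can be sketched with standard Docker commands. The image name and internal registry host below are placeholders; substitute the published tag and your own registry:

```shell
# On a connected machine: pull the image and export it to a tarball.
docker pull maximhq/bifrost:latest
docker save -o bifrost.tar maximhq/bifrost:latest

# Transfer bifrost.tar into the air-gapped environment
# (approved media or transfer host, per your security policy).

# Inside the air-gapped environment: load, retag, and push to the internal registry.
docker load -i bifrost.tar
docker tag maximhq/bifrost:latest registry.internal:5000/bifrost:latest
docker push registry.internal:5000/bifrost:latest
```

Updates follow the same cycle: export a new tarball on the connected side, then load and retag on the air-gapped side.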
Run a single Bifrost instance for dev/test, branch offices, or edge deployments. Minimal footprint with instant setup via Docker Compose, fly.io, or a single Go binary.
[ CLOUD SUPPORT ]
Cloud-native authentication and registry distribution per platform. No fighting your cloud provider's security model.
| Cloud | Targets | Auth Method | Registry | IaC |
|---|---|---|---|---|
| AWS | EKS, ECS | IRSA | ECR | Terraform, Helm |
| GCP | GKE, Cloud Run | Workload Identity | Artifact Registry | Terraform, Helm |
| Azure | AKS | Azure WIF | ACR | Terraform, Helm |
| On-Premise | K8s, Docker, Bare Metal | Basic Auth | Internal mirror | Helm, Compose, Single binary |
module "bifrost" {
  source         = "github.com/maximhq/bifrost//terraform/modules/bifrost?ref=terraform/v0.1.0"
  cloud_provider = "aws"        # "aws" | "gcp" | "azure" | "kubernetes"
  service        = "eks"        # AWS: "ecs" | "eks", GCP: "gke" | "cloud-run", Azure: "aks"
  region         = "us-east-1"
  image_tag      = "latest"
}

[ CORE CAPABILITIES ]
Every deployment model ships with the same HA clustering, security controls, and observability stack.
Zero-downtime at scale
Zero trust architecture, audit ready
Full stack monitoring
[ HOW IT WORKS ]
No custom agents, no proprietary orchestration. Deploy with your existing infrastructure tools.
VPC, on-prem, air-gapped, or multi-cloud. Pick the model that matches your infrastructure and compliance requirements.
# In-VPC, On-Premise, Air-Gapped, Multi-Cloud, or Edge

Use the Terraform module or Helm chart. AWS, GCP, Azure, and generic Kubernetes all supported out of the box.
terraform apply

Single JSON config file. Connect a config store (Postgres or file), set up providers, define virtual keys and routing rules.
# config.json
# providers, virtual keys, routing

Single Go binary with minimal dependencies. Helm install, docker compose up, or terraform apply.
helm install bifrost maximhq/bifrost

Change one line in your existing OpenAI, Anthropic, or LiteLLM SDK. Point it at your Bifrost endpoint.
base_url = "https://bifrost.internal"

Enable clustering for HA, connect Prometheus or OpenTelemetry, set up auto-scaling. Production ready.
curl http://bifrost.internal/cluster/status

[ COMPARISON ]
| Capability | SaaS Proxy | OSS (LiteLLM etc.) | Bifrost Enterprise |
|---|---|---|---|
| In-VPC deployment | No | Manual setup | Terraform + Helm |
| Air-gapped support | No | No | Docker save/load |
| Cloud-native auth | API key only | API key only | IRSA, Workload Identity, Azure WIF |
| Vault integration | No | No | HashiCorp, AWS SM, GCP SM, Azure KV |
| RBAC + SSO | Limited | No | Okta, Entra ID, OIDC |
| Audit logs (SOC 2 Type II, HIPAA) | Limited | No | Immutable, compliance ready |
| Data stays in your network | No | Yes | Yes, zero egress |
| P99 latency at 500 RPS | ~50ms | ~90.72s | ~1.68s |
| Uptime SLA | Varies | None | 99.999% |
[ USE CASES ]
Compliance requires all AI traffic to stay within your VPC. No data can transit third-party proxies.
In-VPC deployment with complete network isolation, audit logs for HIPAA/SOC 2 Type II/GDPR, Vault for secrets, and zero data egress. All processing stays inside your controlled environment.
Your environment has no internet. You need a gateway that runs entirely offline with no phone-home.
Docker image export/import workflow, internal registry mirroring, no telemetry leakage. On-prem credential management with offline operation and manual update cycles.
Your org runs AWS for production, GCP for ML, Azure for business units. You need one gateway with consistent governance.
Clustered deployment across clouds with gossip-based state sync, cloud-native auth per environment (IRSA, Workload Identity, WIF), and unified rate limiting and budgets.
You're hitting performance ceilings, missing enterprise features, or struggling with reliability on your current gateway.
Drop-in LiteLLM compatibility, 54x faster P99 latency, 68% less memory. Plus enterprise features like RBAC, guardrails, clustering, and audit logs that alternatives don't offer.
[ GET STARTED ]
Bifrost is open source and production-ready. Teams deploy in hours and scale without rethinking the architecture.
[ BIFROST FEATURES ]
Everything you need to run AI in production, from free open source to enterprise-grade features.
01 Governance
SAML-based SSO, role-based access control, and policy enforcement for team collaboration.
02 Adaptive Load Balancing
Automatically optimizes traffic distribution across provider keys and models based on real-time performance metrics.
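To illustrate the general idea (this is a simplified sketch, not Bifrost's actual algorithm; all names are hypothetical), a latency-weighted key selector sends more traffic to keys that are currently responding fastest:

```python
import random

def pick_key(latencies_ms):
    """Pick a provider key, weighting toward lower observed latency.

    latencies_ms: dict of key name -> recent latency in milliseconds.
    Weights are inverse latency, so faster keys receive more traffic.
    """
    weights = {k: 1.0 / v for k, v in latencies_ms.items()}
    total = sum(weights.values())
    r = random.uniform(0, total)
    cumulative = 0.0
    for key, w in weights.items():
        cumulative += w
        if r <= cumulative:
            return key
    return key  # fallback for floating-point rounding at the boundary

# openai-key-1 is 4x faster, so it should receive roughly 80% of traffic.
keys = {"openai-key-1": 120.0, "openai-key-2": 480.0}
counts = {k: 0 for k in keys}
for _ in range(10_000):
    counts[pick_key(keys)] += 1
```

A production implementation would refresh the latency metrics continuously and fold in error rates and rate-limit headroom, but the weighting principle is the same.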
03 Cluster Mode
High availability deployment with automatic failover and load balancing. Peer-to-peer clustering where every instance is equal.
04 Alerts
Real-time notifications for budget limits, failures, and performance issues on Email, Slack, PagerDuty, Teams, Webhook and more.
05 Log Exports
Export and analyze request logs, traces, and telemetry data from Bifrost with enterprise-grade data export capabilities for compliance, monitoring, and analytics.
06 Audit Logs
Comprehensive logging and audit trails for compliance and debugging.
07 Vault Support
Secure API key management with HashiCorp Vault, AWS Secrets Manager, Google Secret Manager, and Azure Key Vault integration.
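The pattern is to reference secrets from a backend instead of storing them inline. The fragment below is a hypothetical sketch of that shape; the field names are illustrative and not Bifrost's actual config schema:

```json
{
  "providers": {
    "openai": {
      "api_key": {
        "source": "hashicorp_vault",
        "path": "secret/data/llm/openai",
        "field": "api_key"
      }
    },
    "anthropic": {
      "api_key": {
        "source": "aws_secrets_manager",
        "secret_id": "prod/llm/anthropic"
      }
    }
  }
}
```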
08 VPC Deployment
Deploy Bifrost within your private cloud infrastructure with VPC isolation, custom networking, and enhanced security controls.
09 Guardrails
Automatically detect and block unsafe model outputs with real-time policy enforcement and content moderation across all agents.
[ SHIP RELIABLE AI ]
Change just one line of code. Works with OpenAI, Anthropic, Vercel AI SDK, LangChain, and more.
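For example, with any OpenAI-compatible client the only change is the base URL. The sketch below uses only the Python standard library; the `bifrost.internal` hostname, the `/v1` path, and the virtual key value are assumptions about a typical deployment:

```python
import json
import urllib.request

# Before: the client pointed at https://api.openai.com/v1
# After: point the same client at your Bifrost endpoint.
BASE_URL = "https://bifrost.internal/v1"  # assumed internal hostname

def chat_completion_request(model, messages, api_key="sk-virtual-key"):
    """Build an OpenAI-compatible chat completion request aimed at Bifrost."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = chat_completion_request("gpt-4o", [{"role": "user", "content": "ping"}])
# urllib.request.urlopen(req) would send it; omitted here because the
# endpoint is environment-specific.
```

With the official SDKs the equivalent change is passing your Bifrost URL as the client's `base_url`; nothing else in the calling code changes.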
[ FAQ ]
Can Bifrost run entirely inside our network?
Yes. Bifrost deploys entirely within your VPC on AWS, GCP, or Azure with complete network isolation. All LLM requests, API keys, prompts, and completions stay within your network perimeter. Combined with vault support, no secrets leave your infrastructure.
Does Bifrost support air-gapped environments?
Yes. Export the Bifrost Docker image on a connected machine using docker save, transfer the tarball to your air-gapped environment, and load it into your internal registry. No phone-home, no telemetry, fully offline operation.
What infrastructure-as-code options are available?
Bifrost provides a single Terraform module that targets AWS (EKS/ECS), GCP (GKE/Cloud Run), Azure (AKS), and generic Kubernetes. Helm charts are also available for Kubernetes deployments. Docker Compose and a single binary are provided for on-premise and bare metal.
What are the minimum resource requirements?
A single Bifrost node requires 2 vCPU and 4GB RAM minimum. For production HA deployments, a 3-node cluster is recommended. The Go-native binary has a minimal footprint with 68% less memory usage than Python-based alternatives.
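Expressed as a standard Kubernetes manifest (a generic sketch reflecting the stated sizing, not the Helm chart's actual values schema), the recommended production footprint looks like:

```yaml
# 2 vCPU / 4 GiB per node, 3 replicas for production HA.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bifrost
spec:
  replicas: 3
  selector:
    matchLabels:
      app: bifrost
  template:
    metadata:
      labels:
        app: bifrost
    spec:
      containers:
        - name: bifrost
          image: registry.internal/bifrost:latest  # placeholder image reference
          resources:
            requests:
              cpu: "2"
              memory: 4Gi
```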