MCP Gateway | High-Performance Tool Execution for AI Agents
Connect AI models to external tools with an open-source MCP gateway that delivers 11µs overhead and complete security control.
Performance at a Glance
- Internal Overhead
- 11µs Ultra-low latency at high throughput
- Token Savings
- 50%+ With Code Mode vs classic MCP
- Faster Execution
- 40% Code Mode execution pipeline
- Model Support
- 1000+ LLM models supported
Setup Steps
- 01Register MCP servers. Connect Bifrost to any MCP-compliant server. Bifrost auto-discovers available tools and their schemas at startup.
Register MCP servers # bifrost config mcp_servers: - name: filesystem transport: stdio command: npx @modelcontextprotocol/server-filesystem - 02Send a chat request. Your app sends a standard chat completion request. Bifrost injects discovered MCP tools into the request automatically.
Send a chat request curl http://localhost:8080/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{"model": "claude-sonnet", "messages": [...]}' - 03Execute tool calls. When the LLM suggests a tool call, your app decides whether to execute it. Bifrost handles the MCP protocol and returns results.
Execute tool calls # tool call returned in response # your app approves, then Bifrost executes # full audit trail logged automatically
Core Capabilities
- STDIO + HTTP + SSEConnect to MCP servers. Connect to any MCP-compliant server via STDIO, HTTP, or SSE. Bifrost auto-discovers tools and their schemas at runtime so your AI models can use them immediately.
- OAuth 2.0 with automatic token refreshOAuth Authentication. Secure OAuth 2.0 authentication with automatic token refresh
- Security-firstExplicit tool execution. Tool calls from LLMs are suggestions only. Execution requires a separate API call, giving your app full control to validate, filter, and approve every action before it runs.
- Configurable auto-approvalAgent Mode. Enable autonomous multi-step tool execution with configurable auto-approval. Specify exactly which tools can auto-execute while keeping human oversight for sensitive operations.
- Token efficiencyCode Mode. AI writes Python to orchestrate multiple tools. Four meta-tools replace 100+ definitions with on-demand schema loading and sandbox execution. Cuts tokens by 50%+ and LLM calls by 3-4x.
- Single gateway URLMCP Gateway URL. A single endpoint for tool discovery, execution, and management.
Security Principles
- Explicit execution. Tool calls from LLMs are suggestions only. Execution requires a separate API call from your application.
- Granular control. Filter tools per-request, per-client, or per-virtual-key. Blacklist dangerous tools globally.
- Opt-in auto-execution. Agent Mode with auto-execution must be explicitly configured. Specify exactly which tools are allowed.
- Stateless design. Each API call is independent. Your app controls conversation state with full audit trails at every step.
Comparison Data
| Feature | Classic | Bifrost |
|---|---|---|
| Tool definition overhead | 100+ tool schemas sent every request | AI writes code to call tools |
| Token usage | High (all tool schemas in context) | 50%+ reduction |
| Execution latency | Multiple round-trips per tool | 40% faster execution |
| Multi-tool orchestration | Sequential tool calls only | Python orchestrates in one pass |
| Scalability with servers | Degrades with 3+ servers | Scales to any number |
| Error handling | LLM retries each tool call | Python try/catch in sandbox |
Virtual Keys and Virtual MCP Servers
Production MCP needs governance at the gateway.
- Per-consumer scopingVirtual keys. Issue scoped credentials for every user, team, or customer integration. Each virtual key carries a tool-level allowlist, not just server access, so the model only receives definitions for granted tools.
- MCP Tool GroupsVirtual MCP server. MCP Tool Groups bundle selected tools from any connected server into one curated virtual MCP server for a team or customer. Define a group once, attach it to virtual keys, and Bifrost exposes only those tools at request time.
- Per-tool visibilityAudit and cost tracking. Every tool execution is logged with tool name, server, latency, virtual key, and parent LLM request. Per-tool pricing surfaces MCP tool spend alongside LLM token costs in one view.
Connection Types
- Local toolsSTDIO. Local process execution via stdin/stdout.
- MicroservicesHTTP. Remote MCP servers via HTTP requests.
- Live dataSSE. Persistent streaming for real-time data.
Use Cases
- Agentic coding pipelines. Connect AI coding agents to filesystem tools, databases, and deployment pipelines. Bifrost handles tool injection transparently with full audit trails for every operation.
- Regulated enterprise environments. Deploy in healthcare, finance, or government with explicit approval workflows, PII redaction, and tamper-evident audit logs for SOC 2 and HIPAA compliance.
- Multi-tool orchestration. Coordinate filesystem operations, database queries, and API calls in a single request using Code Mode. Reduce token waste and latency when using 3+ MCP servers.
- DevOps & infrastructure automation. Supervised infrastructure actions and deployments with role-based tool access. Only approved tools execute, with complete visibility into every automated step.
- Centralized tool governance. Manage tool access across teams with virtual keys and per-key tool filtering. Set different tool policies for development, staging, and production environments.
- Claude Desktop & MCP clients. Expose your entire tool ecosystem through a single Bifrost gateway URL. Claude Desktop and other MCP clients connect once and discover all available tools automatically.
Architecture Features
- External MCP server connections. Bifrost connects to external MCP servers such as filesystem tools, web search, databases, and custom APIs. It discovers their capabilities automatically.
- Single gateway endpoint. Bifrost exposes all connected tools through a single gateway URL. MCP clients like Claude Desktop connect to Bifrost and access everything.
The Fastest Open-Source MCP Gateway
- 11µs overhead at 5,000 requests per second.
- Stateless architecture with explicit approval.
- Code Mode: 50% fewer tokens, 40% faster execution.
- Dual role: MCP Client and MCP Server.
- Built-in OAuth 2.0 with automatic token refresh.
- Production-proven at millions of requests/day.
- Complete audit trails and OpenTelemetry export.
- Open source (Apache 2.0) with enterprise support.
- Go-native with zero Python GIL bottleneck.
Read the MCP Gateway Deep Dive
Virtual keys, MCP Tool Groups, Code Mode benchmarks, and how production teams govern tool access while cutting context cost at scale. [Read Full Article: Bifrost MCP Gateway deep dive]
Open Source & Enterprise
OSS Features
- 01Model Catalog. Access 8+ providers and 1000+ AI models through a unified interface. Also supports custom deployed models.
- 02Budgeting. Set spending limits and track costs across teams, projects, and models.
- 03Provider Fallback. Automatic failover between providers ensures 99.99% uptime for your applications.
- 04MCP Gateway. Centralize all MCP tool connections, governance, security, and auth. Your AI can safely use MCP tools with centralized policy enforcement. [MCP Gateway resource]
- 05Virtual Key Management. Create different virtual keys for different use cases with independent budgets and access control.
- 06Unified Interface. One consistent API for all providers. Switch models without changing code.
- 07Drop-in Replacement. Replace your existing SDK with just one line change. Compatible with OpenAI, Anthropic, LiteLLM, Google GenAI, LangChain, and more. [Drop-in replacement docs]
- 08Built-in Observability. Out-of-the-box OpenTelemetry support. Built-in dashboard for quick visibility without complex setup.
- 09Community Support. Active Discord community with responsive support and regular updates.
Enterprise Features
- 01Governance. SAML support for SSO and role-based access control with policy enforcement for team collaboration. [Governance resource]
- 02Adaptive Load Balancing. Automatically optimizes traffic distribution across provider keys and models based on real-time performance metrics.
- 03Cluster Mode. High availability deployment with automatic failover and load balancing. Peer-to-peer clustering where every instance is equal.
- 04Alerts. Real-time notifications for budget limits, failures, and performance issues on Email, Slack, PagerDuty, Teams, Webhook, and more.
- 05Log Exports. Export and analyze request logs, traces, and telemetry data from Bifrost with enterprise-grade data export for compliance, monitoring, and analytics.
- 06Audit Logs. Comprehensive logging and audit trails for compliance and debugging.
- 07Vault Support. Secure API key management with HashiCorp Vault, AWS Secrets Manager, Google Secret Manager, and Azure Key Vault integration.
- 08VPC Deployment. Deploy Bifrost within your private cloud infrastructure with VPC isolation, custom networking, and enhanced security controls. [Enterprise deployment resource]
- 09Guardrails. Automatically detect and block unsafe model outputs with real-time policy enforcement and content moderation across all agents. [Guardrails resource]
FAQ
What is an MCP gateway and why do I need one?
An MCP (Model Context Protocol) gateway connects AI models to external tools like filesystems, databases, and APIs. Without a gateway, each AI client needs individual tool configurations. Bifrost centralizes tool management, adds security controls, and provides audit trails for every tool execution.
Does Bifrost automatically execute tool calls from AI models?
No. By default, Bifrost treats tool calls from LLMs as suggestions only. Your application must explicitly approve and trigger execution via a separate API call. This security-first design prevents unintended actions. Agent Mode with auto-execution is available but requires explicit opt-in configuration.
How do I manage rate limits and resource usage across MCP servers?
An MCP gateway solves the problem of runaway tool-calling that can overload internal systems or hit provider API limits. The main objective is to regulate resource consumption while maintaining a smooth developer experience. Key features for resource management include: • Token & Request Budgeting: Set hard limits on tool calls per team to prevent backend system overloads and control costs. • Automated Failover: Reroute traffic to secondary servers or models if an MCP connection times out or fails. • Scale-Ready Architecture: Built to handle thousands of tool-calls concurrently without degrading performance or reliability.
What is Code Mode and how does it reduce costs?
Code Mode replaces traditional tool calling with AI-generated Python code that orchestrates multiple tools in a single round-trip. Instead of sending 100+ tool schemas in every request, Code Mode uses four meta-tools for on-demand schema loading. This cuts token usage by 50%+ and reduces LLM calls by 3-4x.
What MCP transport protocols does Bifrost support?
Bifrost supports all three MCP transport types: STDIO for local process execution, HTTP for remote MCP servers, and SSE (Server-Sent Events) for real-time streaming connections. OAuth 2.0 authentication with automatic token refresh is built in.
Can I use Bifrost as an MCP server for Claude Desktop?
Yes. Bifrost acts as both an MCP client (connecting to external tool servers) and an MCP server (exposing tools to clients). Claude Desktop and other MCP-compatible clients can connect to a single Bifrost gateway URL to discover and use all registered tools.
What are virtual keys in Bifrost MCP Gateway?
Virtual keys are scoped credentials for each consumer of your MCP gateway: a user, team, or customer integration. Each key defines which tools it may call at the tool level, not just per server, so customer-facing agents cannot reach internal admin tooling.
What is a Virtual MCP server?
A Virtual MCP server is a curated toolkit your agents see at the gateway, built from MCP Tool Groups. Each group is a named collection of tools from one or more backend MCP servers. Define a group once, attach it to virtual keys, teams, or customers, and Bifrost resolves allowed tools in memory at request time without duplicates.