Building Production-Ready AI Agents with Bifrost's MCP Gateway
The evolution from static AI chatbots to autonomous agents capable of executing real-world tasks represents one of the most significant shifts in enterprise AI adoption. However, scaling these agents in production environments introduces critical challenges around security, cost management, and operational control. Bifrost's comprehensive implementation of the Model Context Protocol (MCP) addresses these challenges head-on, providing enterprise teams with the infrastructure needed to deploy reliable, secure, and cost-effective AI agents at scale.
Understanding the Model Context Protocol Challenge
Model Context Protocol (MCP) is an open standard that enables AI models to discover and execute external tools at runtime. Instead of being limited to text generation, AI models can interact with filesystems, search the web, query databases, and execute custom business logic through external MCP servers. However, implementing MCP at enterprise scale introduces significant operational challenges.
The fundamental problem emerges when connecting multiple MCP servers. Each server typically exposes between 10 and 30 tools, and as organizations connect more servers to handle different capabilities (filesystem operations, web search, customer databases, internal APIs), the tool catalog balloons. With 8 to 10 MCP servers, AI applications can face 150 or more tool definitions that must be included in every single request to the language model. The result is substantial token overhead, increased latency, and significantly higher operational costs.
According to Bifrost's MCP documentation, organizations using traditional MCP implementations with 5 servers and approximately 100 tools can experience 6 or more language model turns, with each turn including all 100 tool definitions, resulting in over 600 tool definitions transmitted for tool catalogs alone per workflow.
Bifrost's Security-First Architecture
Unlike many MCP implementations that default to automatic tool execution, Bifrost takes a security-first approach where no tool calls are automatically executed by default. When a language model returns tool calls in its response, these are suggestions only; actual execution requires explicit API calls from the application layer. This architectural decision ensures human oversight for potentially dangerous operations and provides complete audit trails for all tool executions.
The security model follows four core principles. First, explicit execution means tool calls from language models are never automatically executed without separate API authorization. Second, granular control allows tools to be filtered per-request, per-client, or per-virtual-key, implementing precise permission boundaries. Third, opt-in auto-execution through Agent Mode must be explicitly configured for specific tools, never enabled by default. Finally, stateless design ensures each API call remains independent, with application code maintaining full control over conversation state and tool execution authorization.
This security architecture is particularly critical for enterprise deployments where AI agents may interact with production databases, customer data, or financial systems. The separation between tool call suggestion and tool call execution creates a clear authorization boundary that integrates naturally with existing enterprise security policies and compliance requirements.
Connecting to External MCP Servers
Bifrost supports three distinct connection protocols for MCP servers, each optimized for different deployment patterns and use cases. STDIO connections launch external processes and communicate via standard input and output, making them ideal for local tools, CLI utilities, and scripts. Organizations can run filesystem tools, Python-based MCP servers, or database clients with local credentials through STDIO connections.
HTTP connections communicate with MCP servers via standard HTTP requests, designed for remote APIs, microservices, and cloud-hosted services. This protocol enables integration with third-party tool providers, cloud functions, and distributed systems where tools run as independent services. SSE (Server-Sent Events) connections provide real-time, persistent connections to MCP servers, supporting use cases like live market data, system monitoring, and event-driven workflows that require streaming updates.
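As a rough illustration, a configuration covering the three connection types might look like the sketch below. The field names (connection_type, stdio_config, connection_string) are paraphrased from the patterns described here and should be verified against Bifrost's configuration reference; the example.com endpoints are placeholders.

```json
{
  "mcp": {
    "clients": [
      {
        "name": "filesystem",
        "connection_type": "stdio",
        "stdio_config": {
          "command": "npx",
          "args": ["-y", "@modelcontextprotocol/server-filesystem", "/data"]
        }
      },
      {
        "name": "websearch",
        "connection_type": "http",
        "connection_string": "https://tools.example.com/mcp"
      },
      {
        "name": "marketdata",
        "connection_type": "sse",
        "connection_string": "https://stream.example.com/mcp/sse"
      }
    ]
  }
}
```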
For each connection type, Bifrost provides comprehensive configuration options through web UI, REST API, or configuration files. The platform includes automatic health monitoring, sending periodic pings to connected clients every 10 seconds with configurable timeout and failure thresholds. When a client becomes disconnected after consecutive failed health checks, its tools become unavailable until manual reconnection, preventing incomplete or unreliable tool executions.
The naming conventions for MCP clients enforce consistency across deployments: names must contain only ASCII characters, cannot include hyphens or spaces, cannot start with numbers, and must be unique across all connected clients. These requirements ensure compatibility across different deployment environments and eliminate ambiguity in tool execution logs and audit trails.
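The naming rules above can be captured in a small validation helper. This is an illustrative sketch of the documented constraints, not Bifrost's actual validation code.

```python
def is_valid_mcp_client_name(name: str, existing: set[str]) -> bool:
    """Check a candidate MCP client name against the documented rules:
    ASCII only, no hyphens or spaces, must not start with a digit,
    and must be unique among already-connected clients."""
    if not name or not name.isascii():
        return False
    if "-" in name or " " in name:
        return False
    if name[0].isdigit():
        return False
    return name not in existing

existing = {"filesystem", "websearch"}
print(is_valid_mcp_client_name("customer_db", existing))   # True
print(is_valid_mcp_client_name("customer-db", existing))   # False: contains a hyphen
print(is_valid_mcp_client_name("2fa_tools", existing))     # False: starts with a digit
print(is_valid_mcp_client_name("filesystem", existing))    # False: already taken
```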
Tool Execution Architecture and Workflow
The tool execution workflow in Bifrost follows a stateless pattern that gives applications complete control over the tool execution lifecycle. The workflow begins with sending a chat completion request to the language model through Bifrost's unified API. When the model determines that tools are needed, it returns tool call suggestions in the response with finish_reason set to tool_calls.
At this point, the application reviews suggested tool calls, applying security rules, checking user permissions, and potentially requesting user approval for sensitive operations. Only after this review does the application explicitly call Bifrost's tool execution endpoint with the approved tool calls. Bifrost executes the tool against the connected MCP server and returns results in a format that can be directly appended to the conversation history.
The application then continues the conversation by sending a new chat completion request that includes the original user message, the assistant's tool call message, and the tool execution results. This pattern repeats until the model completes its task without requesting additional tool calls, at which point it returns a final text response with finish_reason set to stop.
This stateless architecture provides several critical advantages for production deployments. First, each API call remains independent with no server-side conversation state, making the system naturally scalable and eliminating session affinity requirements. Second, applications maintain full audit trails of all tool operations, with explicit records of what was requested, what was approved, and what was executed. Third, the separation between suggestion and execution enables sophisticated approval workflows, including multi-level authorization, risk scoring, and compliance checks before any tool executes.
For applications handling multiple concurrent tool calls, Bifrost supports both sequential and parallel execution patterns. Tool results are returned in the standardized OpenAI format, ensuring compatibility with existing AI application frameworks and eliminating the need for custom response parsing logic.
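The suggest-review-execute loop described above can be sketched as follows. Here chat_complete and execute_tool stand in for calls to Bifrost's chat completion and tool execution endpoints (their exact paths and payload shapes are assumptions, not documented signatures), and approve is a placeholder for application-level policy such as user confirmation or risk scoring.

```python
def run_agent_turn(messages, chat_complete, execute_tool, approve, max_turns=10):
    """Drive one stateless tool-use loop: request a completion, review any
    suggested tool calls, execute only the approved ones, append results to
    the conversation, and repeat until the model returns a final answer."""
    for _ in range(max_turns):
        response = chat_complete(messages)          # hypothetical Bifrost chat call
        messages.append(response["message"])
        if response["finish_reason"] != "tool_calls":
            return response["message"]["content"]   # model finished with plain text
        for call in response["message"]["tool_calls"]:
            if approve(call):                       # application-level policy gate
                result = execute_tool(call)         # hypothetical execution endpoint
            else:
                result = {"error": "rejected by policy"}
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": str(result),
            })
    raise RuntimeError("maximum turns reached without a final answer")
```

Because the loop owns the message list, the application retains the full record of what was suggested, approved, and executed, which is exactly the audit trail the stateless design is meant to enable.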
Code Mode: Achieving 50 Percent Cost Reduction at Scale
The most transformative feature in Bifrost's MCP implementation is Code Mode, which fundamentally reimagines how AI models interact with large tool catalogs. Instead of exposing 150 tools directly to the language model, Code Mode provides just three meta-tools: listToolFiles for discovering available MCP servers, readToolFile for loading TypeScript definitions on-demand, and executeToolCode for executing TypeScript code with full tool bindings in a sandboxed environment.
When enabled for MCP clients, Code Mode transforms the interaction pattern. The language model uses listToolFiles to see what tool servers are available, then calls readToolFile to load only the specific tool definitions it needs for the current task. Finally, it writes TypeScript code that orchestrates multiple tool calls and executes this code through executeToolCode, which runs in an isolated VM with access to all configured MCP servers.
The performance impact is substantial. According to Bifrost's benchmarking data, comparing a workflow across 5 MCP servers with approximately 100 tools, the classic MCP approach requires 6 language model turns with 100 tool definitions included in every turn, transmitting over 600 tool definitions for catalogs alone. In contrast, Code Mode reduces this to 3 to 4 language model turns with only 3 tool definitions plus on-demand loading, around 50 tool definitions in total. The result is roughly 50 percent cost reduction combined with 30 to 40 percent faster execution.
The cost savings compound in complex workflows. When Code Mode is enabled, all intermediate orchestration happens inside the sandboxed execution environment rather than requiring round trips through the language model. A task that traditionally required separate API calls to search, retrieve, process, and format data can complete in a single code execution, with only the final result returned to the language model.
Code Mode can be enabled per MCP client, allowing organizations to mix approaches based on their needs. Heavy servers like web search, document processing, and database access benefit most from Code Mode, while small utility servers with just a few simple tools can remain as direct tool calls. The platform supports both server-level binding, where all tools from a server are grouped into a single definition file, and tool-level binding, where each tool gets its own definition file for more focused documentation.
For production deployments, Code Mode includes comprehensive security controls. The TypeScript execution environment runs in an isolated Goja VM with ES5-compatible JavaScript, providing async/await support, promises, and console logging while blocking access to network APIs, filesystem operations, and Node.js modules. Tool calls within executed code are validated against the same tools_to_execute and tools_to_auto_execute configurations used for direct tool execution, ensuring consistent security policies.
Agent Mode for Autonomous Workflows
While Bifrost's default security-first approach requires explicit approval for every tool execution, many production workflows benefit from supervised autonomy for specific operations. Agent Mode enables this by allowing automatic execution of pre-approved tools while maintaining manual approval requirements for sensitive operations.
Agent Mode operates through the tools_to_auto_execute configuration field on each MCP client. Tools listed in this field can execute automatically without manual approval, while tools not listed require explicit authorization through the standard execution workflow. The configuration follows the same semantics as tools_to_execute, supporting wildcards for all tools, empty arrays for none, or specific tool names for selective auto-execution.
When Agent Mode is active and the language model returns tool calls, Bifrost automatically identifies which calls are auto-executable based on configuration. Auto-executable tools execute in parallel for optimal performance, with results collected and fed back to the language model for the next iteration. This loop continues until either the model completes its task without additional tool calls or the configured maximum depth is reached.
For responses containing both auto-executable and non-auto-executable tools, Bifrost executes the auto-executable tools first, then returns a response containing a JSON summary of executed tools in the content field and pending non-auto-executable tool calls in the tool_calls array. The application can then review pending tool calls, execute or reject them manually, and continue the conversation with the results.
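The split between auto-executed and pending tool calls can be sketched as a small partition step. The wildcard, empty-list, and explicit-list semantics follow this section's description; the matching logic itself is an illustrative assumption, not Bifrost's source.

```python
def partition_tool_calls(tool_calls, tools_to_auto_execute):
    """Split suggested tool calls into auto-executable and pending groups.
    Semantics: ["*"] means every tool may auto-execute, [] means none,
    otherwise only the explicitly named tools run without approval."""
    auto, pending = [], []
    for call in tool_calls:
        name = call["function"]["name"]
        if "*" in tools_to_auto_execute or name in tools_to_auto_execute:
            auto.append(call)
        else:
            pending.append(call)
    return auto, pending

calls = [
    {"function": {"name": "read_file"}},
    {"function": {"name": "delete_file"}},
]
auto, pending = partition_tool_calls(calls, ["read_file", "list_directory"])
print([c["function"]["name"] for c in auto])     # ['read_file']
print([c["function"]["name"] for c in pending])  # ['delete_file']
```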
The maximum agent depth setting limits autonomous iterations to prevent runaway execution, with a default of 10 iterations and configurable range from 1 to 50. Each language model call that produces tool calls counts as one iteration, and when maximum depth is reached, the current response returns as-is, potentially including pending tool calls that require manual approval.
Security best practices for Agent Mode recommend marking only read operations and non-destructive information gathering as auto-executable. Operations like reading files, listing directories, and search queries are generally safe for automatic execution. In contrast, write operations, delete operations, command execution, and operations with external side effects should require human approval. This separation ensures that agents can gather information and analyze data autonomously while requiring explicit authorization for actions that modify state or interact with external systems.
Granular Control Through Tool Filtering
Enterprise AI deployments require fine-grained control over which tools are available to different users, applications, and contexts. Bifrost's tool filtering system operates at three distinct levels, each serving different organizational requirements and security models.
Client configuration filtering establishes the baseline of available tools through the tools_to_execute field on each MCP client. This defines the superset of tools that can potentially be used from that client, with semantics supporting wildcards for all tools, empty arrays for none, or explicit lists of specific tool names. No tool can be executed unless it appears in this baseline configuration.
Request-level filtering enables dynamic tool control on a per-request basis using HTTP headers or SDK context values. Applications can specify which MCP clients to include using the mcp-include-clients header, limiting tools to only specified servers. Alternatively, the mcp-include-tools header allows precise control over individual tools using the format clientName/toolName, with wildcard support for including all tools from specific clients.
Virtual key filtering provides the most sophisticated control mechanism, especially critical for multi-tenant deployments or organizations with complex permission hierarchies. When virtual keys have MCP configurations defined, these configurations take precedence over request-level headers. Each virtual key can specify exactly which tools from which MCP clients are available to requests authenticated with that key, implementing role-based access control at the tool execution layer.
The filtering logic combines these levels through intersection semantics: available tools equal the intersection of client configuration tools, request filters, and virtual key filters. For production deployments, this enables patterns like read-only access by configuring only read operations as executable, environment-based filtering by using different virtual keys for development and production, and per-user tool access by creating virtual keys for different user roles or permission levels.
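The intersection semantics can be expressed directly. In this simplified sketch, each level is a set of clientName/toolName strings and the literal {"*"} stands for "no restriction at this level"; the real header and virtual-key mechanics are richer than this model.

```python
def resolve_available_tools(client_config, request_filter, virtual_key_filter):
    """Intersect the three filtering levels. Each argument is a set of
    'clientName/toolName' strings; the literal {"*"} means that level
    imposes no restriction. A tool is available only if every level allows it."""
    available = set(client_config)
    for level in (request_filter, virtual_key_filter):
        if level != {"*"}:
            available &= set(level)
    return available

baseline = {"filesystem/read_file", "filesystem/write_file", "websearch/search"}
per_request = {"filesystem/read_file", "websearch/search"}   # e.g. via mcp-include-tools
per_key = {"filesystem/read_file", "filesystem/write_file"}  # a filesystem-only virtual key
print(sorted(resolve_available_tools(baseline, per_request, per_key)))
# ['filesystem/read_file']
```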
Exposing Tools via MCP Gateway URL
Beyond acting as an MCP client that connects to external servers, Bifrost can function as an MCP server, exposing all connected tools to external MCP clients through a single unified endpoint. This capability is particularly valuable for organizations using tools like Claude Desktop, Cursor, or custom MCP-compatible applications.
The gateway exposes two endpoints: a POST endpoint at /mcp for JSON-RPC 2.0 messages handling tool discovery and execution, and a GET endpoint at /mcp for Server-Sent Events enabling persistent connections for real-time communication. External clients can connect to these endpoints to discover available tools, retrieve tool definitions, and execute tools through Bifrost's security and governance infrastructure.
When virtual key authentication is enabled through the enforce_governance_header configuration, each virtual key receives its own MCP server instance with tools filtered according to that key's configuration. This enables secure multi-tenant deployments where different external clients see only the tools appropriate for their authorization level. Virtual keys can be passed via Authorization header with Bearer prefix, X-Api-Key header, or x-bf-virtual-key header, providing flexibility for different client authentication patterns.
The gateway includes automatic health monitoring of connected MCP clients, sending pings every 10 seconds by default with configurable timeouts and failure thresholds. Tools from disconnected clients become unavailable through the gateway, and administrators can trigger manual reconnection attempts via API or SDK when connection issues are resolved. Request ID tracking enables detailed audit trails for agent mode operations, correlating tool executions to specific iterations and enabling comprehensive observability for autonomous workflows.
For production deployments, security considerations include enabling virtual key enforcement to require authentication for all MCP requests, deploying Bifrost behind a reverse proxy with TLS encryption, limiting tool access through virtual key configurations following least-privilege principles, and implementing network-level restrictions to control which IP addresses can access the MCP endpoint.
Production Deployment Considerations
Deploying Bifrost's MCP implementation in production environments requires attention to several operational considerations. The health monitoring system continuously tracks the state of connected MCP clients, automatically detecting disconnections through periodic health checks. When clients become disconnected, their tools become unavailable until manual reconnection, preventing partial or unreliable executions.
For organizations with multiple MCP servers, the decision between enabling Code Mode and using classic tool exposure depends on the number of tools and workflow complexity. The documentation recommends Code Mode when connecting 3 or more MCP servers, handling complex multi-step workflows, or when token costs and latency are primary concerns. Organizations can mix both approaches, enabling Code Mode for heavy servers while keeping small utility servers as direct tools.
Tool execution timeouts provide protection against long-running or stuck operations, with a default of 30 seconds per tool execution and configurable settings through tool_execution_timeout in tool manager configuration. When tools exceed timeout thresholds, error results return to the application, enabling graceful handling and preventing workflow stalls.
The binding level configuration for Code Mode controls how tools are organized in the virtual filesystem, with server-level binding grouping all tools from a server into a single definition file and tool-level binding creating individual definition files for each tool. Server-level binding suits servers with fewer tools and simpler discovery workflows, while tool-level binding optimizes for servers with 30 or more tools where minimizing context bloat is critical.
For audit and compliance requirements, Bifrost's stateless architecture ensures complete operation logs with clear correlation between tool call suggestions, approvals, executions, and results. Applications can implement custom request ID management through the FetchNewRequestIDFunc configuration option, enabling detailed tracking of agent iterations and tool executions for regulatory compliance and operational debugging.
Enabling Reliable AI Agents at Scale
Bifrost's comprehensive MCP implementation transforms the operational reality of building and deploying production AI agents. The security-first architecture ensures human oversight for sensitive operations while enabling supervised autonomy for routine tasks. Code Mode's dramatic cost and latency reductions make complex multi-tool workflows economically viable. Agent Mode provides the flexibility to balance automation and control based on organizational risk tolerance and use case requirements.
For organizations building AI agents that interact with internal systems, customer data, or external APIs, these capabilities provide the foundation for reliable, secure, and cost-effective deployments. The combination of granular tool filtering, comprehensive health monitoring, and flexible configuration options supports the diverse requirements of enterprise AI applications while maintaining the simplicity of a unified API interface.
Teams looking to implement production-ready AI agents with MCP can explore Bifrost's capabilities through the comprehensive documentation or get started with a free account on the Maxim AI platform. For enterprise deployments requiring custom integrations, dedicated support, or on-premises installations, schedule a demo to discuss your specific requirements with the Maxim team.