AI Gateway

Agent Mode: Autonomous MCP Tool Execution with Bifrost

Bifrost Agent Mode enables autonomous tool execution in an MCP gateway, running approved tool calls automatically with configurable, per-tool auto-approval.

By default, an MCP gateway returns an LLM's tool calls to the application, which decides what runs and when. Agent Mode changes that by enabling autonomous tool execution: the gateway runs approved tool calls automatically, feeds the results back to the model, and loops until the task completes. Bifrost, the open-source MCP gateway built in Go by Maxim AI, ships Agent Mode as a built-in capability for teams building autonomous AI agents on production infrastructure. This post explains how Agent Mode works, how to configure auto-execution safely, and where it fits in the broader MCP gateway.

What Is Agent Mode in an MCP Gateway?

Agent Mode is a configuration in an MCP gateway that executes an LLM's tool calls autonomously instead of returning them to the application for manual handling. When enabled, Bifrost runs each auto-executable tool, feeds the result back to the model, and repeats the cycle until the model stops requesting tools or a configured depth limit is reached. This turns the gateway from a request router into an autonomous agent runtime.

The contrast with default behavior is the important part. Without Agent Mode, every tool call from the model is returned to your application, which explicitly invokes tool execution on the gateway. That stateless, explicit-execution pattern keeps a human or an application in the loop for every operation. Agent Mode collapses that round trip into the gateway itself, which is what makes multi-step agent tasks practical without writing a custom orchestration layer.

Why Autonomous Tool Execution Matters for AI Agents

Multi-step agent tasks require many tool calls in sequence: list files, read several of them, search a database, then summarize. Handled manually, each step is a separate round trip that the application has to orchestrate, parse, and resubmit. Autonomous tool execution removes that orchestration burden by running the loop inside the MCP gateway, which reduces application code and the latency of shuttling intermediate results back and forth.

This matters more as agents connect to more tools. Anthropic, which created the Model Context Protocol, notes that developers now routinely build agents with access to hundreds or thousands of tools across dozens of MCP servers. At that scale, manual per-call orchestration becomes the bottleneck, and consistent tool filtering across every connected server is what keeps autonomous execution governable.

How Agent Mode Works in Bifrost

When Agent Mode is enabled, the Bifrost gateway runs a deterministic loop for each request:

The LLM returns tool calls in its response.
Bifrost automatically executes the tools that are marked auto-executable.
Results are fed back to the LLM as context.
The loop continues until the model returns no more tool calls, or the max depth is reached.
Any tool that is not auto-executable is returned to your application for approval.

A few behaviors are worth knowing before you turn it on:

Max depth. The max_agent_depth setting limits how many iterations the agent loop can run. The default is 10, and it is configurable from 1 to 50. Each LLM call that produces tool calls counts as one iteration. When the limit is reached, the current response is returned as-is and may still contain pending tool calls.
Parallel execution. When a single response contains multiple auto-executable tools, Bifrost runs them in parallel and collects the results before the next LLM call, which keeps per-iteration latency low.
Tool execution timeout. Each tool execution is bounded by tool_execution_timeout, which defaults to 30 seconds. If a tool exceeds the timeout, an error result is returned and the loop continues with that error.
Streaming is not supported. The autonomous loop requires complete responses before proceeding to the next iteration, so Agent Mode runs on the non-streaming chat and responses endpoints, not their streaming variants.

When a response mixes auto-executable and non-auto-executable tools, Bifrost executes the auto tools first, then returns a response with the executed results summarized in the content field, the pending tools in tool_calls, and finish_reason set to stop. Your application reviews the pending tools, runs or rejects them, and continues the conversation.

Configuring Auto-Execution: Two Lists That Control Everything

Agent Mode is driven by two configuration fields, and the distinction between them is the core of the security model:

Field	Purpose	Semantics
`tools_to_execute`	Which tools are available to the LLM (whitelist)	`["*"]` = all, `[]` = none, `["a","b"]` = specific tools
`tools_to_auto_execute`	Which available tools run without approval	Same semantics, must be a subset of `tools_to_execute`

The auto-execute list must be a subset of the execute list. A tool listed for auto-execution that is not in tools_to_execute is ignored, because the execute whitelist always takes precedence. By default, no tools are auto-executed, so Agent Mode requires explicit opt-in for every tool you want to run automatically.

A typical configuration makes all tools available but only auto-executes safe read operations:

{
  "name": "filesystem",
  "connection_type": "stdio",
  "stdio_config": {
    "command": "npx",
    "args": ["-y", "@anthropic/mcp-filesystem"]
  },
  "tools_to_execute": ["*"],
  "tools_to_auto_execute": ["read_file", "list_directory"]
}

You can manage these settings per client in the web UI by toggling an "Automatically execute tool" switch for each available tool, through the gateway API, or in config.json. Auto-execute configuration is per client, so different MCP servers can carry different policies. Combine this with MCP tool filtering per virtual key to vary which tools are reachable per team, environment, or user.

Choosing Which Tools Auto-Execute Safely

The auto-execute list is a security boundary, not a convenience setting. Operations with side effects should require human approval rather than running unattended. A practical split looks like this:

Safe to auto-execute:

Read operations such as read_file and list_directory
Search and query operations such as search and fetch_url
Non-destructive information gathering with no side effects

Should require approval:

Write operations such as write_file and create_file
Delete operations such as delete_file and delete_record
Execute operations such as run_command and execute_script
Actions with external side effects such as sending email or making purchases

Keeping destructive and side-effecting tools off the auto-execute list means pending operations are returned as approvals instead of running, which preserves a human decision point for the operations that carry real risk. For enterprise deployments in regulated industries, pairing this with immutable audit logs gives a tamper-evident record of every tool suggestion, approval, and execution for SOC 2, GDPR, and HIPAA evidence.

Agent Mode, Governance, and Observability

Autonomous execution is only safe inside a governed boundary. Bifrost applies the same controls to auto-executed tools that it applies to every request, so Agent Mode inherits the full governance model rather than bypassing it:

Virtual keys scope access, budgets, and rate limits per consumer. Virtual keys are the primary governance entity, and tool availability can be filtered per key.
Tool filtering layers client-level, request-level, and per-key filters, so the set of tools eligible for auto-execution is constrained before the loop ever runs.
Observability through native Prometheus metrics and OpenTelemetry tracing captures each iteration of the agent loop, making intermediate tool calls visible instead of hidden inside a black box.

This combination is what makes Agent Mode usable for coding agents pointed at the gateway. A team can route Claude Code and similar agents through Bifrost, auto-execute read-only filesystem and search tools for speed, and hold writes and deploys for explicit approval, all while keeping a single audit trail. The same Bifrost MCP gateway can also expose its aggregated tools to external clients, so one deployment handles tool discovery, governance, execution, and exposure.

Common Questions About Agent Mode

Does Bifrost auto-execute tool calls by default?

No. By default, Bifrost returns tool calls to the application and executes nothing automatically. Tools run autonomously only after you explicitly add them to tools_to_auto_execute.

Does Agent Mode work with streaming responses?

No. The autonomous loop needs complete responses to decide whether to continue, so Agent Mode runs on the non-streaming chat and responses endpoints. Use those endpoints when Agent Mode is enabled.

How many tool-calling iterations can Agent Mode run?

The max_agent_depth setting controls this. It defaults to 10 iterations and is configurable from 1 to 50. When the limit is reached, the current response is returned even if it still contains pending tool calls.

What happens to tools that need approval?

Bifrost executes the auto-executable tools, then returns the pending tools in the tool_calls array with finish_reason set to stop. Your application reviews them, runs or rejects each one, and continues the conversation with the results.

Getting Started with Agent Mode in Bifrost

Agent Mode brings autonomous tool execution into the gateway layer, so AI agents can run multi-step tool workflows without a custom orchestration loop, while configurable auto-approval, max depth, and per-tool security keep dangerous operations behind human review. Benchmarked at 11 microseconds of overhead at 5,000 RPS, Bifrost is built for enterprises running mission-critical AI workloads that need autonomous execution without giving up governance or auditability.

To see how Agent Mode and the broader MCP gateway fit your stack, book a demo with the Bifrost team, or explore the open-source project on GitHub and the Bifrost documentation to start configuring auto-execution today.