Try Bifrost Enterprise free for 14 days. Request access

Classic MCP vs Code Mode: Saving Token Costs at Scale

Classic MCP vs Code Mode: Saving Token Costs at Scale
Classic MCP loads every tool definition into context per request. Code Mode in Bifrost cuts input tokens by up to 92.8% across many connected MCP servers.

When an AI agent connects to 8 to 10 Model Context Protocol servers, every request can carry 150 or more tool definitions in the model's context, and most of the token budget is spent reading tool catalogs instead of completing the task. The question of classic MCP vs Code Mode is really a question of where that orchestration work happens: inside the model's context window, or inside a sandbox. Bifrost, the open-source AI gateway built in Go by Maxim AI, supports both patterns for running MCP tools, so teams can choose based on how many servers they connect and how complex their workflows are. This post explains how each approach works, where the token costs come from, and when to use one over the other.

What Classic MCP Tool Calling Looks Like

Classic MCP tool calling is the standard pattern where the gateway injects every connected tool definition into the model's context on each request, the model returns tool-call suggestions, and the application executes those calls explicitly. It is the default in Bifrost and in most MCP clients.

In Bifrost, the default tool execution flow is stateless and explicit. A chat completion request returns tool-call suggestions rather than running them, the application reviews and approves the calls, and a separate execute call runs the approved tools before the conversation continues. Bifrost acts as both an MCP client and an MCP server in this model, connecting to external servers over STDIO, HTTP, or SSE and exposing their tools to clients like Claude Desktop. The full pattern is documented in the MCP overview.

This design gives teams precise control. No tool runs without approval, every execution is auditable, and the application owns conversation state. The cost shows up at scale: each model turn reloads the complete tool catalog, and every intermediate result flows back through the context window before the next step.

What Code Mode Is

Code Mode is an execution model where the AI writes a short Python script to orchestrate tools inside a sandbox, instead of receiving every tool definition directly in its context. The model sees four generic meta-tools rather than 150 individual schemas.

Code Mode in Bifrost exposes these four meta-tools on every request:

  • listToolFiles: discover which MCP servers and tools are available
  • readToolFile: load compact Python function signatures for a server on demand
  • getToolDocs: fetch detailed documentation for a specific tool when needed
  • executeToolCode: run a Python script with full tool bindings in the sandbox

The model lists the available stub files, reads only the signatures it needs, and writes a Python script that the Bifrost AI gateway runs in a Starlark interpreter. Intermediate tool results stay inside the sandbox, and the model receives only the compact final output. This is the same insight published by Anthropic's engineering team, which reported a drop from roughly 150,000 tokens to 2,000 on a Google Drive to Salesforce workflow when tool calls were replaced with code execution. Bifrost builds the pattern natively into the gateway, with two deliberate choices: Python rather than JavaScript, because models are trained on more Python, and a dedicated documentation meta-tool to reduce context further.

Classic MCP vs Code Mode: A Side-by-Side Comparison

The two approaches differ in where tool definitions live, how many round trips a workflow needs, and how intermediate data is handled. Both run through the same MCP gateway, so switching between them is a per-client configuration change, not a rewrite.

Dimension Classic MCP Code Mode
Tool definitions in context All tools, every turn Four meta-tools, schemas loaded on demand
LLM round trips per workflow One per tool step 3 to 4 total
Intermediate results Routed through the model Kept inside the sandbox
Orchestration logic In the conversation loop In a Python script
Input token growth Scales with tool count Bounded by files actually read
Best fit 1 to 2 small servers, simple calls 3+ servers, multi-step workflows

Classic MCP keeps every tool visible to the model at all times, which is straightforward to reason about for small setups. Code Mode trades that visibility for efficiency: the catalog stays behind the four meta-tools, and the model pays only for the signatures and docs it reads.

How the Token Math Changes at Scale

Code Mode reduces input token usage by up to 92.8% and estimated cost by up to 92.2% in large MCP deployments, with savings that grow as tool count increases. The reason is structural: classic MCP cost rises with every connected tool because the catalog is reread on each turn, while Code Mode cost is bounded by the stub files the model actually opens.

Bifrost benchmarked Code Mode against classic MCP across three rounds with increasing MCP footprint, using the same query set each time. The benchmark results show the gap widening with scale:

  • 96 tools across 6 servers: input tokens fell 58.2%, estimated cost fell 55.7%
  • 251 tools across 11 servers: input tokens fell 84.5%, estimated cost fell 83.4%
  • 508 tools across 16 servers: input tokens fell 92.8% (75.1M to 5.4M), estimated cost fell from $377 to $29

Pass rate held at 100% in the largest round (65 of 65), so the token reduction did not trade away task success. Measured per query at around 500 tools, that is roughly a 14x reduction, from 1.15M tokens to 83K. The same runs produced 3 to 4 times fewer LLM round trips and around 40% faster execution. Teams can reproduce these numbers using the published performance benchmarks and run their own MCP footprints against them.

How Code Mode Works, Step by Step

Code Mode follows a discover-then-execute sequence rather than a call-and-return loop. Once Code Mode is enabled on a client, the tools for that client are no longer injected directly; they become reachable through the four meta-tools.

A typical workflow runs like this:

  1. Discover: the model calls listToolFiles to see which servers and tools exist
  2. Load: it calls readToolFile to read compact signatures for the relevant server
  3. Clarify: if a signature is not enough, getToolDocs returns full documentation for one tool
  4. Execute: the model writes a Python script and executeToolCode runs it in the sandbox, returning a compact result

The Starlark sandbox is deliberately constrained: no import statements, no file I/O, no network access, and a default 30-second execution timeout. Tools from Code Mode clients are exposed as global objects, and calls are synchronous, so a script can chain several tool calls and process their output before returning a single result. Bifrost supports two binding levels for how stubs are organized: server-level, where one file holds all of a server's tools, and tool-level, where each tool gets its own file for servers with large schemas.

Enabling Code Mode is a per-client setting. The same MCP server can run in classic mode for one client and Code Mode for another, and connecting servers follows the standard MCP server connection flow regardless of mode. Configuration details and the full meta-tool reference live in the Code Mode documentation.

When to Use Classic MCP vs Code Mode

Use Code Mode for 3 or more MCP servers, complex multi-step workflows, or any setup where token cost and latency matter. Keep classic MCP for 1 to 2 small servers with simple, direct tool calls. The two are not mutually exclusive: a practical pattern is Code Mode for heavy servers like web search, documents, and databases, and classic calling for small utilities.

Does Code Mode help with only one or two MCP servers?

Usually not enough to justify it. With a small tool catalog, the context overhead of classic MCP is low, and direct tool calls are simpler to debug. Code Mode's advantage compounds as the number of connected tools grows.

Is Code Mode slower than classic MCP?

Code Mode is typically faster, not slower. By collapsing several tool steps into one sandboxed script, it reduces LLM round trips by 3 to 4 times and cuts execution time by around 40% in large deployments. For latency-sensitive single-call workflows, classic MCP can still be the simpler choice.

Can classic MCP and Code Mode run together?

Yes. Code Mode is configured per MCP client, so the same Bifrost deployment can expose some clients in classic mode and others in Code Mode. This lets teams migrate heavy servers first while leaving simple utilities unchanged.

Is it safe to auto-execute Code Mode scripts?

Code Mode integrates with Agent Mode for autonomous execution, with extra validation. Bifrost parses the script, extracts every tool call, and auto-executes only if all calls are on the allowed list; if any call is not permitted, the request returns for approval. Combined with per-key MCP tool filtering, this keeps autonomous workflows inside defined boundaries, which matters most for the regulated and enterprise teams Bifrost is built for. The broader controls available when running Bifrost as an MCP gateway cover access control, audit logs, and per-tool cost tracking.

Getting Started with Bifrost

Choosing between classic MCP vs Code Mode comes down to scale. For a handful of tools, classic tool calling is clear and direct. For agents spanning many MCP servers, Code Mode keeps token costs and latency bounded as the catalog grows, while preserving task accuracy. Both run through the same gateway, and Code Mode is a per-client toggle once your servers are connected through the Bifrost setup flow. Model Context Protocol standards are defined by the MCP specification for teams that want to go deeper on the protocol itself.

To see how Bifrost handles MCP tool orchestration, governance, and Code Mode for production agents, book a demo with the Bifrost team, or explore the Bifrost resources hub for benchmarks and implementation guides.