Overview

Bifrost provides three primary knobs for tuning performance and memory consumption:
  • Concurrency (concurrency): Controls the number of simultaneous requests to each provider.
  • Request Buffering (buffer_size): Defines the queue size for pending requests for each provider.
  • Object Pooling (initial_pool_size): Pre-allocates memory for request/response objects to reduce garbage collection overhead.
Understanding how these settings interact is key to configuring Bifrost for high throughput, low latency, or resource-constrained environments.

1. Concurrency Control (concurrency)

Concurrency determines how many worker goroutines are spawned for each provider to process requests in parallel; a minimal sketch of this worker-pool pattern appears at the end of this section.
  • What it is: The maximum number of simultaneous requests Bifrost will make to a single provider’s API.
  • Impact: Directly controls the throughput for each provider.
  • Trade-offs:
    • Higher Concurrency: Increases throughput but also increases the risk of hitting API rate limits. Consumes more memory and CPU for in-flight requests.
    • Lower Concurrency: Reduces the risk of rate limiting and consumes fewer resources, but may limit throughput.
  • Configuration: This is configured on a per-provider basis.

    🔧 Go Package - Concurrency Configuration

    Concurrency is set within the ProviderConfig returned by your Account implementation.
    // In your Account implementation
    func (a *MyAccount) GetConfigForProvider(provider schemas.ModelProvider) (*schemas.ProviderConfig, error) {
        // ...
        return &schemas.ProviderConfig{
            ConcurrencyAndBufferSize: schemas.ConcurrencyAndBufferSize{
                Concurrency: 10, // 10 concurrent workers for this provider
                BufferSize:  50,
            },
            // ...
        }, nil
    }
    
    🌐 HTTP Transport - Concurrency Configuration

    Concurrency is set in your config.json under each provider’s concurrency_and_buffer_size.
    {
        "providers": {
            "openai": {
                // ...
                "concurrency_and_buffer_size": {
                    "concurrency": 10,
                    "buffer_size": 50
                }
            }
        }
    }
    
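To make the concurrency knob concrete, here is a minimal sketch of the per-provider worker-pool pattern that concurrency and buffer_size imply, written in the standard Go idiom. It illustrates the mechanism rather than Bifrost’s actual internals, and every name in it is illustrative.

package main

import (
    "fmt"
    "sync"
    "time"
)

func main() {
    const concurrency = 3 // "concurrency": 3 worker goroutines
    const bufferSize = 10 // "buffer_size": queue up to 10 requests

    queue := make(chan int, bufferSize) // the request buffer
    var wg sync.WaitGroup

    // Spawn `concurrency` workers that drain the queue in parallel.
    for w := 1; w <= concurrency; w++ {
        wg.Add(1)
        go func(worker int) {
            defer wg.Done()
            for req := range queue {
                time.Sleep(10 * time.Millisecond) // simulate a provider API call
                fmt.Printf("worker %d finished request %d\n", worker, req)
            }
        }(w)
    }

    for req := 1; req <= 10; req++ {
        queue <- req // blocks once the buffer is full
    }
    close(queue)
    wg.Wait()
}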

2. Request Buffering (buffer_size)

The buffer is a queue that holds incoming requests waiting to be processed by the concurrent workers.
  • What it is: The number of requests that can be queued for a provider before new requests either block or are dropped.
  • Impact: Helps Bifrost absorb traffic bursts without losing requests. For example, with concurrency: 10 and buffer_size: 50, roughly 60 requests per provider can be admitted at once (10 in flight plus 50 queued) before back-pressure sets in.
  • Trade-offs:
    • Larger Buffer: Can handle larger bursts of traffic, preventing blocking. However, it consumes more memory to hold the queued request objects.
    • Smaller Buffer: Consumes less memory but may cause requests to block or be dropped during traffic spikes if workers can’t keep up.
  • dropExcessRequests: If the buffer is full, the behavior depends on the global dropExcessRequests setting (Go package only); a sketch at the end of this section illustrates both behaviors.
    • false (default): New requests will block until space is available in the queue.
    • true: New requests are immediately dropped with an error.
    🔧 Go Package - Buffer Configuration

    The buffer size is set alongside concurrency in ProviderConfig.
    // In your Account implementation
    func (a *MyAccount) GetConfigForProvider(provider schemas.ModelProvider) (*schemas.ProviderConfig, error) {
        // ...
        return &schemas.ProviderConfig{
            ConcurrencyAndBufferSize: schemas.ConcurrencyAndBufferSize{
                Concurrency: 10,
                BufferSize:  50, // Queue up to 50 requests
            },
            // ...
        }, nil
    }
    
    // Global config for dropping excess requests
    client, err := bifrost.Init(schemas.BifrostConfig{
        //...
        DropExcessRequests: true, // Drop requests when queue is full
    })
    
    🌐 HTTP Transport - Buffer Configuration

    The buffer size is set in your config.json. Note that dropExcessRequests is not configurable for the HTTP transport and defaults to false (blocking).
    {
        "providers": {
            "openai": {
                // ...
                "concurrency_and_buffer_size": {
                    "concurrency": 10,
                    "buffer_size": 50
                }
            }
        }
    }
    
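The blocking-versus-dropping choice described above maps onto a familiar Go pattern. Below is a minimal, self-contained sketch of both behaviors using a buffered channel; errQueueFull, request, and enqueue are illustrative names, not Bifrost’s API.

package main

import (
    "errors"
    "fmt"
)

var errQueueFull = errors.New("request dropped: provider queue is full")

type request struct{ id int }

// enqueue sketches the two buffer-full behaviors: block (default) or drop.
func enqueue(queue chan request, req request, dropExcess bool) error {
    if dropExcess {
        select {
        case queue <- req:
            return nil
        default:
            return errQueueFull // dropExcessRequests: true (fail fast)
        }
    }
    queue <- req // dropExcessRequests: false (block until a slot frees up)
    return nil
}

func main() {
    queue := make(chan request, 2) // buffer_size: 2, and no workers draining it
    for i := 1; i <= 3; i++ {      // the third request overflows the buffer
        if err := enqueue(queue, request{id: i}, true); err != nil {
            fmt.Println("request", i, "->", err)
        }
    }
}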

3. Object Pooling (initial_pool_size)

Bifrost uses object pools to reuse request and response objects, reducing the load on the garbage collector and improving latency; a minimal sync.Pool sketch at the end of this section illustrates the underlying pattern.
  • What it is: A global setting that pre-allocates a specified number of objects for requests, responses, and errors.
  • Impact: Significantly reduces memory allocation and GC pressure during high-traffic scenarios.
  • Trade-offs:
    • Larger Pool: Improves performance under heavy load by minimizing allocations. Increases the initial memory footprint of Bifrost.
    • Smaller Pool: Lower initial memory usage, but may lead to more GC activity and higher latency under load.
  • Configuration: This is a global setting. For the Go package, it is set in BifrostConfig. For the HTTP transport, it’s configured via command-line flags or environment variables, not in config.json.
🔧 Go Package - Object Pool Configuration

Set InitialPoolSize in the BifrostConfig during initialization.
// Global config for object pooling
client, err := bifrost.Init(schemas.BifrostConfig{
    Account:        myAccount,
    InitialPoolSize: 1000, // Pre-allocate 1000 objects of each type
    // ...
})
🌐 HTTP Transport - Object Pool Configuration

The pool size for the HTTP transport is set at startup.

Using Go Binary

Use the -pool-size command-line flag.
bifrost-http -config config.json -port 8080 -pool-size 1000
Using Docker

Use the APP_POOL_SIZE environment variable.
docker run -p 8080:8080 \
  -v $(pwd)/config.json:/app/config/config.json \
  -e APP_POOL_SIZE=1000 \
  -e OPENAI_API_KEY \
  maximhq/bifrost
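As noted above, the mechanism behind initial_pool_size is the object-reuse pattern, which in Go is typically built on sync.Pool. A minimal sketch follows; the request type and pool layout are illustrative, since Bifrost’s actual pooled types (requests, responses, errors) are internal.

package main

import (
    "fmt"
    "sync"
)

// request stands in for the kind of object Bifrost pools.
type request struct {
    payload []byte
}

// A sync.Pool hands out recycled objects instead of allocating fresh ones,
// which is what keeps GC pressure low under heavy load.
var requestPool = sync.Pool{
    New: func() any { return &request{payload: make([]byte, 0, 4096)} },
}

func main() {
    r := requestPool.Get().(*request) // reuse a pooled object, or allocate anew
    r.payload = append(r.payload[:0], "hello"...)
    fmt.Println(string(r.payload))
    requestPool.Put(r) // return it so the next request can skip the allocation
}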

✨ Future Development

Dynamic Scaling

Note: This feature is under active development.
A planned feature for Bifrost is dynamic scaling, which will allow concurrency and buffer_size to adjust automatically based on real-time request load and provider feedback (such as rate-limit headers), enabling Bifrost to self-tune for optimal performance and cost-efficiency.

⚙️ Configuration Recommendations

Tune these settings based on your application’s traffic patterns and performance goals.
Use Case                | Concurrency (per provider) | Buffer Size (per provider) | Initial Pool Size (global) | Goal
🚀 High-Throughput      | 50-200                     | 500-1000                   | 1000-5000                  | Maximize RPS, assuming provider rate limits are high.
⚖️ Balanced (Default)    | 10-50                      | 100-500                    | 500-1000                   | Good for most production workloads with moderate traffic.
💧 Burst-Resistant      | 10-20                      | 1000-5000                  | 500-1000                   | Handles sudden traffic spikes without dropping requests.
🌱 Resource-Constrained | 2-5                        | 10-50                      | 50-100                     | Minimizes memory footprint for development or low-traffic services.
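As a concrete instance of the Balanced row, here is a sketch of those values applied in the Go package, following the shape of the earlier examples (MyAccount and the elided fields come from those examples):

// In your Account implementation
func (a *MyAccount) GetConfigForProvider(provider schemas.ModelProvider) (*schemas.ProviderConfig, error) {
    // ...
    return &schemas.ProviderConfig{
        ConcurrencyAndBufferSize: schemas.ConcurrencyAndBufferSize{
            Concurrency: 25,  // within the Balanced 10-50 band
            BufferSize:  250, // within the Balanced 100-500 band
        },
        // ...
    }, nil
}

// Pool size is global, so it is set once at Init time.
client, err := bifrost.Init(schemas.BifrostConfig{
    Account:         myAccount,
    InitialPoolSize: 750, // within the Balanced 500-1000 band
})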

📊 Monitoring Memory

Monitor your Bifrost instance to ensure your configuration is optimal.
  • Prometheus Metrics: The HTTP transport exposes metrics at the /metrics endpoint. While there are no Bifrost-specific memory metrics, you can monitor the standard Go runtime metrics, such as go_memstats_alloc_bytes and go_memstats_heap_inuse_bytes, to observe memory usage.
  • Go Profiling (pprof): For detailed memory analysis when using the Go package, use the standard net/http/pprof tooling to inspect heap allocations and goroutine counts; a minimal setup sketch follows below.
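A minimal sketch of wiring net/http/pprof up alongside Bifrost in the Go package; the side port and placement are assumptions to adapt to your service.

package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
    // Serve pprof on a side port. Inspect the heap with, for example:
    //   go tool pprof http://localhost:6060/debug/pprof/heap
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()

    // ... initialize Bifrost and run your application here ...
    select {} // placeholder to keep this sketch running
}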
💡 Tip: Start with the Balanced configuration and adjust based on observed performance and resource utilization. For example, if you see requests blocking frequently, increase buffer_size. If your provider rate limits are being hit, decrease concurrency.