Overview

Want to see Bifrost’s performance in your specific environment? The Bifrost Benchmarking Repository provides everything you need to conduct comprehensive performance tests tailored to your infrastructure and workload requirements.

What You Can Test:
  • Custom Instance Sizes - Test on your preferred AWS/GCP/Azure instances
  • Your Workload Patterns - Use your actual request/response sizes
  • Different Configurations - Compare various Bifrost settings
  • Provider Comparisons - Benchmark against other AI gateways
  • Load Scenarios - Test burst loads, sustained traffic, and endurance
πŸ’‘ Open Source: The benchmarking tool is completely open source! Feel free to submit pull requests if you think anything is missing or could be improved.

Prerequisites

Before running benchmarks, ensure you have:
  • Go 1.23+ installed on your testing machine
  • Bifrost instance running and accessible
  • Target API providers configured (OpenAI, Anthropic, etc.)
  • Network access between benchmark tool and Bifrost
  • Sufficient resources on the testing machine to generate load

Quick Start

1. Clone the Repository

git clone https://github.com/maximhq/bifrost-benchmarking.git
cd bifrost-benchmarking

2. Build the Benchmark Tool

go build benchmark.go

This creates a benchmark executable (or benchmark.exe on Windows).

3. Run Your First Benchmark

# Basic benchmark: 500 RPS for 10 seconds
./benchmark -provider bifrost -port 8080

# Custom benchmark: 1000 RPS for 30 seconds  
./benchmark -provider bifrost -port 8080 -rate 1000 -duration 30 -output my_results.json

Configuration Options

The benchmark tool offers extensive configuration through command-line flags:

Basic Configuration

| Flag | Required | Description | Default |
|------|----------|-------------|---------|
| -provider <name> | ✅ | Provider name (e.g., bifrost, litellm) | None |
| -port <number> | ✅ | Port number of your Bifrost instance | None |
| -endpoint <path> | ❌ | API endpoint path | v1/chat/completions |
| -rate <number> | ❌ | Requests per second | 500 |
| -duration <seconds> | ❌ | Test duration in seconds | 10 |
| -output <filename> | ❌ | Results output file | results.json |

Advanced Configuration

| Flag | Description | Default |
|------|-------------|---------|
| -include-provider-in-request | Include provider name in request payload | false |
| -big-payload | Use larger, more complex request payloads | false |

Benchmark Scenarios

1. Basic Performance Test

Test standard performance with typical request sizes:
./benchmark -provider bifrost -port 8080 -rate 1000 -duration 60 -output basic_test.json
Use Case: General performance validation

2. High-Load Stress Test

Push your instance to its limits:
./benchmark -provider bifrost -port 8080 -rate 5000 -duration 120 -output stress_test.json
Use Case: Capacity planning and SLA validation

3. Large Payload Test

Test with bigger request/response sizes:
./benchmark -provider bifrost -port 8080 -rate 500 -duration 60 -big-payload=true -output large_payload.json
Use Case: Document processing, code generation workloads

4. Endurance Test

Long-running stability test:
./benchmark -provider bifrost -port 8080 -rate 1000 -duration 1800 -output endurance_test.json
Use Case: Production readiness validation (30-minute test)

5. Comparative Benchmarking

Compare Bifrost against other providers:
# Test Bifrost
./benchmark -provider bifrost -port 8080 -rate 1000 -duration 60 -output bifrost_results.json

# Test LiteLLM
./benchmark -provider litellm -port 8000 -rate 1000 -duration 60 -output litellm_results.json

# Test direct OpenAI (if available)
./benchmark -provider openai -port 443 -endpoint chat/completions -rate 1000 -duration 60 -output openai_results.json
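Once the three runs finish, their result files can be lined up side by side. A minimal sketch of that comparison (the compare helper is hypothetical; it assumes the results schema shown under "Understanding Results" below):

```python
import json

def compare(runs):
    """runs: list of (provider, results_path) pairs.
    Returns one row of headline metrics per provider so different
    gateways can be compared at a glance."""
    rows = []
    for provider, path in runs:
        with open(path) as f:
            r = json.load(f)[provider]
        rows.append((provider,
                     r["success_rate"],
                     r["latency_metrics"]["p99_ms"],
                     r["throughput_rps"]))
    return rows

# Usage, after the runs above:
# for row in compare([("bifrost", "bifrost_results.json"),
#                     ("litellm", "litellm_results.json")]):
#     print("%-10s success=%.1f%% p99=%.1fms rps=%.0f" % row)
```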

Understanding Results

The benchmark tool generates detailed JSON results with comprehensive metrics:

Key Metrics Explained

{
  "bifrost": {
    "request_counts": {
      "total_sent": 30000,
      "successful": 30000,
      "failed": 0
    },
    "success_rate": 100.0,
    "latency_metrics": {
      "mean_ms": 245.5,
      "p50_ms": 230.2,
      "p99_ms": 520.8,
      "max_ms": 845.3
    },
    "throughput_rps": 5000.0,
    "memory_usage": {
      "before_mb": 512.5,
      "after_mb": 1312.8,
      "peak_mb": 1405.2,
      "average_mb": 1156.7
    },
    "timestamp": "2025-01-14T10:30:00Z",
    "status_codes": {
      "200": 30000
    }
  }
}

Critical Performance Indicators

Success Rate:
  • Target: >99.9% for production readiness
  • Excellent: 100% (perfect reliability)
Latency Metrics:
  • P50 (Median): Typical user experience
  • P99: Worst-case user experience
  • Mean: Overall average performance
Memory Usage:
  • Peak: Maximum memory consumption
  • Average: Sustained memory usage
  • After - Before: Memory growth during test
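These checks are easy to automate against the results file. A sketch that flags misses against the targets above (the evaluate helper and the latency/memory budgets are hypothetical; the JSON layout matches the sample shown earlier):

```python
import json

SUCCESS_RATE_TARGET = 99.9   # production-readiness target from above
P99_BUDGET_MS = 1000.0       # example latency budget; set your own SLA
MEM_GROWTH_BUDGET_MB = 2048  # example growth budget; tune per instance size

def evaluate(results, provider="bifrost"):
    """Return a list of warnings for metrics that miss the targets."""
    r = results[provider]
    warnings = []
    if r["success_rate"] < SUCCESS_RATE_TARGET:
        warnings.append("success rate below target")
    if r["latency_metrics"]["p99_ms"] > P99_BUDGET_MS:
        warnings.append("p99 latency over budget")
    growth = r["memory_usage"]["after_mb"] - r["memory_usage"]["before_mb"]
    if growth > MEM_GROWTH_BUDGET_MB:
        warnings.append("memory growth over budget")
    return warnings

# Usage:
# with open("results.json") as f:
#     print(evaluate(json.load(f)))
```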

Instance Sizing Recommendations

Based on your benchmark results, use these guidelines for production sizing:

Resource Planning Matrix

| Target RPS | Memory Usage | Recommended Instance | Notes |
|------------|--------------|----------------------|-------|
| < 1,000 | < 1GB | t3.small | Cost-effective for light loads |
| 1,000 - 3,000 | 1-2GB | t3.medium | Balanced performance/cost |
| 3,000 - 5,000 | 2-4GB | t3.large | High-performance production |
| 5,000+ | 3-6GB | t3.xlarge+ | Enterprise/mission-critical |

Configuration Tuning Based on Results

If seeing high latency:
  • Increase initial_pool_size
  • Increase buffer_size
  • Consider larger instance
If memory usage is high:
  • Decrease initial_pool_size
  • Optimize buffer_size
  • Monitor for memory leaks
If success rate < 100%:
  • Reduce request rate
  • Increase timeout settings
  • Check provider limits
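The two pool settings above live in your Bifrost configuration. The exact file name and surrounding schema depend on your deployment, so treat this fragment and its values as illustrative only:

```json
{
  "initial_pool_size": 500,
  "buffer_size": 1000
}
```

Raise both when latency is the problem; lower initial_pool_size first when memory is.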

Advanced Testing Scenarios

Burst Load Testing

Simulate traffic spikes:
# Normal load
./benchmark -provider bifrost -port 8080 -rate 1000 -duration 300 -output normal_load.json

# Burst load (simulate 5x spike)
./benchmark -provider bifrost -port 8080 -rate 5000 -duration 60 -output burst_load.json

Multi-Instance Testing

Test horizontal scaling:
# Instance 1
./benchmark -provider bifrost-1 -port 8080 -rate 2500 -duration 120 -output instance_1.json &

# Instance 2  
./benchmark -provider bifrost-2 -port 8081 -rate 2500 -duration 120 -output instance_2.json &

# Wait for both to complete
wait

Different Payload Sizes

Compare performance across payload sizes:
# Small payloads (default)
./benchmark -provider bifrost -port 8080 -rate 1000 -duration 60 -output small_payload.json

# Large payloads
./benchmark -provider bifrost -port 8080 -rate 1000 -duration 60 -big-payload=true -output large_payload.json

Continuous Benchmarking

Automated Testing Pipeline

Set up regular performance regression testing:
#!/bin/bash
# daily_benchmark.sh

DATE=$(date +%Y%m%d_%H%M%S)
OUTPUT_DIR="benchmarks/$DATE"
mkdir -p $OUTPUT_DIR

# Run standard benchmarks
./benchmark -provider bifrost -port 8080 -rate 1000 -duration 300 -output "$OUTPUT_DIR/standard.json"
./benchmark -provider bifrost -port 8080 -rate 3000 -duration 180 -output "$OUTPUT_DIR/high_load.json"  
./benchmark -provider bifrost -port 8080 -rate 500 -duration 600 -big-payload=true -output "$OUTPUT_DIR/large_payload.json"

echo "Benchmarks completed: $OUTPUT_DIR"

Performance Monitoring Integration

Monitor key metrics over time:
  • Success rate trends
  • Latency percentile changes
  • Memory usage patterns
  • Throughput capacity
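One lightweight way to track these trends is to append each run's headline numbers to a CSV after the pipeline finishes. A sketch (the append_trend helper and CSV layout are my own; the field names follow the results schema shown earlier):

```python
import csv
import json
import pathlib

def append_trend(results_path, provider, csv_path="benchmark_trends.csv"):
    """Append one row of headline metrics so regressions show up over time."""
    with open(results_path) as f:
        r = json.load(f)[provider]
    row = [r["timestamp"], r["success_rate"],
           r["latency_metrics"]["p50_ms"], r["latency_metrics"]["p99_ms"],
           r["memory_usage"]["peak_mb"], r["throughput_rps"]]
    write_header = not pathlib.Path(csv_path).exists()
    with open(csv_path, "a", newline="") as f:
        w = csv.writer(f)
        if write_header:
            w.writerow(["timestamp", "success_rate", "p50_ms", "p99_ms",
                        "peak_mb", "throughput_rps"])
        w.writerow(row)
```

Calling this at the end of a script like daily_benchmark.sh gives you a file you can chart or diff between releases.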

Troubleshooting

Common Issues

Connection Refused:
# Check if Bifrost is running
curl http://localhost:8080/health

# Verify port configuration
netstat -an | grep 8080
  • Check that PORT is defined in the .env file at the repository root
High Error Rates:
  • Check provider API key limits
  • Verify Bifrost configuration
  • Monitor upstream provider status
  • Reduce request rate for baseline test
Memory Issues:
  • Monitor system resources during testing
  • Check for memory leaks in long tests
  • Adjust Bifrost pool sizes
Inconsistent Results:
  • Run multiple test iterations
  • Account for network variability
  • Use longer test durations (60+ seconds)
  • Isolate testing environment
  • Route gateway requests to a mock provider to rule out upstream variability
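When you do run multiple iterations, averaging the percentiles across runs (and watching the spread) makes it obvious whether the variance comes from your environment. A sketch (the aggregate helper is hypothetical; it reads the results schema shown earlier):

```python
import json
import statistics

def aggregate(paths, provider="bifrost"):
    """Average latency percentiles across repeated runs to smooth out
    network variability; the p99 spread shows run-to-run instability."""
    p50s, p99s = [], []
    for path in paths:
        with open(path) as f:
            m = json.load(f)[provider]["latency_metrics"]
        p50s.append(m["p50_ms"])
        p99s.append(m["p99_ms"])
    return {"p50_ms": statistics.mean(p50s),
            "p99_ms": statistics.mean(p99s),
            "p99_spread_ms": max(p99s) - min(p99s)}

# Usage: aggregate(["run1.json", "run2.json", "run3.json"])
```

A large p99 spread relative to the mean suggests environmental noise rather than a Bifrost regression.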

Next Steps

After Running Benchmarks

  1. Analyze Results: Compare against official benchmarks
  2. Optimize Configuration: Tune based on your specific results
  3. Plan Capacity: Size instances based on measured performance
  4. Set Up Monitoring: Track key metrics in production

Compare Results

Ready to benchmark? Clone the repository and start testing!