Test Environment

1. t3.medium (2 vCPUs, 4GB RAM)

  • Buffer Size: 15,000
  • Initial Pool Size: 10,000

2. t3.xlarge (4 vCPUs, 16GB RAM)

  • Buffer Size: 20,000
  • Initial Pool Size: 15,000

Performance Metrics

| Metric                   | t3.medium  | t3.xlarge  |
|--------------------------|------------|------------|
| Success Rate             | 100.00%    | 100.00%    |
| Average Request Size     | 0.13 KB    | 0.13 KB    |
| Average Response Size    | 1.37 KB    | 10.32 KB   |
| Average Latency          | 2.12 s     | 1.61 s     |
| Peak Memory Usage        | 1312.79 MB | 3340.44 MB |
| Queue Wait Time          | 47.13 µs   | 1.67 µs    |
| Key Selection Time       | 16 ns      | 10 ns      |
| Message Formatting       | 2.19 µs    | 2.11 µs    |
| Params Preparation       | 436 ns     | 417 ns     |
| Request Body Preparation | 2.65 µs    | 2.36 µs    |
| JSON Marshaling          | 63.47 µs   | 26.80 µs   |
| Request Setup            | 6.59 µs    | 7.17 µs    |
| HTTP Request             | 1.56 s     | 1.50 s     |
| Error Handling           | 189 ns     | 162 ns     |
| Response Parsing         | 11.30 ms   | 2.11 ms    |
| Bifrost’s Overhead       | 59 µs*     | 11 µs*     |

*Bifrost’s overhead is measured at 59 µs on t3.medium and 11 µs on t3.xlarge. These figures exclude JSON marshaling and the HTTP call to the LLM, both of which are required in any custom implementation.

Note: The t3.xlarge run used significantly larger response payloads (~10 KB average versus ~1 KB on t3.medium). Even so, response parsing time dropped from 11.30 ms to 2.11 ms, thanks to better CPU throughput and Bifrost’s optimized memory reuse.

Key Performance Highlights

  • Perfect Success Rate: 100% request success rate under high load on both instances
  • Total Overhead: Less than 15 µs added per request on t3.xlarge (11 µs measured; 59 µs on t3.medium)
  • Efficient Queue Management: Minimal queue wait time (1.67 µs on t3.xlarge)
  • Fast Key Selection: Near-instantaneous key selection (10 ns on t3.xlarge)
  • Improved Performance on t3.xlarge:
    • 24% faster average latency
    • 81% faster response parsing
    • 58% faster JSON marshaling
    • 96% lower queue wait time (47.13 µs down to 1.67 µs)
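
The percentages above can be reproduced from the Performance Metrics table. The sketch below is only a worked check of that arithmetic; the helper name `pct_reduction` is our own, and the input values are copied from the table.

```python
def pct_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, as a percentage."""
    return (before - after) / before * 100


# (t3.medium, t3.xlarge) pairs taken from the Performance Metrics table.
metrics = {
    "Average latency (s)": (2.12, 1.61),
    "Response parsing (ms)": (11.30, 2.11),
    "JSON marshaling (µs)": (63.47, 26.80),
    "Queue wait time (µs)": (47.13, 1.67),
}

for name, (medium, xlarge) in metrics.items():
    print(f"{name}: {pct_reduction(medium, xlarge):.0f}% lower on t3.xlarge")
```

Rounded to whole percentages, this gives 24%, 81%, 58%, and 96% respectively.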

Configuration Flexibility

One of Bifrost’s key strengths is its configurability. You can trade memory usage against processing speed by adjusting its settings, optimizing for speed, memory efficiency, or a balance of the two:
  • Higher buffer and pool sizes (like in t3.xlarge) improve speed but use more memory
  • Lower configurations (like in t3.medium) use less memory but may have slightly higher latencies
  • You can fine-tune these parameters based on your specific needs and available resources
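
To illustrate why a larger initial pool trades memory for speed, here is a minimal sketch of a pre-allocated buffer pool. It is a generic illustration of the pooling idea, not Bifrost's implementation: `BufferPool`, `simulate_burst`, and all sizes are hypothetical names and values chosen for the example.

```python
from collections import deque


class BufferPool:
    """Hypothetical pool: pre-allocates reusable buffers, grows only when empty."""

    def __init__(self, initial_size: int, buffer_bytes: int = 1024) -> None:
        self._buffer_bytes = buffer_bytes
        # Memory is paid up front: initial_size * buffer_bytes.
        self._free = deque(bytearray(buffer_bytes) for _ in range(initial_size))
        self.fresh_allocations = 0  # allocations made after start-up

    def acquire(self) -> bytearray:
        if self._free:
            return self._free.popleft()
        # Pool exhausted: the allocation cost lands on the request path.
        self.fresh_allocations += 1
        return bytearray(self._buffer_bytes)

    def release(self, buf: bytearray) -> None:
        self._free.append(buf)


def simulate_burst(pool: BufferPool, in_flight: int) -> int:
    """Hold `in_flight` buffers at once, return them, and report late allocations."""
    held = [pool.acquire() for _ in range(in_flight)]
    for buf in held:
        pool.release(buf)
    return pool.fresh_allocations


small = BufferPool(initial_size=100)
large = BufferPool(initial_size=1000)
print(simulate_burst(small, in_flight=500))  # 400 allocations during the burst
print(simulate_burst(large, in_flight=500))  # 0 allocations during the burst
```

The larger pool holds ten times more memory while idle, but serves the whole burst without allocating on the request path. The smaller pool saves that memory and pays for it with allocations under load.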

Key Configuration Parameters

  • Initial Pool Size: Determines the initial allocation of resources
  • Buffer and Concurrency Settings: Control the queue size and maximum number of concurrent requests (adjustable per provider)
  • Retry and Timeout Configurations: Customizable based on your requirements for each provider
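
As a rough sketch of how these parameters fit together, the two benchmark profiles could be modeled as below. This is not Bifrost's actual configuration schema: the class names, field names, and the provider key `example-provider` are hypothetical. The buffer and pool sizes come from the Test Environment section, while the concurrency, retry, and timeout values are placeholders.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ProviderSettings:
    # Hypothetical field names, not Bifrost's real config keys.
    buffer_size: int               # maximum queued requests for this provider
    concurrency: int               # maximum requests in flight at once
    max_retries: int = 0           # placeholder default
    timeout_seconds: float = 30.0  # placeholder default

    def __post_init__(self) -> None:
        if self.buffer_size <= 0 or self.concurrency <= 0:
            raise ValueError("buffer_size and concurrency must be positive")
        if self.max_retries < 0 or self.timeout_seconds <= 0:
            raise ValueError("max_retries must be >= 0 and timeout_seconds > 0")


@dataclass(frozen=True)
class GatewaySettings:
    initial_pool_size: int                  # resources allocated at start-up
    providers: Dict[str, ProviderSettings]  # settings are adjustable per provider


# Buffer and pool sizes match the benchmark runs; concurrency is a placeholder.
T3_MEDIUM = GatewaySettings(
    initial_pool_size=10_000,
    providers={"example-provider": ProviderSettings(buffer_size=15_000, concurrency=1_000)},
)
T3_XLARGE = GatewaySettings(
    initial_pool_size=15_000,
    providers={"example-provider": ProviderSettings(buffer_size=20_000, concurrency=1_000)},
)

print(T3_XLARGE.providers["example-provider"].buffer_size)
```

Validating at construction time means an invalid value, such as a zero buffer size, fails at start-up rather than under load.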

Run Your Own Benchmarks

The Bifrost Benchmarking repo has everything you need to test Bifrost in your own environment.

Curious how we handle 10k+ RPS? Check out our System Architecture Documentation for detailed insights into Bifrost’s high-performance design, memory management, and scaling strategies.