Overview

Intelligent Load Balancing in Bifrost automatically distributes traffic across provider keys and models based on real-time performance metrics. The system continuously monitors error rates, latency, and throughput, and adjusts key weights dynamically so that traffic shifts away from degraded keys and toward healthy ones.

Key Features

| Feature | Description |
| --- | --- |
| Dynamic Weight Adjustment | Automatically adjusts key weights based on performance metrics |
| Real-time Performance Monitoring | Tracks error rates, latency, and success rates per model-key combination |
| Cross-Node Synchronization | Gossip protocol ensures consistent weight information across all cluster nodes |
| Predictive Scaling | Anticipates traffic patterns and adjusts weights proactively |
| Circuit Breaker Integration | Temporarily removes poorly performing keys from rotation |
| Model-Level Optimization | Optimizes performance at both provider and individual model levels |

How Intelligent Load Balancing Works

Performance Metrics Collection

The system continuously collects performance data for each model-key combination:
{
  "provider": "openai",
  "model_key_id": "gpt-4-key-1",
  "metrics": {
    "avg_latency_ms": 1200,
    "error_rate": 0.01,
    "success_rate": 0.99,
    "requests_per_minute": 362,
    "tokens_processed": 87500,
    "current_weight": 0.8,
    "baseline_latency_ms": 980,
    "performance_score": 0.85
  }
}
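
To make the relationship between raw request data and the fields above concrete, here is a minimal Python sketch that aggregates a window of request samples into the same metric names. The `RequestSample` type and `summarize` function are hypothetical names used only for illustration; they are not part of Bifrost, and `performance_score` is omitted because its formula is not documented here.

```python
from dataclasses import dataclass


@dataclass
class RequestSample:
    """One observed request for a model-key combination (hypothetical type)."""
    latency_ms: float
    ok: bool
    tokens: int


def summarize(samples: list[RequestSample], window_minutes: float) -> dict:
    """Aggregate raw samples into the per-key metric fields shown above."""
    total = len(samples)
    if total == 0:
        # No traffic in the window: report a neutral, healthy snapshot.
        return {
            "avg_latency_ms": 0.0,
            "error_rate": 0.0,
            "success_rate": 1.0,
            "requests_per_minute": 0.0,
            "tokens_processed": 0,
        }
    errors = sum(1 for s in samples if not s.ok)
    return {
        "avg_latency_ms": sum(s.latency_ms for s in samples) / total,
        "error_rate": errors / total,
        "success_rate": (total - errors) / total,
        "requests_per_minute": total / window_minutes,
        "tokens_processed": sum(s.tokens for s in samples),
    }
```

With 99 successful requests and 1 failure in a 5-minute window, this yields an `error_rate` of 0.01 and a `success_rate` of 0.99, the same values as the sample payload.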

Weight Adjustment Algorithm

On each evaluation cycle, the intelligent load balancer re-evaluates every key's weight against its real-time performance metrics and applies the following rules:
  • High Error Rates: Reduces weight for keys with elevated error rates
  • Latency Spikes: Decreases weight when response times exceed baseline thresholds
  • Superior Performance: Increases weight for consistently high-performing keys
  • Gradual Adjustments: Makes incremental changes to prevent traffic oscillation
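
The four rules above can be sketched as a single function. This is an illustrative approximation, not Bifrost's actual algorithm: the parameter names and defaults are borrowed from the configuration and threshold values documented on this page, `error_rate_threshold` is a hypothetical name, and treating `max_change_per_cycle` as a cap on the absolute weight change is an assumption.

```python
def adjust_weight(
    current: float,
    error_rate: float,
    latency_ms: float,
    baseline_latency_ms: float,
    *,
    error_rate_threshold: float = 0.025,   # assumed from the "> 2.5%" KPI
    latency_spike_threshold: float = 1.5,  # multiple of baseline latency
    error_rate_penalty: float = 0.7,
    latency_penalty: float = 0.8,
    performance_bonus: float = 1.1,
    max_change_per_cycle: float = 0.3,     # assumed to be an absolute cap
    min_weight: float = 0.1,
    max_weight: float = 2.0,
) -> float:
    """Return the next weight for one key after a single evaluation cycle."""
    high_errors = error_rate > error_rate_threshold
    latency_spike = latency_ms > latency_spike_threshold * baseline_latency_ms

    factor = 1.0
    if high_errors:
        factor *= error_rate_penalty
    if latency_spike:
        factor *= latency_penalty
    # Reward only keys that are healthy and at or below their baseline latency.
    if not high_errors and not latency_spike and latency_ms <= baseline_latency_ms:
        factor = performance_bonus

    # Gradual adjustment: cap how far the weight may move in one cycle.
    delta = current * factor - current
    delta = max(-max_change_per_cycle, min(max_change_per_cycle, delta))
    return max(min_weight, min(max_weight, current + delta))
```

Capping the per-cycle change is what keeps a key with both high errors and a latency spike from dropping straight to its minimum weight, which would otherwise push a sudden burst of traffic onto the remaining keys.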

Real-Time Weight Synchronization

In clustered deployments, weight adjustments are synchronized across all nodes using the gossip protocol. Each adjustment is broadcast to the other nodes as a weight update message.

Weight Update Message Format

{
  "version": 1,
  "type": "weight_update",
  "node_id": "bifrost-node-b",
  "timestamp": "2024-01-15T10:30:15Z",
  "data": {
    "provider": "openai",
    "model_key_id": "gpt-4-key-2",
    "weight_change": {
      "from": 0.8,
      "to": 0.6,
      "reason": "high_error_rate",
      "threshold_exceeded": 0.025,
      "adjustment_factor": 0.75
    },
    "performance_metrics": {
      "avg_latency_ms": 1450,
      "baseline_latency_ms": 1100,
      "error_rate": 0.03,
      "success_rate": 0.97,
      "requests_count": 150,
      "performance_score": 0.72
    },
    "next_evaluation": "2024-01-15T10:31:15Z"
  }
}
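
As a rough illustration of how a receiving node might consume such a message, the sketch below keeps a local weight table and ignores updates that are older than what it already holds. The `WeightTable` class is hypothetical, and the last-writer-wins rule based on `timestamp` is an assumption; the conflict-resolution strategy Bifrost actually uses is not described here.

```python
import json
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as 2024-01-15T10:30:15Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


class WeightTable:
    """Local view of key weights on one node (hypothetical helper)."""

    def __init__(self) -> None:
        # (provider, model_key_id) -> (weight, timestamp of last update)
        self._entries: dict[tuple[str, str], tuple[float, datetime]] = {}

    def apply(self, raw_message: str) -> bool:
        """Apply a weight_update message; return True if it changed state."""
        msg = json.loads(raw_message)
        if msg.get("type") != "weight_update" or msg.get("version") != 1:
            return False
        data = msg["data"]
        key = (data["provider"], data["model_key_id"])
        stamp = parse_timestamp(msg["timestamp"])
        existing = self._entries.get(key)
        # Assumed rule: the newest timestamp wins, stale updates are dropped.
        if existing is not None and existing[1] >= stamp:
            return False
        self._entries[key] = (data["weight_change"]["to"], stamp)
        return True

    def weight(self, provider: str, model_key_id: str, default: float = 1.0) -> float:
        entry = self._entries.get((provider, model_key_id))
        return default if entry is None else entry[0]
```

Dropping stale updates makes the handler idempotent, which matters in gossip-style dissemination because the same message can reach a node more than once and out of order.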

Performance Monitoring & Alerting

Key Performance Indicators

The system tracks these critical metrics for each model-key combination:
| Metric | Threshold | Action |
| --- | --- | --- |
| Error Rate | > 2.5% | Reduce weight by 30% |
| Latency Spike | > 150% of baseline | Reduce weight by 20% |
| Success Rate | < 95% | Circuit breaker activation |
| Response Time | > 5000 ms | Temporary removal from pool |
| Throughput Drop | < 50% of expected | Weight adjustment |
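
The thresholds above translate directly into a set of checks. The following sketch returns every action whose threshold is crossed; the function name and action labels are hypothetical, a single latency value is used for both the latency-spike and response-time checks as a simplification, and the order of precedence between actions is not specified by the table.

```python
def triggered_actions(
    error_rate: float,
    success_rate: float,
    latency_ms: float,
    baseline_latency_ms: float,
    throughput_ratio: float,  # observed throughput / expected throughput
) -> list[str]:
    """Return the actions from the KPI table whose thresholds are exceeded."""
    actions = []
    if error_rate > 0.025:
        actions.append("reduce_weight_by_30_percent")
    if latency_ms > 1.5 * baseline_latency_ms:
        actions.append("reduce_weight_by_20_percent")
    if success_rate < 0.95:
        actions.append("activate_circuit_breaker")
    if latency_ms > 5000:
        actions.append("remove_from_pool_temporarily")
    if throughput_ratio < 0.5:
        actions.append("adjust_weight")
    return actions
```

For a key with a 3% error rate, 97% success rate, and 1450 ms latency against an 1100 ms baseline, only the error-rate rule fires, since 1450 ms is below the 1650 ms spike threshold.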

Automatic Performance Alerts

{
  "version": 1,
  "type": "performance_alert",
  "node_id": "bifrost-node-c",
  "timestamp": "2024-01-15T10:31:00Z",
  "data": {
    "alert_type": "latency_spike",
    "severity": "warning",
    "provider": "anthropic",
    "model_key_id": "claude-3-key-1",
    "current_metrics": {
      "avg_latency_ms": 2800,
      "baseline_latency_ms": 980,
      "spike_percentage": 185.7,
      "error_rate": 0.008,
      "current_weight": 1.0
    },
    "recommended_action": "reduce_weight",
    "suggested_new_weight": 0.7,
    "auto_applied": true
  }
}
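
The `spike_percentage` value in this alert is consistent with the relative increase over the baseline, (2800 - 980) / 980, rather than the ratio to the baseline. A short sketch of that arithmetic follows; the function names are illustrative, and the 1.5 default mirrors the "> 150% of baseline" threshold from the KPI list.

```python
def spike_percentage(avg_latency_ms: float, baseline_latency_ms: float) -> float:
    """Percentage increase of the current latency over the baseline."""
    if baseline_latency_ms <= 0:
        raise ValueError("baseline_latency_ms must be positive")
    increase = avg_latency_ms - baseline_latency_ms
    return round(increase / baseline_latency_ms * 100, 1)


def is_latency_spike(
    avg_latency_ms: float, baseline_latency_ms: float, threshold: float = 1.5
) -> bool:
    """True when latency exceeds the baseline by more than the threshold multiple."""
    return avg_latency_ms > threshold * baseline_latency_ms


print(spike_percentage(2800, 980))   # 185.7
print(is_latency_spike(2800, 980))   # True
```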

Configuration

Basic Intelligent Load Balancing Setup

{
  "intelligent_load_balancing": {
    "enabled": true,
    "algorithm": "adaptive_weighted",
    "evaluation_interval": "30s",
    "weight_adjustment": {
      "enabled": true,
      "max_change_per_cycle": 0.3,
      "min_weight": 0.1,
      "max_weight": 2.0
    },
    "performance_thresholds": {
      "error_rate_warning": 0.02,
      "error_rate_critical": 0.05,
      "latency_spike_threshold": 1.5,
      "circuit_breaker_threshold": 0.95
    }
  }
}
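
Before deploying a configuration like this, it can help to sanity-check the relationships between its values. The sketch below is a hypothetical pre-flight check, not a Bifrost tool: the rules it enforces (for example, that the warning threshold sits below the critical one) are reasonable assumptions rather than documented constraints, and the duration parser only handles simple values such as `30s` or `5m`.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text: str) -> int:
    """Convert a simple duration string such as '30s' or '5m' to seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def validate(config: dict) -> list[str]:
    """Return a list of human-readable problems; empty means no issues found."""
    ilb = config["intelligent_load_balancing"]
    adjust = ilb["weight_adjustment"]
    limits = ilb["performance_thresholds"]
    problems = []
    if parse_duration(ilb["evaluation_interval"]) <= 0:
        problems.append("evaluation_interval must be positive")
    if not 0 < adjust["min_weight"] < adjust["max_weight"]:
        problems.append("min_weight must be positive and below max_weight")
    if adjust["max_change_per_cycle"] <= 0:
        problems.append("max_change_per_cycle must be positive")
    if not limits["error_rate_warning"] < limits["error_rate_critical"]:
        problems.append("error_rate_warning must be below error_rate_critical")
    if limits["latency_spike_threshold"] <= 1.0:
        problems.append("latency_spike_threshold should exceed 1.0 (the baseline)")
    if not 0 < limits["circuit_breaker_threshold"] <= 1:
        problems.append("circuit_breaker_threshold must be in (0, 1]")
    return problems
```

Run against the basic configuration shown above, `validate` returns an empty list.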

Advanced Configuration

{
  "intelligent_load_balancing": {
    "enabled": true,
    "algorithm": "adaptive_weighted",
    "evaluation_interval": "30s",
    "weight_adjustment": {
      "enabled": true,
      "strategy": "performance_based",
      "max_change_per_cycle": 0.3,
      "min_weight": 0.1,
      "max_weight": 2.0,
      "adjustment_factors": {
        "error_rate_penalty": 0.7,
        "latency_penalty": 0.8,
        "performance_bonus": 1.1
      }
    },
    "performance_thresholds": {
      "error_rate_warning": 0.02,
      "error_rate_critical": 0.05,
      "latency_spike_threshold": 1.5,
      "latency_critical_threshold": 2.0,
      "circuit_breaker_threshold": 0.95,
      "recovery_threshold": 0.98
    },
    "metrics_collection": {
      "window_size": "5m",
      "sample_rate": "1s",
      "baseline_calculation": "rolling_average_7d"
    },
    "predictive_scaling": {
      "enabled": true,
      "prediction_window": "15m",
      "confidence_threshold": 0.8,
      "proactive_adjustments": true
    }
  }
}
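
The pairing of `circuit_breaker_threshold` (0.95) with a higher `recovery_threshold` (0.98) suggests hysteresis: a key is removed when its success rate drops below the lower value and is only readmitted once it climbs back above the higher one. The sketch below illustrates that interpretation; it is an assumption about how the two settings interact, and `KeyCircuitBreaker` is a hypothetical class.

```python
class KeyCircuitBreaker:
    """Tracks whether one key is removed from rotation (hypothetical helper)."""

    def __init__(self, trip_below: float = 0.95, recover_at: float = 0.98) -> None:
        if recover_at < trip_below:
            raise ValueError("recover_at must not be lower than trip_below")
        self.trip_below = trip_below
        self.recover_at = recover_at
        self.is_open = False  # open means the key is out of rotation

    def update(self, success_rate: float) -> bool:
        """Feed the latest success rate; return True while the key is removed."""
        if self.is_open:
            # Require a clear recovery before readmitting the key.
            if success_rate >= self.recover_at:
                self.is_open = False
        elif success_rate < self.trip_below:
            self.is_open = True
        return self.is_open


breaker = KeyCircuitBreaker()
for rate in (0.99, 0.94, 0.96, 0.98):
    print(rate, breaker.update(rate))
# 0.99 False
# 0.94 True
# 0.96 True
# 0.98 False
```

The gap between the two thresholds prevents a key whose success rate hovers around 95% from flapping in and out of the pool on every evaluation.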

Provider-Specific Configuration

{
  "providers": [
    {
      "id": "openai",
      "keys": [
        {
          "key": "sk-...",
          "model_key_id": "gpt-4-key-1",
          "weight": 1.0,
          "intelligent_balancing": {
            "enabled": true,
            "baseline_latency_ms": 1100,
            "expected_error_rate": 0.01,
            "max_requests_per_minute": 500,
            "priority": "high"
          }
        },
        {
          "key": "sk-...",
          "model_key_id": "gpt-4-key-2",
          "weight": 0.8,
          "intelligent_balancing": {
            "enabled": true,
            "baseline_latency_ms": 1200,
            "expected_error_rate": 0.015,
            "max_requests_per_minute": 400,
            "priority": "medium"
          }
        }
      ]
    }
  ]
}
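
To show how per-key weights such as 1.0 and 0.8 could drive request routing, here is a minimal weighted random selection sketch. It is one common way to honor weights and is not confirmed to be the selection method Bifrost uses; `pick_key` is a hypothetical function.

```python
import random


def pick_key(weights: dict[str, float], rng: random.Random) -> str:
    """Choose a model_key_id with probability proportional to its weight."""
    if not weights:
        raise ValueError("at least one key is required")
    key_ids = list(weights)
    return rng.choices(key_ids, weights=[weights[k] for k in key_ids], k=1)[0]


rng = random.Random(7)  # seeded only to make the demo repeatable
weights = {"gpt-4-key-1": 1.0, "gpt-4-key-2": 0.8}
picks = [pick_key(weights, rng) for _ in range(10_000)]
share = picks.count("gpt-4-key-1") / len(picks)
print(f"gpt-4-key-1 share: {share:.1%}")  # expected to be close to 1.0 / 1.8
```

Over many requests, `gpt-4-key-1` should receive roughly 55.6% of traffic (1.0 / 1.8) under this scheme.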

Traffic Distribution Examples

Before Intelligent Load Balancing

{
  "provider": "openai",
  "traffic_distribution": {
    "gpt-4-key-1": {
      "weight": 1.0,
      "traffic_percentage": 50.0,
      "avg_latency_ms": 1450,
      "error_rate": 0.03,
      "status": "degraded_performance"
    },
    "gpt-4-key-2": {
      "weight": 1.0,
      "traffic_percentage": 50.0,
      "avg_latency_ms": 1100,
      "error_rate": 0.01,
      "status": "healthy"
    }
  }
}

After Intelligent Load Balancing

{
  "provider": "openai",
  "traffic_distribution": {
    "gpt-4-key-1": {
      "weight": 0.6,
      "traffic_percentage": 35.3,
      "avg_latency_ms": 1450,
      "error_rate": 0.03,
      "status": "weight_reduced",
      "adjustment_reason": "high_error_rate_and_latency"
    },
    "gpt-4-key-2": {
      "weight": 1.1,
      "traffic_percentage": 64.7,
      "avg_latency_ms": 1100,
      "error_rate": 0.01,
      "status": "weight_increased",
      "adjustment_reason": "superior_performance"
    }
  },
  "overall_improvement": {
    "avg_latency_reduction": "12.3%",
    "error_rate_reduction": "23.1%",
    "throughput_increase": "8.7%"
  }
}
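
The `traffic_percentage` values in both examples follow from normalizing the weights: each key's share is its weight divided by the sum of all weights for that provider. A short sketch of that calculation, using a hypothetical `traffic_shares` helper:

```python
def traffic_shares(weights: dict[str, float]) -> dict[str, float]:
    """Convert key weights into traffic percentages, rounded to one decimal."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("total weight must be positive")
    return {key: round(weight / total * 100, 1) for key, weight in weights.items()}


print(traffic_shares({"gpt-4-key-1": 1.0, "gpt-4-key-2": 1.0}))
# {'gpt-4-key-1': 50.0, 'gpt-4-key-2': 50.0}
print(traffic_shares({"gpt-4-key-1": 0.6, "gpt-4-key-2": 1.1}))
# {'gpt-4-key-1': 35.3, 'gpt-4-key-2': 64.7}
```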

Monitoring Dashboard

Real-Time Performance View

Monitor intelligent load balancing effectiveness through these key metrics:
{
  "intelligent_load_balancing_metrics": {
    "last_evaluation": "2024-01-15T10:30:00Z",
    "next_evaluation": "2024-01-15T10:30:30Z",
    "total_adjustments_last_hour": 12,
    "performance_improvements": {
      "latency_improvement": "15.2%",
      "error_rate_reduction": "28.4%",
      "throughput_increase": "11.8%"
    },
    "provider_performance": {
      "openai": {
        "total_keys": 3,
        "healthy_keys": 2,
        "degraded_keys": 1,
        "avg_weight": 0.87,
        "traffic_distribution": {
          "gpt-4-key-1": {
            "weight": 0.6,
            "traffic_percentage": 28.5,
            "performance_score": 0.72,
            "trend": "declining"
          },
          "gpt-4-key-2": {
            "weight": 1.1,
            "traffic_percentage": 52.3,
            "performance_score": 0.94,
            "trend": "stable"
          },
          "gpt-4-key-3": {
            "weight": 0.9,
            "traffic_percentage": 19.2,
            "performance_score": 0.87,
            "trend": "improving"
          }
        }
      },
      "anthropic": {
        "total_keys": 2,
        "healthy_keys": 2,
        "degraded_keys": 0,
        "avg_weight": 1.05,
        "traffic_distribution": {
          "claude-3-key-1": {
            "weight": 1.0,
            "traffic_percentage": 48.2,
            "performance_score": 0.91,
            "trend": "stable"
          },
          "claude-3-key-2": {
            "weight": 1.1,
            "traffic_percentage": 51.8,
            "performance_score": 0.95,
            "trend": "improving"
          }
        }
      }
    }
  }
}