Intelligent Load Balancing

Overview

Intelligent Load Balancing in Bifrost distributes traffic across provider keys and models according to real-time performance metrics. The system continuously monitors error rates, latency, and throughput, and adjusts key weights so that traffic shifts away from degraded keys and toward healthy ones.

Key Features

Feature	Description
Dynamic Weight Adjustment	Automatically adjusts key weights based on performance metrics
Real-time Performance Monitoring	Tracks error rates, latency, and success rates per model-key combination
Cross-Node Synchronization	Gossip protocol ensures consistent weight information across all cluster nodes
Predictive Scaling	Anticipates traffic patterns and adjusts weights proactively
Circuit Breaker Integration	Temporarily removes poorly performing keys from rotation
Model-Level Optimization	Optimizes performance at both provider and individual model levels

How Intelligent Load Balancing Works

Performance Metrics Collection

The system continuously collects performance data for each model-key combination:

{
  "provider": "openai",
  "model_key_id": "gpt-4-key-1",
  "metrics": {
    "avg_latency_ms": 1200,
    "error_rate": 0.01,
    "success_rate": 0.99,
    "requests_per_minute": 362,
    "tokens_processed": 87500,
    "current_weight": 0.8,
    "baseline_latency_ms": 980,
    "performance_score": 0.85
  }
}
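As a rough illustration of how such per-key figures could be derived, the sketch below keeps a sliding window of request samples and computes the average latency, error rate, and success rate from it. The class name `MetricsWindow`, its method names, and the 300-second default window are assumptions made for this example; they are not Bifrost's internal API.

```python
from collections import deque


class MetricsWindow:
    """Sliding window of request samples for one model-key combination (hypothetical helper)."""

    def __init__(self, window_seconds=300.0):
        self.window_seconds = window_seconds
        self.samples = deque()  # each entry: (timestamp_s, latency_ms, ok)

    def record(self, timestamp_s, latency_ms, ok):
        """Add one finished request and drop samples that fell out of the window."""
        self.samples.append((timestamp_s, latency_ms, ok))
        self._evict(timestamp_s)

    def _evict(self, now_s):
        cutoff = now_s - self.window_seconds
        while self.samples and self.samples[0][0] <= cutoff:
            self.samples.popleft()

    def snapshot(self, now_s):
        """Return aggregate metrics for the current window, or None if it is empty."""
        self._evict(now_s)
        count = len(self.samples)
        if count == 0:
            return None
        errors = sum(1 for _, _, ok in self.samples if not ok)
        total_latency = sum(latency for _, latency, _ in self.samples)
        return {
            "avg_latency_ms": total_latency / count,
            "error_rate": errors / count,
            "success_rate": 1 - errors / count,
            "requests_count": count,
        }
```

Fields such as `performance_score` and `baseline_latency_ms` are left out on purpose, because the text does not say how they are calculated.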

Weight Adjustment Algorithm

At each evaluation cycle, the load balancer applies the following rules to every key:

High Error Rates: Reduces weight for keys with elevated error rates
Latency Spikes: Decreases weight when response times exceed baseline thresholds
Superior Performance: Increases weight for consistently high-performing keys
Gradual Adjustments: Makes incremental changes to prevent traffic oscillation
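The four rules above can be sketched as a single function. This is an illustration under stated assumptions, not Bifrost's actual algorithm: the penalty and bonus factors and the weight bounds are borrowed from the configuration examples in this document, the error-rate threshold from the KPI table, `max_change_per_cycle` is treated as an absolute cap on the weight change, and the bonus is applied whenever no penalty fires.

```python
def adjust_weight(
    weight,
    error_rate,
    avg_latency_ms,
    baseline_latency_ms,
    error_rate_threshold=0.025,    # assumed: KPI table threshold
    latency_spike_threshold=1.5,   # assumed: ratio of current to baseline latency
    error_rate_penalty=0.7,
    latency_penalty=0.8,
    performance_bonus=1.1,
    max_change_per_cycle=0.3,      # assumed: absolute cap per cycle
    min_weight=0.1,
    max_weight=2.0,
):
    """Return the next weight for one key (illustrative sketch only)."""
    factor = 1.0
    degraded = False
    if error_rate > error_rate_threshold:
        factor *= error_rate_penalty      # high error rate: reduce weight
        degraded = True
    if avg_latency_ms > latency_spike_threshold * baseline_latency_ms:
        factor *= latency_penalty         # latency spike: reduce weight
        degraded = True
    if not degraded:
        factor *= performance_bonus       # healthy key: small increase

    # Gradual adjustment: cap the step so traffic does not oscillate.
    step = weight * factor - weight
    step = max(-max_change_per_cycle, min(max_change_per_cycle, step))

    new_weight = max(min_weight, min(max_weight, weight + step))
    return round(new_weight, 4)
```

For example, a key at weight 0.8 with a 3% error rate and no latency spike would move to 0.56 under these assumptions, while a key that triggers both penalties at weight 2.0 would be held to a 0.3 step and land at 1.7.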

Real-Time Weight Synchronization

In clustered deployments, each node shares its weight adjustments with the rest of the cluster over the gossip protocol, so that all nodes hold consistent weight information.

Weight Update Message Format

{
  "version": 1,
  "type": "weight_update",
  "node_id": "bifrost-node-b",
  "timestamp": "2024-01-15T10:30:15Z",
  "data": {
    "provider": "openai",
    "model_key_id": "gpt-4-key-2",
    "weight_change": {
      "from": 0.8,
      "to": 0.6,
      "reason": "high_error_rate",
      "threshold_exceeded": 0.025,
      "adjustment_factor": 0.75
    },
    "performance_metrics": {
      "avg_latency_ms": 1450,
      "baseline_latency_ms": 1100,
      "error_rate": 0.03,
      "success_rate": 0.97,
      "requests_count": 150,
      "performance_score": 0.72
    },
    "next_evaluation": "2024-01-15T10:31:15Z"
  }
}
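A receiving node has to decide whether an incoming message is newer than what it already holds. The sketch below shows one possible approach, assuming a last-write-wins rule keyed on the message `timestamp`; the source does not state which conflict-resolution rule Bifrost uses, and the function names and the shape of `state` are hypothetical.

```python
from datetime import datetime


def parse_timestamp(text):
    """Parse an ISO 8601 timestamp with a trailing 'Z' into an aware datetime."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def apply_weight_update(state, message):
    """Apply a weight_update message to local state.

    `state` maps (provider, model_key_id) to the latest known weight.
    Returns True if the message changed the state, False if it was ignored.
    Assumption: newer timestamps win (last-write-wins).
    """
    if message.get("type") != "weight_update":
        return False

    data = message["data"]
    key = (data["provider"], data["model_key_id"])
    received_at = parse_timestamp(message["timestamp"])

    current = state.get(key)
    if current is not None and current["timestamp"] >= received_at:
        return False  # stale or duplicate gossip, keep what we have

    state[key] = {
        "weight": data["weight_change"]["to"],
        "reason": data["weight_change"]["reason"],
        "timestamp": received_at,
        "node_id": message["node_id"],
    }
    return True
```

Ignoring messages that are not newer than the stored entry keeps repeated deliveries harmless, which matters because gossip protocols typically deliver the same update more than once.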

Performance Monitoring & Alerting

Key Performance Indicators

The system tracks these critical metrics for each model-key combination:

Metric	Threshold	Action
Error Rate	> 2.5%	Reduce weight by 30%
Latency Spike	> 150% of baseline latency	Reduce weight by 20%
Success Rate	< 95%	Activate circuit breaker
Response Time	> 5000 ms	Temporarily remove key from pool
Throughput Drop	< 50% of expected throughput	Adjust weight
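The table can be read as a set of independent checks. The sketch below evaluates them against one metrics snapshot and returns the actions that would fire. The function name and action labels are hypothetical, the latency spike is interpreted as average latency above 1.5 times the baseline, and `avg_latency_ms` stands in for response time; these are assumptions rather than documented behavior.

```python
def triggered_actions(metrics, expected_rpm=None):
    """Return the KPI-table actions triggered by one metrics snapshot (illustrative)."""
    actions = []

    if metrics["error_rate"] > 0.025:
        actions.append("reduce_weight_30_percent")

    if metrics["avg_latency_ms"] > 1.5 * metrics["baseline_latency_ms"]:
        actions.append("reduce_weight_20_percent")

    if metrics["success_rate"] < 0.95:
        actions.append("activate_circuit_breaker")

    if metrics["avg_latency_ms"] > 5000:
        actions.append("remove_from_pool")

    # The throughput check only runs when an expected rate is supplied.
    if expected_rpm is not None and "requests_per_minute" in metrics:
        if metrics["requests_per_minute"] < 0.5 * expected_rpm:
            actions.append("adjust_weight")

    return actions
```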

Automatic Performance Alerts

{
  "version": 1,
  "type": "performance_alert",
  "node_id": "bifrost-node-c",
  "timestamp": "2024-01-15T10:31:00Z",
  "data": {
    "alert_type": "latency_spike",
    "severity": "warning",
    "provider": "anthropic",
    "model_key_id": "claude-3-key-1",
    "current_metrics": {
      "avg_latency_ms": 2800,
      "baseline_latency_ms": 980,
      "spike_percentage": 185.7,
      "error_rate": 0.008,
      "current_weight": 1.0
    },
    "recommended_action": "reduce_weight",
    "suggested_new_weight": 0.7,
    "auto_applied": true
  }
}
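The `spike_percentage` value in this alert is consistent with the relative increase of the average latency over the baseline. A minimal sketch of that arithmetic follows; the function name is hypothetical, and the formula is inferred from the numbers in the example rather than from a documented definition.

```python
def spike_percentage(avg_latency_ms, baseline_latency_ms):
    """Percentage increase of the current average latency over the baseline."""
    if baseline_latency_ms <= 0:
        raise ValueError("baseline latency must be positive")
    return (avg_latency_ms - baseline_latency_ms) / baseline_latency_ms * 100


# Values from the alert above: (2800 - 980) / 980 * 100
print(round(spike_percentage(2800, 980), 1))  # 185.7
```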

Configuration

Basic Intelligent Load Balancing Setup

{
  "intelligent_load_balancing": {
    "enabled": true,
    "algorithm": "adaptive_weighted",
    "evaluation_interval": "30s",
    "weight_adjustment": {
      "enabled": true,
      "max_change_per_cycle": 0.3,
      "min_weight": 0.1,
      "max_weight": 2.0
    },
    "performance_thresholds": {
      "error_rate_warning": 0.02,
      "error_rate_critical": 0.05,
      "latency_spike_threshold": 1.5,
      "circuit_breaker_threshold": 0.95
    }
  }
}
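Before deploying a configuration like the one above, it can help to sanity-check it locally. The sketch below parses duration strings such as "30s" and checks a few relationships between fields. These checks are assumptions about what a sensible configuration looks like, not Bifrost's own validation rules, and the helper names are hypothetical.

```python
import re

_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}


def parse_duration(text):
    """Convert strings like '30s' or '5m' to seconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h)", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def validate(config):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    ilb = config["intelligent_load_balancing"]

    if parse_duration(ilb["evaluation_interval"]) <= 0:
        problems.append("evaluation_interval must be positive")

    adjustment = ilb["weight_adjustment"]
    if not 0 < adjustment["min_weight"] <= adjustment["max_weight"]:
        problems.append("expected 0 < min_weight <= max_weight")
    if adjustment["max_change_per_cycle"] <= 0:
        problems.append("max_change_per_cycle must be positive")

    thresholds = ilb["performance_thresholds"]
    if not 0 <= thresholds["error_rate_warning"] < thresholds["error_rate_critical"] <= 1:
        problems.append("expected 0 <= error_rate_warning < error_rate_critical <= 1")
    if thresholds["latency_spike_threshold"] <= 1:
        problems.append("latency_spike_threshold should be greater than 1")

    return problems
```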

Advanced Configuration

{
  "intelligent_load_balancing": {
    "enabled": true,
    "algorithm": "adaptive_weighted",
    "evaluation_interval": "30s",
    "weight_adjustment": {
      "enabled": true,
      "strategy": "performance_based",
      "max_change_per_cycle": 0.3,
      "min_weight": 0.1,
      "max_weight": 2.0,
      "adjustment_factors": {
        "error_rate_penalty": 0.7,
        "latency_penalty": 0.8,
        "performance_bonus": 1.1
      }
    },
    "performance_thresholds": {
      "error_rate_warning": 0.02,
      "error_rate_critical": 0.05,
      "latency_spike_threshold": 1.5,
      "latency_critical_threshold": 2.0,
      "circuit_breaker_threshold": 0.95,
      "recovery_threshold": 0.98
    },
    "metrics_collection": {
      "window_size": "5m",
      "sample_rate": "1s",
      "baseline_calculation": "rolling_average_7d"
    },
    "predictive_scaling": {
      "enabled": true,
      "prediction_window": "15m",
      "confidence_threshold": 0.8,
      "proactive_adjustments": true
    }
  }
}
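The pair `circuit_breaker_threshold` and `recovery_threshold` suggests hysteresis: a key leaves rotation when its success rate drops below the lower value and returns only once it climbs back to the higher one. The sketch below illustrates that reading. It is an assumption about how the two settings interact, and the function name is hypothetical.

```python
def breaker_is_open(currently_open, success_rate,
                    circuit_breaker_threshold=0.95, recovery_threshold=0.98):
    """Return whether the breaker should be open after this evaluation.

    Assumed behavior: trip below circuit_breaker_threshold, and once open,
    stay open until the success rate reaches recovery_threshold.
    """
    if currently_open:
        return success_rate < recovery_threshold
    return success_rate < circuit_breaker_threshold


# Walk a key through a dip and a recovery.
is_open = False
history = []
for rate in [0.99, 0.94, 0.96, 0.97, 0.98, 0.96]:
    is_open = breaker_is_open(is_open, rate)
    history.append(is_open)
print(history)  # [False, True, True, True, False, False]
```

Using a higher bar for recovery than for tripping avoids a key flapping in and out of rotation when its success rate hovers around a single threshold.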

Provider-Specific Configuration

{
  "providers": [
    {
      "id": "openai",
      "keys": [
        {
          "key": "sk-...",
          "model_key_id": "gpt-4-key-1",
          "weight": 1.0,
          "intelligent_balancing": {
            "enabled": true,
            "baseline_latency_ms": 1100,
            "expected_error_rate": 0.01,
            "max_requests_per_minute": 500,
            "priority": "high"
          }
        },
        {
          "key": "sk-...",
          "model_key_id": "gpt-4-key-2",
          "weight": 0.8,
          "intelligent_balancing": {
            "enabled": true,
            "baseline_latency_ms": 1200,
            "expected_error_rate": 0.015,
            "max_requests_per_minute": 400,
            "priority": "medium"
          }
        }
      ]
    }
  ]
}
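To show how per-key weights like these translate into routing decisions, the sketch below picks a key at random with probability proportional to its weight. This is one common way to implement weighted selection, offered as an assumption; the source does not describe the selection mechanism Bifrost uses. The `pick_key` name is hypothetical.

```python
import random


def pick_key(keys, rng=random):
    """Choose one model_key_id with probability proportional to its weight."""
    weights = [entry["weight"] for entry in keys]
    chosen = rng.choices(keys, weights=weights, k=1)[0]
    return chosen["model_key_id"]


keys = [
    {"model_key_id": "gpt-4-key-1", "weight": 1.0},
    {"model_key_id": "gpt-4-key-2", "weight": 0.8},
]

# With weights 1.0 and 0.8, key 1 should receive about 1.0 / 1.8 of requests.
rng = random.Random(42)
picks = [pick_key(keys, rng) for _ in range(10_000)]
share_key_1 = picks.count("gpt-4-key-1") / len(picks)
```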

Traffic Distribution Examples

Before Intelligent Load Balancing

{
  "provider": "openai",
  "traffic_distribution": {
    "gpt-4-key-1": {
      "weight": 1.0,
      "traffic_percentage": 50.0,
      "avg_latency_ms": 1450,
      "error_rate": 0.03,
      "status": "degraded_performance"
    },
    "gpt-4-key-2": {
      "weight": 1.0,
      "traffic_percentage": 50.0,
      "avg_latency_ms": 1100,
      "error_rate": 0.01,
      "status": "healthy"
    }
  }
}

After Intelligent Load Balancing

{
  "provider": "openai",
  "traffic_distribution": {
    "gpt-4-key-1": {
      "weight": 0.6,
      "traffic_percentage": 35.3,
      "avg_latency_ms": 1450,
      "error_rate": 0.03,
      "status": "weight_reduced",
      "adjustment_reason": "high_error_rate_and_latency"
    },
    "gpt-4-key-2": {
      "weight": 1.1,
      "traffic_percentage": 64.7,
      "avg_latency_ms": 1100,
      "error_rate": 0.01,
      "status": "weight_increased",
      "adjustment_reason": "superior_performance"
    }
  },
  "overall_improvement": {
    "avg_latency_reduction": "12.3%",
    "error_rate_reduction": "23.1%",
    "throughput_increase": "8.7%"
  }
}
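The `traffic_percentage` values in both examples match each key's weight divided by the sum of all weights for the provider. A short sketch of that calculation follows; the function name is hypothetical.

```python
def traffic_shares(weights):
    """Map each key to its share of traffic in percent, proportional to weight."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("total weight must be positive")
    return {key: round(weight / total * 100, 1) for key, weight in weights.items()}


before = traffic_shares({"gpt-4-key-1": 1.0, "gpt-4-key-2": 1.0})
after = traffic_shares({"gpt-4-key-1": 0.6, "gpt-4-key-2": 1.1})
print(before)  # {'gpt-4-key-1': 50.0, 'gpt-4-key-2': 50.0}
print(after)   # {'gpt-4-key-1': 35.3, 'gpt-4-key-2': 64.7}
```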

Monitoring Dashboard

Real-Time Performance View

Monitor intelligent load balancing effectiveness through these key metrics:

{
  "intelligent_load_balancing_metrics": {
    "last_evaluation": "2024-01-15T10:30:00Z",
    "next_evaluation": "2024-01-15T10:30:30Z",
    "total_adjustments_last_hour": 12,
    "performance_improvements": {
      "latency_improvement": "15.2%",
      "error_rate_reduction": "28.4%",
      "throughput_increase": "11.8%"
    },
    "provider_performance": {
      "openai": {
        "total_keys": 3,
        "healthy_keys": 2,
        "degraded_keys": 1,
        "avg_weight": 0.87,
        "traffic_distribution": {
          "gpt-4-key-1": {
            "weight": 0.6,
            "traffic_percentage": 28.5,
            "performance_score": 0.72,
            "trend": "declining"
          },
          "gpt-4-key-2": {
            "weight": 1.1,
            "traffic_percentage": 52.3,
            "performance_score": 0.94,
            "trend": "stable"
          },
          "gpt-4-key-3": {
            "weight": 0.9,
            "traffic_percentage": 19.2,
            "performance_score": 0.87,
            "trend": "improving"
          }
        }
      },
      "anthropic": {
        "total_keys": 2,
        "healthy_keys": 2,
        "degraded_keys": 0,
        "avg_weight": 1.05,
        "traffic_distribution": {
          "claude-3-key-1": {
            "weight": 1.0,
            "traffic_percentage": 48.2,
            "performance_score": 0.91,
            "trend": "stable"
          },
          "claude-3-key-2": {
            "weight": 1.1,
            "traffic_percentage": 51.8,
            "performance_score": 0.95,
            "trend": "improving"
          }
        }
      }
    }
  }
}
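A payload like this can also be scanned programmatically. The sketch below lists the keys that look worth investigating, assuming the payload has the shape shown above. The 0.8 score cutoff and the function name are hypothetical choices for illustration, not values defined by Bifrost.

```python
def keys_needing_attention(provider_performance, min_score=0.8):
    """Return (provider, key_id) pairs with a low score or a declining trend."""
    flagged = []
    for provider, info in provider_performance.items():
        for key_id, stats in info["traffic_distribution"].items():
            low_score = stats["performance_score"] < min_score  # assumed cutoff
            declining = stats["trend"] == "declining"
            if low_score or declining:
                flagged.append((provider, key_id))
    return flagged
```

Applied to the example above, this would flag only `gpt-4-key-1` under `openai`, which has a score of 0.72 and a declining trend.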
