# Architecture Navigation

## Core Architecture
| Document | Description | Focus Area |
|---|---|---|
| System Overview | High-level architecture & design principles | Components, interactions, data flow |
| Request Flow | Request processing pipeline deep dive | Processing stages, memory management |
| Benchmarks | Performance benchmarks & optimization | Metrics, scaling, optimization |
| Concurrency | Worker pools & threading model | Goroutines, channels, resource isolation |
## Internal Systems
| Document | Description | Focus Area |
|---|---|---|
| Plugin System | How plugins work internally | Plugin lifecycle, interfaces, execution |
| MCP System | Model Context Protocol internals | Tool discovery, execution, integration |
| Design Decisions | Architecture rationale & trade-offs | Why we built it this way, alternatives |
## Architecture at a Glance

### High-Performance Design Principles
- Asynchronous Processing - Channel-based worker pools eliminate blocking
- Memory Pool Management - Object reuse minimizes garbage collection
- Provider Isolation - Independent resources prevent cascade failures
- Plugin-First Architecture - Extensible without core modifications
- Connection Optimization - HTTP/2, keep-alive, intelligent pooling
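The channel-based worker pool principle above can be sketched in Go. This is a minimal illustration, not the project's actual implementation; the `job` type and `startPool` function are hypothetical names:

```go
package main

import (
	"fmt"
	"sync"
)

// job represents one unit of work submitted to the pool.
type job struct {
	id     int
	result chan int
}

// startPool launches n workers that drain a shared job channel, so
// callers never block on processing: they enqueue a job and wait on
// its per-job result channel.
func startPool(n int, jobs <-chan job) *sync.WaitGroup {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs {
				j.result <- j.id * 2 // stand-in for real request processing
			}
		}()
	}
	return &wg
}

func main() {
	jobs := make(chan job, 8) // buffered queue decouples producers from workers
	wg := startPool(4, jobs)

	results := make([]chan int, 5)
	for i := range results {
		results[i] = make(chan int, 1)
		jobs <- job{id: i, result: results[i]}
	}
	close(jobs)

	for i, r := range results {
		fmt.Printf("job %d -> %d\n", i, <-r)
	}
	wg.Wait()
}
```

The buffered job channel is what makes the design non-blocking for producers: submission succeeds as long as the queue has capacity, independent of how busy the workers are.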
### System Components Overview

Processing Flow: Transport → Router → Plugins → MCP → Workers → Providers

### Key Performance Characteristics
| Metric | Performance | Details |
|---|---|---|
| Throughput | 10,000+ RPS | Sustained high-load performance |
| Latency | 11-59μs overhead | Minimal processing overhead |
| Memory | Optimized pooling | Object reuse minimizes GC pressure |
| Reliability | 100% success rate | Under 5000 RPS sustained load |
### Architectural Features
- Provider Isolation - Independent worker pools prevent cascade failures
- Memory Optimization - Channel, message, and response object pooling
- Extensible Hooks - Plugin system for custom logic injection
- MCP Integration - Native tool discovery and execution system
- Built-in Observability - Prometheus metrics without external dependencies
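Object pooling of the kind described under Memory Optimization can be sketched with Go's standard `sync.Pool`. The pooled `response` type here is hypothetical; the real system pools channel, message, and response objects:

```go
package main

import (
	"fmt"
	"sync"
)

// response is a hypothetical pooled object standing in for the
// real channel/message/response types.
type response struct {
	body []byte
}

var responsePool = sync.Pool{
	New: func() interface{} { return &response{body: make([]byte, 0, 4096)} },
}

// getResponse fetches a reusable object instead of allocating a new
// one, which is what keeps GC pressure low under sustained load.
func getResponse() *response {
	r := responsePool.Get().(*response)
	r.body = r.body[:0] // reset state before reuse
	return r
}

func putResponse(r *response) { responsePool.Put(r) }

func main() {
	r := getResponse()
	r.body = append(r.body, "hello"...)
	fmt.Println(string(r.body))
	putResponse(r) // return to the pool for the next request
}
```

Resetting the object on checkout (rather than on return) is a common convention: it guarantees a clean object even if a caller forgot to clear it before `Put`.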
## Core Concepts

### Request Lifecycle
1. Transport receives the request (HTTP/SDK)
2. Router selects a provider and manages load balancing
3. Plugin Manager executes pre-processing hooks
4. MCP Manager discovers and prepares available tools
5. Worker Pool processes the request with dedicated provider workers
6. Memory Pools supply reusable objects for efficiency
7. Plugin Manager executes post-processing hooks
8. Transport returns the response to the client
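The hook-wrapped portion of this lifecycle can be sketched as a simple pipeline. The `Request`, `Response`, and hook types below are illustrative stand-ins, assuming hooks mutate the request/response in place; routing, MCP tool preparation, and pooling are elided:

```go
package main

import "fmt"

// Request and Response are simplified stand-ins for the real types.
type Request struct{ Body string }
type Response struct{ Body string }

// PreHook and PostHook are hypothetical plugin hook signatures.
type PreHook func(*Request)
type PostHook func(*Response)

// handle walks the lifecycle stages in order: pre-processing hooks,
// provider processing, post-processing hooks.
func handle(req *Request, pre []PreHook, post []PostHook, provider func(*Request) *Response) *Response {
	for _, h := range pre {
		h(req) // plugin pre-processing hooks
	}
	resp := provider(req) // dedicated provider worker does the work
	for _, h := range post {
		h(resp) // plugin post-processing hooks
	}
	return resp
}

func main() {
	resp := handle(
		&Request{Body: "hi"},
		[]PreHook{func(r *Request) { r.Body = "[pre]" + r.Body }},
		[]PostHook{func(r *Response) { r.Body += "[post]" }},
		func(r *Request) *Response { return &Response{Body: r.Body + "[provider]"} },
	)
	fmt.Println(resp.Body) // [pre]hi[provider][post]
}
```

Running pre-hooks before provider selection output is frozen, and post-hooks before the transport writes the response, is what lets plugins inject custom logic without touching the core pipeline.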
### Scaling Strategies
- Vertical Scaling - Increase pool sizes and buffer capacities
- Horizontal Scaling - Deploy multiple instances with load balancing
- Provider Scaling - Independent worker pools per provider
- Memory Scaling - Configurable object pool sizes
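Vertical and memory scaling amount to turning a few configuration knobs. The struct and field names below are illustrative, not the project's actual configuration schema:

```go
package main

import "fmt"

// PoolConfig shows the kind of knobs vertical and memory scaling turn;
// field names are hypothetical.
type PoolConfig struct {
	WorkersPerProvider int // vertical scaling: more concurrent workers per provider
	QueueBuffer        int // vertical scaling: deeper job queue before producers block
	ObjectPoolSize     int // memory scaling: more pre-allocated pooled objects
}

func defaultConfig() PoolConfig {
	return PoolConfig{WorkersPerProvider: 8, QueueBuffer: 256, ObjectPoolSize: 1024}
}

func main() {
	cfg := defaultConfig()
	cfg.WorkersPerProvider *= 4 // scale up on a larger machine
	fmt.Printf("%+v\n", cfg)
}
```

Horizontal scaling, by contrast, needs no in-process configuration: identical instances sit behind an external load balancer.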
### Extension Points
- Plugin Hooks - Pre/post request processing
- Custom Providers - Add new AI service integrations
- MCP Tools - External tool integration
- Transport Layers - Multiple interface options (HTTP, SDK, gRPC planned)