Complete Guide to LLM Logging, OTEL Tracing, and Observability in Bifrost

Learn how Bifrost delivers LLM observability with built-in logging, native OpenTelemetry (OTEL) tracing, Prometheus metrics, and multi-backend support.

LLM observability has become a non-negotiable requirement for any team running AI workloads in production. Once requests start fanning out across multiple providers, models, and teams, the inputs, outputs, token counts, latencies, and costs that determine reliability and unit economics become invisible without a dedicated telemetry layer. Bifrost, the open-source AI gateway by Maxim AI, treats LLM observability as a first-class concern. Every request that passes through the gateway is logged, traced, and measured using OpenTelemetry (OTEL) semantic conventions, with native Prometheus metrics and direct integrations into Grafana, Datadog, New Relic, Honeycomb, and other OTLP-compatible backends. This guide walks through how LLM logging, OTEL tracing, and metrics work inside Bifrost and how to configure them for production.

What is LLM Observability

LLM observability is the practice of capturing, structuring, and analyzing every signal a large language model application emits, including prompts, completions, token usage, latency, cost, errors, and tool calls. It extends traditional application performance monitoring with model-specific dimensions such as provider, model version, temperature, prompt template, and embedding operations. The goal is to make AI request behavior measurable, debuggable, and attributable across services, teams, and providers.

Effective LLM observability rests on three signals:

  • Logs: Full request and response payloads with metadata such as tenant ID, virtual key, prompt version, and retry trail.
  • Traces: Distributed spans that capture the full lifecycle of an LLM call, including upstream provider latency, retries, fallbacks, and tool execution.
  • Metrics: Aggregated counters and histograms for tokens, cost, cache hits, errors, and time-to-first-token.

Why LLM Observability Matters at the Gateway Layer

Capturing telemetry inside individual application services creates fragmentation. Each service maintains its own logging stack, its own cost dashboard, and its own provider client code. Anomalies in latency, spend, or model drift become impossible to correlate across the platform. An AI gateway sits between every application and every provider, which makes it the natural location for unified observability.

Bifrost captures telemetry at the gateway, which means platform teams get:

  • Centralized log streams for every LLM request, regardless of which service or developer initiated it.
  • Cost attribution by team, tenant, virtual key, or project without instrumenting application code.
  • Provider-level latency and error visibility across the full provider portfolio.
  • Compliance-ready audit trails for SOC 2, HIPAA, GDPR, and ISO 27001 reporting through Bifrost's audit logs.

This architectural placement also feeds back into governance. The same telemetry that powers dashboards also drives virtual key budgets, rate limits, and access controls, turning observability data into enforceable policy. The Bifrost governance resource covers how telemetry and policy reinforce each other in regulated environments.

How Bifrost's Built-In LLM Logging Works

Bifrost ships with built-in observability that automatically captures every AI request and response in real time, with zero changes to application code. The logging plugin operates asynchronously, with all database writes handled in background goroutines using sync.Pool memory optimization. In benchmarks, the logging plugin adds less than 0.1 ms of overhead per request, consistent with Bifrost's broader 11-microsecond gateway overhead at 5,000 RPS.

Each log entry captures:

  • Input messages: Full conversation history, prompts, and parameters such as temperature, max_tokens, and stop sequences.
  • Provider and model context: Which provider served the request and which model was used.
  • Output messages: Completions, tool calls, and function results.
  • Performance metrics: Latency, prompt tokens, completion tokens, total tokens, and computed cost in USD.
  • Retry and key selection trail: Ordered array of every attempt, the key used, and the failure reason for each retry.
  • Custom metadata: Any HTTP header listed in logging_headers, or any header with the x-bf-lh- prefix, captured automatically into log metadata.
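The header-capture rule in the last bullet can be sketched in a few lines of Python. This is an illustration only, not Bifrost's implementation: the function name `extract_log_metadata` is hypothetical, and stripping the x-bf-lh- prefix from the stored key is an assumption.

```python
from typing import Dict, Iterable

PREFIX = "x-bf-lh-"  # header prefix named in the text


def extract_log_metadata(headers: Dict[str, str],
                         logging_headers: Iterable[str]) -> Dict[str, str]:
    """Pick out the headers that should be copied into log metadata."""
    allowed = {name.lower() for name in logging_headers}
    metadata: Dict[str, str] = {}
    for name, value in headers.items():
        key = name.lower()  # HTTP header names are case-insensitive
        if key.startswith(PREFIX):
            # Assumption: the prefix is dropped from the stored key.
            metadata[key[len(PREFIX):]] = value
        elif key in allowed:
            metadata[key] = value
    return metadata


request_headers = {
    "X-Tenant-ID": "acme",
    "x-bf-lh-prompt-version": "v7",
    "Authorization": "Bearer secret",  # not listed, so not captured
}
captured = extract_log_metadata(request_headers, ["X-Tenant-ID"])
```

Only the allow-listed header and the prefixed header end up in `captured`; the Authorization header is left out.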

For storage, Bifrost supports SQLite by default for development and self-hosted deployments, and PostgreSQL for high-volume production workloads. MySQL and ClickHouse backends are on the roadmap for large-scale time-series analytics. Logs are queryable through a built-in dashboard, a REST API with extensive filters (provider, model, status, latency range, token range, cost range, content search), and a WebSocket endpoint that streams updates in real time.

OpenTelemetry (OTEL) Tracing in Bifrost

Bifrost includes a native OTEL plugin that exports LLM traces in OTLP format to any OpenTelemetry collector. Traces follow the OpenTelemetry GenAI semantic conventions, the standard schema for tracking prompts, model responses, token usage, tool calls, and provider metadata. This makes Bifrost traces interoperable with any OTLP-compatible backend, eliminating the need for vendor-specific instrumentation.

Each LLM span emitted by Bifrost carries comprehensive attributes:

  • Operation type: gen_ai.chat, gen_ai.text, gen_ai.embedding, gen_ai.speech, gen_ai.transcription, or gen_ai.responses.
  • Provider and model: gen_ai.provider.name and gen_ai.request.model.
  • Request parameters: Temperature, max_tokens, top_p, presence_penalty, frequency_penalty, and tool configurations.
  • Token usage: gen_ai.usage.prompt_tokens, gen_ai.usage.completion_tokens, gen_ai.usage.total_tokens.
  • Cost: gen_ai.usage.cost computed in USD.
  • Input and output: Complete chat history with role-based messages, prompt text, tool calls, and tool results.
  • Performance: Start and end timestamps, error details with status codes.
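To make the usage and cost attributes concrete, here is a minimal Python sketch that assembles them for one request. The attribute names come from the list above; the helper name, the model name, and the per-million-token prices are hypothetical, and Bifrost's actual pricing logic may differ.

```python
def build_usage_attributes(provider: str, model: str,
                           prompt_tokens: int, completion_tokens: int,
                           usd_per_million_prompt: float,
                           usd_per_million_completion: float) -> dict:
    """Assemble token and cost attributes for a single LLM span."""
    cost = (prompt_tokens * usd_per_million_prompt
            + completion_tokens * usd_per_million_completion) / 1_000_000
    return {
        "gen_ai.provider.name": provider,
        "gen_ai.request.model": model,
        "gen_ai.usage.prompt_tokens": prompt_tokens,
        "gen_ai.usage.completion_tokens": completion_tokens,
        "gen_ai.usage.total_tokens": prompt_tokens + completion_tokens,
        "gen_ai.usage.cost": round(cost, 6),  # USD
    }


# Hypothetical prices: 3 USD and 15 USD per million prompt/completion tokens.
attrs = build_usage_attributes("openai", "example-model", 1200, 300, 3.0, 15.0)
```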

Streaming requests are handled through an accumulator that emits a single complete span when the stream finishes, so token counts and costs reflect the whole response. In-flight spans are tracked in a sync.Map with a 20-minute TTL, which keeps abandoned spans from accumulating in long-running processes, and emission happens asynchronously in background goroutines so that export stays off the request path.
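The accumulate-then-emit pattern with TTL eviction can be sketched as follows. This is a simplified, single-threaded Python illustration, not Bifrost's Go code: the class and method names are hypothetical, and only the 20-minute TTL comes from the text.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional

SPAN_TTL_SECONDS = 20 * 60  # 20-minute TTL mentioned in the text


@dataclass
class PendingStream:
    started_at: float
    chunks: List[str] = field(default_factory=list)
    completion_tokens: int = 0


class StreamAccumulator:
    """Collects stream chunks and yields one span per finished stream."""

    def __init__(self, ttl: float = SPAN_TTL_SECONDS) -> None:
        self.ttl = ttl
        self.pending: Dict[str, PendingStream] = {}

    def add_chunk(self, request_id: str, text: str, tokens: int,
                  now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        if request_id not in self.pending:
            self.pending[request_id] = PendingStream(started_at=now)
        entry = self.pending[request_id]
        entry.chunks.append(text)
        entry.completion_tokens += tokens

    def finish(self, request_id: str,
               now: Optional[float] = None) -> Optional[dict]:
        """Remove the stream and return a single, complete span record."""
        now = time.monotonic() if now is None else now
        entry = self.pending.pop(request_id, None)
        if entry is None:
            return None
        return {
            "output": "".join(entry.chunks),
            "gen_ai.usage.completion_tokens": entry.completion_tokens,
            "duration_seconds": now - entry.started_at,
        }

    def evict_expired(self, now: Optional[float] = None) -> int:
        """Drop streams older than the TTL so abandoned ones do not pile up."""
        now = time.monotonic() if now is None else now
        expired = [rid for rid, entry in self.pending.items()
                   if now - entry.started_at > self.ttl]
        for rid in expired:
            del self.pending[rid]
        return len(expired)


acc = StreamAccumulator()
acc.add_chunk("req-1", "Hel", 1, now=0.0)
acc.add_chunk("req-1", "lo", 1, now=0.5)
span = acc.finish("req-1", now=1.0)            # one span for the whole stream
acc.add_chunk("req-2", "partial", 1, now=0.0)  # this stream never finishes
evicted = acc.evict_expired(now=1201.0)        # just past the 20-minute TTL
```

A production version would need a concurrency-safe map and a background sweep, which is the role the text assigns to sync.Map and goroutines.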

Supported OTEL Backends

The OTEL plugin works with any OTLP-compatible backend over HTTP (default port 4318) or gRPC (default port 4317). Bifrost's OTEL integration ships with first-class configuration recipes for:

  • Grafana Cloud: Native OTLP HTTP endpoint with Basic authentication.
  • Datadog: APM trace endpoint with DD-API-KEY header, paired with native LLM Observability dashboards.
  • New Relic: OTLP HTTP endpoint with api-key header.
  • Honeycomb: OTLP HTTP endpoint with x-honeycomb-team and dataset headers.
  • Langfuse: Open-source LLM observability platform via OTLP HTTP.
  • Self-hosted collectors: Any OpenTelemetry Collector instance, with optional TLS via tls_ca_cert.
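For backends that expect Basic authentication, the Authorization header value is the base64 encoding of "username:password". The Python sketch below builds one; the credentials are placeholders, and treating an instance ID as the username and an API token as the password is an assumption to check against the backend's own documentation.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return an HTTP Basic Authorization header value."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


# Placeholder credentials for illustration only.
headers = {"Authorization": basic_auth_header("123456", "example-token")}
```

The resulting value is a secret, so it is better stored in an environment variable than written into a configuration file.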

The plugin supports the standard OTEL_RESOURCE_ATTRIBUTES environment variable, so attributes such as deployment.environment=production, service.version=1.2.3, and team.name=platform are automatically attached to every emitted span.
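OTEL_RESOURCE_ATTRIBUTES is a comma-separated list of key=value pairs. The Python sketch below shows how such a string breaks down into attributes. It is a simplified parser for illustration, not the one an OpenTelemetry SDK or Bifrost uses, and the function name is hypothetical.

```python
from urllib.parse import unquote


def parse_resource_attributes(raw: str) -> dict:
    """Split 'k1=v1,k2=v2' into a dict, skipping malformed pairs."""
    attributes = {}
    for pair in raw.split(","):
        if "=" not in pair:
            continue  # ignore empty or malformed entries
        key, value = pair.split("=", 1)
        # Values may be percent-encoded; decode them for display.
        attributes[key.strip()] = unquote(value.strip())
    return attributes


raw = "deployment.environment=production,service.version=1.2.3,team.name=platform"
attrs = parse_resource_attributes(raw)
```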

Prometheus Metrics and Multi-Node Observability

Beyond traces and logs, Bifrost exposes native Prometheus metrics covering every request dimension. The default metrics include:

  • bifrost_upstream_requests_total, bifrost_success_requests_total, bifrost_error_requests_total
  • bifrost_input_tokens_total, bifrost_output_tokens_total, bifrost_cost_total
  • bifrost_cache_hits_total for semantic cache effectiveness
  • bifrost_upstream_latency_seconds, bifrost_stream_first_token_latency_seconds, bifrost_stream_inter_token_latency_seconds

For single-node deployments, Prometheus can scrape the /metrics endpoint directly. For multi-node deployments behind a load balancer, the OTEL plugin supports push-based metrics export, where every Bifrost node actively pushes metrics to a central OTEL Collector at a configurable interval (default 15 seconds). This eliminates the service discovery and per-node scraper configuration burden of pull-based collection, which becomes unreliable when nodes are scaled dynamically. Bifrost's clustering mode pairs with push metrics for accurate cross-node aggregation in high-availability deployments.
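As a rough illustration of what a consumer can do with these counters, the Python sketch below parses a sample in the Prometheus text exposition format, sums each metric across label sets, and derives an error rate. The metric names come from the list above; the provider label, the sample values, and the choice of upstream requests as the denominator are assumptions. The parser is deliberately minimal: it ignores timestamps and assumes label values contain no closing brace.

```python
import re
from collections import defaultdict

# Hypothetical scrape output; label names and values are made up.
SAMPLE = """\
# TYPE bifrost_upstream_requests_total counter
bifrost_upstream_requests_total{provider="openai"} 950
bifrost_upstream_requests_total{provider="anthropic"} 50
# TYPE bifrost_error_requests_total counter
bifrost_error_requests_total{provider="openai"} 19
bifrost_error_requests_total{provider="anthropic"} 1
"""

LINE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{[^}]*\})?\s+(?P<value>\S+)$"
)


def sum_by_metric(text: str) -> dict:
    """Sum sample values per metric name, across all label sets."""
    totals = defaultdict(float)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE.match(line)
        if match:
            totals[match["name"]] += float(match["value"])
    return dict(totals)


totals = sum_by_metric(SAMPLE)
error_rate = (totals["bifrost_error_requests_total"]
              / totals["bifrost_upstream_requests_total"])
```

In practice this aggregation would be a query in Prometheus or the chosen backend; the sketch only shows what the raw counters contain.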

Configuring OTEL Tracing in Bifrost

Enabling OTEL tracing in Gateway mode requires a single plugin entry in config.json:

{
  "plugins": [
    {
      "enabled": true,
      "name": "otel",
      "config": {
        "service_name": "bifrost",
        "collector_url": "http://localhost:4318",
        "trace_type": "genai_extension",
        "protocol": "http",
        "headers": {
          "Authorization": "env.OTEL_API_KEY"
        }
      }
    }
  ]
}

Headers support environment variable substitution using the env. prefix, so API keys and tokens are read at runtime rather than committed to configuration files. For gRPC transport, switch protocol to grpc and point collector_url at port 4317. TLS is supported through an optional tls_ca_cert field, which supplies the CA certificate used to verify the collector's TLS certificate.
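The env. substitution described above amounts to a lookup at startup. The Python sketch below mimics it under stated assumptions: the function name is hypothetical, and raising an error for an unset variable is a choice made here, not documented Bifrost behavior.

```python
import os
from typing import Mapping, Optional

ENV_PREFIX = "env."  # prefix named in the text


def resolve_env_refs(headers: Mapping[str, str],
                     environ: Optional[Mapping[str, str]] = None) -> dict:
    """Replace 'env.NAME' header values with the value of variable NAME."""
    environ = os.environ if environ is None else environ
    resolved = {}
    for name, value in headers.items():
        if value.startswith(ENV_PREFIX):
            variable = value[len(ENV_PREFIX):]
            if variable not in environ:
                # Assumption: fail loudly instead of sending an empty header.
                raise KeyError(f"environment variable {variable} is not set")
            resolved[name] = environ[variable]
        else:
            resolved[name] = value  # literal values pass through unchanged
    return resolved


resolved = resolve_env_refs(
    {"Authorization": "env.OTEL_API_KEY", "X-Static": "plain"},
    environ={"OTEL_API_KEY": "Bearer abc123"},  # stand-in for os.environ
)
```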

Push-based metrics export is enabled by adding metrics_enabled, metrics_endpoint, and an optional metrics_push_interval to the same plugin configuration. The same Prometheus-style metrics get pushed via OTLP to a central collector, which can then route to Datadog, Grafana Cloud, or any backend that accepts OTLP metrics.
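A small Python sketch of how the three metrics fields might sit alongside the tracing settings. The field names come from the paragraph above; the value types, the endpoint URL, and the reading of metrics_push_interval as seconds are assumptions, so treat the rendered JSON as a starting point to check against the Bifrost documentation.

```python
import json


def with_push_metrics(otel_config: dict, endpoint: str,
                      interval_seconds: int = 15) -> dict:
    """Return a copy of the OTEL plugin config with push metrics enabled."""
    updated = dict(otel_config)  # leave the caller's dict untouched
    updated.update({
        "metrics_enabled": True,
        "metrics_endpoint": endpoint,
        "metrics_push_interval": interval_seconds,  # assumed to be seconds
    })
    return updated


base = {
    "service_name": "bifrost",
    "collector_url": "http://localhost:4318",
    "trace_type": "genai_extension",
    "protocol": "http",
}
plugin = {
    "enabled": True,
    "name": "otel",
    # Hypothetical endpoint: the standard OTLP/HTTP metrics path on a local collector.
    "config": with_push_metrics(base, "http://localhost:4318/v1/metrics"),
}
rendered = json.dumps({"plugins": [plugin]}, indent=2)
```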

Best Practices for LLM Observability in Production

A few practices help LLM observability scale cleanly with Bifrost:

  • Use logging headers for tenant attribution: Capture X-Tenant-ID, X-Correlation-ID, or any custom header via the logging_headers configuration to enable per-tenant filtering and cost attribution.
  • Set resource attributes per environment: Tag every trace with deployment.environment, service.version, and team.name so production and staging telemetry never bleed into the same dashboards.
  • Use push metrics in clustered deployments: Pull-based scraping can miss individual nodes behind a load balancer; push-based OTLP metrics have every node report for itself, so aggregation is complete.
  • Disable content logging for sensitive workloads: The disable_content_logging flag retains usage metadata (tokens, cost, latency) while skipping prompt and completion text, which is critical for regulated industries handling PHI or PII.
  • Correlate gateway traces with application traces: Because Bifrost emits OTLP-compliant spans, they can join existing distributed traces when the application propagates the W3C Trace Context traceparent header on its requests to the gateway.
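For the trace-correlation practice, the W3C Trace Context traceparent header has the form version-traceid-parentid-flags. The Python sketch below generates and parses one. It is a minimal illustration with hypothetical helper names; real applications would normally let their OpenTelemetry SDK inject the header.

```python
import re
import secrets
from typing import Optional

# Version 00: 32 hex chars of trace ID, 16 of parent span ID, 2 of flags.
TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Create a traceparent value for a fresh trace."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex chars
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def parse_traceparent(value: str) -> Optional[dict]:
    """Return the header's fields, or None if it is malformed."""
    match = TRACEPARENT.match(value.strip().lower())
    if not match:
        return None
    trace_id, span_id, flags = match.groups()
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None  # all-zero IDs are invalid per the specification
    return {
        "trace_id": trace_id,
        "parent_span_id": span_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


parsed = parse_traceparent(
    "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
)
outgoing_headers = {"traceparent": new_traceparent()}
```

Sending `outgoing_headers` with each request to the gateway is what lets gateway spans share a trace ID with the calling service.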

For organizations standardizing on a single observability backend, the OpenTelemetry Collector ecosystem acts as a routing layer between Bifrost and downstream tools, enabling redaction, sampling, and multi-destination export without changes to gateway configuration.

Getting Started with Bifrost LLM Observability

Bifrost makes LLM observability the default rather than an add-on. Built-in logging captures every request without code changes, the OTEL plugin exports OTLP traces to any compatible backend using GenAI semantic conventions, and native Prometheus metrics integrate with existing dashboards. For platform teams running AI in production, this means cost attribution, latency analysis, error debugging, and compliance reporting all flow from a single telemetry source. To see how Bifrost can centralize LLM logging and OTEL tracing for your AI infrastructure, book a demo with the Bifrost team.