
Automatic Token and Cost Tracking

When you log LLM generations using the Maxim SDK, token usage and costs are captured automatically from the model response. When recording an LLM response, include the usage object with prompt_tokens, completion_tokens, and total_tokens:
generation.result({
    id: "chatcmpl-123",
    object: "chat.completion",
    created: Date.now(),
    model: "gpt-4o",
    choices: [{
        index: 0,
        message: {
            role: "assistant",
            content: "Response content here"
        },
        finish_reason: "stop"
    }],
    usage: {
        prompt_tokens: 100,
        completion_tokens: 50,
        total_tokens: 150
    }
});
For framework integrations like LangChain, LlamaIndex, or Google ADK, token usage is tracked automatically when you use the Maxim tracer. For example, with LangChain:
from maxim.logger.langchain import MaximLangchainTracer

langchain_tracer = MaximLangchainTracer(logger)
response = llm.invoke(messages, config={"callbacks": [langchain_tracer]})
This automatically captures latency, token usage, and costs for all LLM calls without additional instrumentation.

Custom Metric Tracking via SDK

For more granular control, you can log token usage and cost metrics explicitly at different levels using the addMetric method. Track metrics at the trace level:
trace.addMetric('cost_usd', 0.05);
trace.addMetric('tokens_total', 1420);
Track metrics at the generation level:
generation.addMetric('tokens_in', 312);
generation.addMetric('tokens_out', 87);
generation.addMetric('ttft_ms', 180.5);  // Time to first token
generation.addMetric('tps', 15.8);        // Tokens per second
Track metrics at the session level for aggregates across multiple interactions:
session.addMetric('traces_count', 4);
session.addMetric('user_messages_count', 2);
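As a rough sketch of how the throughput metrics above might be derived before logging them, the helper below computes time-to-first-token and tokens-per-second from stream timing data (the timestamps and token counts are hypothetical; your streaming client supplies the real ones):

```typescript
// Sketch: deriving ttft_ms and tps values from stream timing data.
interface StreamTiming {
    requestStartMs: number;   // when the request was sent
    firstTokenMs: number;     // when the first token arrived
    lastTokenMs: number;      // when the final token arrived
    completionTokens: number; // tokens generated in total
}

function throughputMetrics(t: StreamTiming): { ttftMs: number; tps: number } {
    const ttftMs = t.firstTokenMs - t.requestStartMs;
    const generationSeconds = (t.lastTokenMs - t.firstTokenMs) / 1000;
    // Guard against a zero-length window for single-token responses.
    const tps = generationSeconds > 0
        ? t.completionTokens / generationSeconds
        : t.completionTokens;
    return { ttftMs, tps };
}

// First token after 180 ms, 87 tokens over a 5-second stream:
throughputMetrics({
    requestStartMs: 0,
    firstTokenMs: 180,
    lastTokenMs: 5180,
    completionTokens: 87,
}); // { ttftMs: 180, tps: 17.4 }
```

The resulting values are what you would pass to `generation.addMetric('ttft_ms', ...)` and `generation.addMetric('tps', ...)`.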

Configuring Custom Token Pricing

To ensure cost calculations reflect your actual expenses (such as negotiated enterprise rates), configure custom pricing structures:
1. Navigate to Settings > Models > Pricing
2. Enter a model name pattern (string or regex) that matches your model names
3. Enter your cost per 1,000 tokens for both input and output tokens
Custom pricing supports OpenAI, Microsoft Azure, Groq, HuggingFace, Together AI, Google Cloud, and Amazon Bedrock models.
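To illustrate how per-1,000-token rates translate into a logged cost, here is a minimal sketch (the rates are invented for illustration, not Maxim defaults or the platform's internal formula):

```typescript
// Sketch: computing cost in USD from usage and per-1K-token rates.
// inputPer1k and outputPer1k are hypothetical rates in USD per 1,000 tokens.
function computeCostUsd(
    promptTokens: number,
    completionTokens: number,
    inputPer1k: number,
    outputPer1k: number
): number {
    return (promptTokens / 1000) * inputPer1k + (completionTokens / 1000) * outputPer1k;
}

// 100 prompt tokens at $0.01/1K plus 50 completion tokens at $0.03/1K:
computeCostUsd(100, 50, 0.01, 0.03); // ≈ 0.0025
```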

Applying Pricing to Model Configs

1. Go to Settings > Models > Model Configs
2. Select a model config to edit
3. Locate the Pricing structure section
4. Choose your pricing structure from the dropdown

Applying Pricing to Log Repositories

1. Open Logs from the sidebar
2. Select the log repository you want to configure
3. Find the Pricing structure section
4. Choose your pricing structure from the dropdown
When no custom rates exist, standard pricing applies by default.
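The fallback behavior can be sketched as follows. This is an illustration of the pattern-matching idea, not Maxim's actual resolution logic; the rule list, rates, and `STANDARD` defaults are all hypothetical:

```typescript
// Sketch: resolving a pricing structure for a model name.
// Patterns may be plain strings or regexes; the first matching rule wins.
interface PricingRule {
    pattern: RegExp;
    inputPer1k: number;  // USD per 1,000 input tokens
    outputPer1k: number; // USD per 1,000 output tokens
}

// Placeholder standard rates used when no custom rule matches.
const STANDARD = { inputPer1k: 0.0025, outputPer1k: 0.01 };

function resolvePricing(model: string, rules: PricingRule[]) {
    const rule = rules.find(r => r.pattern.test(model));
    return rule ?? STANDARD; // no custom rate -> standard pricing
}

const rules: PricingRule[] = [
    { pattern: /^gpt-4o/, inputPer1k: 0.002, outputPer1k: 0.008 }, // negotiated rate
];

resolvePricing("gpt-4o-mini", rules);      // matches the custom rule
resolvePricing("claude-3-5-sonnet", rules); // falls back to STANDARD
```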

Setting Up Cost and Token Alerts

Monitor token usage and costs in real-time by configuring alerts:
1. Navigate to your log repository and select the Alerts tab
2. Click Create alert and select Log metrics as the alert type
3. Configure thresholds for:
   • Token Usage: alert when consumption exceeds limits (e.g., trigger when hourly usage exceeds 1 million tokens)
   • Cost: alert when expenses exceed budgets (e.g., trigger when daily costs exceed $100)
4. Select notification channels (Slack or PagerDuty)
5. Click Create alert
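Conceptually, an alert like the one above encodes a simple threshold check over a time window. The sketch below shows that idea only; the window shape and limits are hypothetical, and the real evaluation happens inside Maxim:

```typescript
// Sketch: the threshold check a token/cost alert conceptually encodes.
interface WindowStats {
    tokens: number;  // tokens consumed in the window (e.g., the last hour)
    costUsd: number; // spend in the window (e.g., the current day)
}

function shouldAlert(stats: WindowStats, maxTokens: number, maxCostUsd: number): boolean {
    return stats.tokens > maxTokens || stats.costUsd > maxCostUsd;
}

// Hourly usage of 1.2M tokens exceeds a 1M-token limit:
shouldAlert({ tokens: 1_200_000, costUsd: 40 }, 1_000_000, 100); // true

// Within both limits, no alert fires:
shouldAlert({ tokens: 800_000, costUsd: 40 }, 1_000_000, 100); // false
```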

Dashboard Visibility

Once logging is set up, you can view aggregated token and cost data in your log repository dashboard, including:
  • Total usage over time
  • Cost per trace
  • Token counts for each log entry (visible in the logs table)
  • Latency and performance metrics
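As a sketch of the kind of aggregation behind a cost-per-trace view (the data shape here is invented for illustration and is not the SDK's log schema):

```typescript
// Sketch: summing generation costs per trace.
interface GenerationLog {
    traceId: string;
    costUsd: number;
    totalTokens: number;
}

function costPerTrace(logs: GenerationLog[]): Map<string, number> {
    const totals = new Map<string, number>();
    for (const log of logs) {
        totals.set(log.traceId, (totals.get(log.traceId) ?? 0) + log.costUsd);
    }
    return totals;
}

costPerTrace([
    { traceId: "t1", costUsd: 0.01, totalTokens: 300 },
    { traceId: "t1", costUsd: 0.02, totalTokens: 500 },
    { traceId: "t2", costUsd: 0.05, totalTokens: 900 },
]); // t1 ≈ 0.03, t2 = 0.05
```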
Learn more in the documentation for custom metrics, custom pricing, generations logging, and setting up alerts.