👀 Observing Tool Calls 🔨 and JSON Mode Responses from Fireworks AI

Modern AI applications require robust monitoring and observability to track model performance, understand usage patterns, and debug complex interactions. When working with advanced features like tool calls and structured JSON responses, having comprehensive logging becomes even more critical. In this guide, we'll explore how to integrate Maxim's observability platform with Fireworks AI to monitor and analyze these AI interactions.
Fireworks AI is a generative AI platform designed for running, fine-tuning, and customizing large language models (LLMs) with speed and production-readiness. Maxim AI is an end-to-end platform for simulating, evaluating, and observing AI agents. It helps teams build, test, and deploy high-quality AI applications faster and more reliably by applying software best practices to AI workflows. We'll use both platforms' SDKs to observe tool calls and JSON mode responses.
Resources
- Cookbook showing the Fireworks integration with Maxim, containing the code used in this blog - GitHub Link
- Sign up on Maxim to get your Maxim API key & log repo ID - Sign Up
- Sign up on Fireworks to get your Fireworks API key - Sign Up
Step 1: Setting Up Dependencies
First, let's install the required packages with specific versions to ensure compatibility:
!pip install fireworks-ai==0.17.9 maxim-py
Why this matters: Using specific versions ensures reproducible builds and prevents compatibility issues between the Fireworks AI client and Maxim's instrumentation layer.
Step 2: Environment Configuration
We're building this project in Google Colab, which is why we use Colab's secure user secrets (userdata) to configure the environment variables.
from google.colab import userdata
import os
# Retrieve API keys from secure storage
MAXIM_API_KEY = userdata.get("MAXIM_API_KEY")
MAXIM_LOG_REPO_ID = userdata.get("MAXIM_REPO_ID")
FIREWORKS_API_KEY = userdata.get("FIREWORKS_API_KEY")
# Set environment variables for the SDKs
os.environ["MAXIM_API_KEY"] = MAXIM_API_KEY
os.environ["MAXIM_LOG_REPO_ID"] = MAXIM_LOG_REPO_ID
os.environ["FIREWORKS_API_KEY"] = FIREWORKS_API_KEY
Step 3: Initialize Maxim Logger
import os
from maxim import Config, Maxim
from maxim.logger import LoggerConfig
# Initialize Maxim with configuration
maxim = Maxim(Config(api_key=os.getenv("MAXIM_API_KEY")))
logger = maxim.logger(LoggerConfig(id=os.getenv("MAXIM_LOG_REPO_ID")))
What's happening here:
- We create a Maxim instance with our API credentials
- The logger is configured with a specific repository ID for organized log storage
- This logger will capture all AI interactions once we instrument Fireworks
Step 4: Instrument Fireworks AI with Maxim
from fireworks import LLM
from maxim.logger.fireworks import instrument_fireworks
# Enable automatic logging for all Fireworks interactions
instrument_fireworks(logger)
# Initialize the LLM with a powerful model
llm = LLM(
    model="qwen3-235b-a22b",
    deployment_type="serverless"
)
This is the magic step: instrument_fireworks() automatically wraps all Fireworks API calls.
- Every completion, streaming response, and tool call will be logged
- No additional code changes are needed - observability becomes transparent
Step 5: Testing Basic Inference with Logging
Let's start with a simple example to see the logging in action:
response = llm.chat.completions.create(
    messages=[{
        "role": "user",
        "content": "Say this is a test",
    }],
)
print(response.choices[0].message.content)
What Maxim captures:
- Request timestamp and model used
- Complete message history and parameters
- Response content and metadata
- Token usage and latency metrics
- Model reasoning (if available)
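If you want to sanity-check a few of these numbers yourself, they are also available on the response object. This is a minimal sketch, assuming the OpenAI-compatible response shape the Fireworks SDK returns:
# Inspect the same metadata Maxim records automatically
print("Model:", response.model)
if response.usage:  # token counts, if present on the response
    print("Prompt tokens:", response.usage.prompt_tokens)
    print("Completion tokens:", response.usage.completion_tokens)
    print("Total tokens:", response.usage.total_tokens)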
Step 6: Monitoring Streaming Responses
Streaming responses present unique observability challenges. Here's how Maxim handles them:
# Create a new LLM instance for streaming
llm_stream = LLM(
    model="qwen3-235b-a22b",
    deployment_type="serverless"
)
response_generator = llm_stream.chat.completions.create(
    messages=[{
        "role": "user",
        "content": "Explain the importance of city population data in urban planning",
    }],
    stream=True,
)
# Process streaming chunks
for chunk in response_generator:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
Observability benefits:
- Maxim reconstructs the complete response from streaming chunks
- Time-to-first-token and streaming latency are tracked
- Any interruptions or errors in streaming are captured
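To make those streaming metrics concrete, here's a rough client-side sketch of what Maxim tracks for you automatically; the prompt is an arbitrary example and llm_stream is the instance created above:
import time
# Reconstruct the full response from chunks and time the first token ourselves
start = time.time()
first_token_at = None
full_text = []
for chunk in llm_stream.chat.completions.create(
    messages=[{"role": "user", "content": "Summarize why observability matters for LLM apps"}],
    stream=True,
):
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.time()
        full_text.append(chunk.choices[0].delta.content)
if first_token_at is not None:
    print(f"\nTime to first token: {first_token_at - start:.2f}s")
print("Reconstructed response length:", len("".join(full_text)))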
Step 7: Advanced Tool Call Monitoring
Now for the exciting part - monitoring tool calls. Let's create our city population assistant:
import json
# Initialize LLM for tool calling
llm_tools = LLM(
    model="llama-v3p1-405b-instruct",
    deployment_type="serverless"
)
# Define our tool schema
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_city_population",
            "description": "Retrieve the current population data for a specified city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city_name": {
                        "type": "string",
                        "description": "The name of the city for which population data is needed, e.g., 'San Francisco'."
                    },
                },
                "required": ["city_name"],
            },
        },
    }
]
# Create a comprehensive system prompt
prompt = f"""
You have access to the following function:
Function Name: '{tools[0]["function"]["name"]}'
Purpose: '{tools[0]["function"]["description"]}'
Parameters Schema: {json.dumps(tools[0]["function"]["parameters"], indent=4)}
Instructions for Using Functions:
1. Use the function '{tools[0]["function"]["name"]}' to retrieve population data when required.
2. If a function call is necessary, reply ONLY in the following format:
<function={tools[0]["function"]["name"]}>{{"city_name": "example_city"}}</function>
3. Adhere strictly to the parameters schema. Ensure all required fields are provided.
4. Use the function only when you cannot directly answer using general knowledge.
5. If no function is necessary, respond to the query directly without mentioning the function.
Examples:
- For a query like "What is the population of Toronto?" respond with:
<function=get_city_population>{{"city_name": "Toronto"}}</function>
- For "What is the population of the Earth?" respond with general knowledge and do NOT use the function.
"""
# Execute the tool call
messages = [
    {"role": "system", "content": prompt},
    {"role": "user", "content": "What is the population of San Francisco?"}
]
chat_completion = llm_tools.chat.completions.create(
    messages=messages,
    tools=tools,
    temperature=0.1
)
print(chat_completion.choices[0].message.model_dump_json(indent=4))
What Maxim observes in tool calls:
- The complete tool schema and definitions
- Which tools are invoked and with what parameters
- The model's decision-making process for tool selection
- Success/failure rates of tool invocations
- Parameter validation and schema compliance
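To close the loop, here's an illustrative sketch of how an application might parse the model's <function=...> reply and execute the tool. The get_city_population implementation and its return value are hypothetical placeholders, and depending on the model, Fireworks may instead populate structured tool_calls on the message, which this sketch doesn't handle:
import re
# Hypothetical tool implementation - a real app would query an actual data source
def get_city_population(city_name: str) -> dict:
    return {"city_name": city_name, "population": 800_000}  # placeholder value
content = chat_completion.choices[0].message.content or ""
match = re.search(r"<function=(\w+)>(.*?)</function>", content, re.DOTALL)
if match and match.group(1) == "get_city_population":
    args = json.loads(match.group(2))  # json was imported earlier in this step
    print("Tool result:", get_city_population(**args))
else:
    print("No inline tool call detected; model replied:", content)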
Step 8: JSON Mode Response Monitoring
Finally, let's implement structured JSON responses with full observability:
from pydantic import BaseModel, Field
# Define our response schema
class CityInfo(BaseModel):
    winner: str
# Make a structured request
chat_completion = llm.chat.completions.create(
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "Result",
            "schema": CityInfo.model_json_schema()
        }
    },
    messages=[
        {
            "role": "user",
            "content": "Who won the US presidential election in 2012? Reply just in one JSON.",
        },
    ],
)
print(repr(chat_completion.choices[0].message.content))
JSON Mode observability insights:
- Schema compliance validation
- JSON parsing success/failure rates
- Response structure consistency
- Field population accuracy
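As a quick local check of the same schema compliance Maxim reports, you can validate the raw response against the Pydantic model. This is a minimal sketch, assuming the model returned its JSON in message.content:
from pydantic import ValidationError
# Validate the structured response against the schema we requested
raw_json = chat_completion.choices[0].message.content
try:
    parsed = CityInfo.model_validate_json(raw_json)
    print("Parsed winner:", parsed.winner)
except ValidationError as err:
    print("Schema validation failed:", err)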
Check your logs from Fireworks on Maxim, showing tool calls and JSON mode responses.
Understanding Your Observability Data
With Maxim integration, you gain access to:
Performance Metrics
- Latency Distribution: Understand response time patterns
- Token Usage: Track input/output tokens for cost optimization
- Success Rates: Monitor API reliability and error patterns
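For example, a back-of-the-envelope cost estimate falls straight out of the token counts. The per-token prices below are placeholders rather than actual Fireworks pricing, and the sketch reuses the response object from Step 5:
# Rough cost estimate from token usage; substitute your model's real pricing
PROMPT_PRICE_PER_1M = 0.90      # hypothetical USD per 1M input tokens
COMPLETION_PRICE_PER_1M = 0.90  # hypothetical USD per 1M output tokens
usage = response.usage
cost = (usage.prompt_tokens * PROMPT_PRICE_PER_1M +
        usage.completion_tokens * COMPLETION_PRICE_PER_1M) / 1_000_000
print(f"Estimated cost for this call: ${cost:.6f}")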
Best Practices for AI Observability
- Log Everything: Don't be selective - comprehensive logging reveals unexpected patterns. Maxim also lets you enable session-level and node-level evaluations on your logs.
- Monitor Continuously: Set up alerts for anomalies in performance or behavior. Maxim provides real-time alerting via Slack and PagerDuty.
- Version Control: Track model versions and their performance characteristics
- Cost Tracking: Monitor token usage to optimize for both performance and cost
- Privacy Compliance: Ensure logging practices meet data protection requirements
Conclusion
Integrating Maxim with Fireworks AI provides unprecedented visibility into your AI application's behavior. From simple completions to complex tool calls and structured JSON responses, every interaction is captured and analyzed. This observability foundation enables you to:
- Build more reliable AI applications
- Optimize performance and costs
- Debug complex issues quickly
- Ensure consistent user experiences
The combination of Fireworks AI's powerful models and Maxim's comprehensive observability creates a production-ready foundation for sophisticated AI applications. Start monitoring your AI interactions today and unlock the insights hidden in your model's behavior.