👀 Observing Tool Calls 🔨 and JSON Mode Responses from Fireworks AI

Modern AI applications require robust monitoring and observability to track model performance, understand usage patterns, and debug complex interactions. When working with advanced features like tool calls and structured JSON responses, having comprehensive logging becomes even more critical. In this guide, we'll explore how to integrate Maxim's observability platform with Fireworks AI to monitor and analyze these AI interactions.
Fireworks AI is a generative AI platform designed for running, fine-tuning, and customizing large language models (LLMs) with speed and production-readiness. Maxim AI is an end-to-end platform for simulating, evaluating, and observing AI agents. It helps teams build, test, and deploy high-quality AI applications faster and more reliably by applying software best practices to AI workflows. We'll use both platforms' SDKs to observe tool calls and JSON mode responses.
Resources
- Cookbook showing the Fireworks integration with Maxim, containing the code used in this blog - GitHub Link
- Sign up on Maxim to get your Maxim API key & log repo ID - Sign Up
- Sign up on Fireworks to get your Fireworks API key - Sign Up
Step 1: Setting Up Dependencies
First, let's install the required packages with specific versions to ensure compatibility:
!pip install fireworks-ai==0.17.9 maxim-py
Why this matters: Using specific versions ensures reproducible builds and prevents compatibility issues between the Fireworks AI client and Maxim's instrumentation layer.
Step 2: Environment Configuration
We're building this project in Google Colab, which is why we use Colab's secure user secrets (userdata) to configure the environment variables.
from google.colab import userdata
import os
# Retrieve API keys from secure storage
MAXIM_API_KEY = userdata.get("MAXIM_API_KEY")
MAXIM_LOG_REPO_ID = userdata.get("MAXIM_REPO_ID")
FIREWORKS_API_KEY = userdata.get("FIREWORKS_API_KEY")
# Set environment variables for the SDKs
os.environ["MAXIM_API_KEY"] = MAXIM_API_KEY
os.environ["MAXIM_LOG_REPO_ID"] = MAXIM_LOG_REPO_ID
os.environ["FIREWORKS_API_KEY"] = FIREWORKS_API_KEY
Step 3: Initialize Maxim Logger
import os
from maxim import Config, Maxim
from maxim.logger import LoggerConfig
# Initialize Maxim with configuration
maxim = Maxim(Config(api_key=os.getenv("MAXIM_API_KEY")))
logger = maxim.logger(LoggerConfig(id=os.getenv("MAXIM_LOG_REPO_ID")))
What's happening here:
- We create a Maxim instance with our API credentials
- The logger is configured with a specific repository ID for organized log storage
- This logger will capture all AI interactions once we instrument Fireworks
Step 4: Instrument Fireworks AI with Maxim
from fireworks import LLM
from maxim.logger.fireworks import instrument_fireworks
# Enable automatic logging for all Fireworks interactions
instrument_fireworks(logger)
# Initialize the LLM with a powerful model
llm = LLM(
    model="qwen3-235b-a22b",
    deployment_type="serverless"
)
This is the magic step: instrument_fireworks() automatically wraps all Fireworks API calls.
- Every completion, streaming response, and tool call will be logged
- No additional code changes are needed - observability becomes transparent
Step 5: Testing Basic Inference with Logging
Let's start with a simple example to see the logging in action:
response = llm.chat.completions.create(
    messages=[{
        "role": "user",
        "content": "Say this is a test",
    }],
)
print(response.choices[0].message.content)
What Maxim captures:
- Request timestamp and model used
- Complete message history and parameters
- Response content and metadata
- Token usage and latency metrics
- Model reasoning (if available)
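If you want to sanity-check a few of these numbers yourself, they are also available on the response object. This is a minimal sketch, assuming the OpenAI-compatible response shape the Fireworks SDK returns:
# Inspect the same metadata Maxim records automatically
print("Model:", response.model)
if response.usage:  # token counts, if present on the response
    print("Prompt tokens:", response.usage.prompt_tokens)
    print("Completion tokens:", response.usage.completion_tokens)
    print("Total tokens:", response.usage.total_tokens)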
Step 6: Monitoring Streaming Responses
Streaming responses present unique observability challenges. Here's how Maxim handles them:
# Create a new LLM instance for streaming
llm_stream = LLM(
    model="qwen3-235b-a22b",
    deployment_type="serverless"
)
response_generator = llm_stream.chat.completions.create(
    messages=[{
        "role": "user",
        "content": "Explain the importance of city population data in urban planning",
    }],
    stream=True,
)
# Process streaming chunks
for chunk in response_generator:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
Observability benefits:
- Maxim reconstructs the complete response from streaming chunks
- Time-to-first-token and streaming latency are tracked
- Any interruptions or errors in streaming are captured
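To make those streaming metrics concrete, here's a rough client-side sketch of what Maxim tracks for you automatically; the prompt is an arbitrary example and llm_stream is the instance created above:
import time
# Reconstruct the full response from chunks and time the first token ourselves
start = time.time()
first_token_at = None
full_text = []
for chunk in llm_stream.chat.completions.create(
    messages=[{"role": "user", "content": "Summarize why observability matters for LLM apps"}],
    stream=True,
):
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.time()
        full_text.append(chunk.choices[0].delta.content)
if first_token_at is not None:
    print(f"\nTime to first token: {first_token_at - start:.2f}s")
print("Reconstructed response length:", len("".join(full_text)))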
Step 7: Advanced Tool Call Monitoring
Now for the exciting part - monitoring tool calls. Let's create our city population assistant:
import json
# Initialize LLM for tool calling
llm_tools = LLM(
    model="llama-v3p1-405b-instruct",
    deployment_type="serverless"
)
# Define our tool schema
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_city_population",
            "description": "Retrieve the current population data for a specified city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city_name": {
                        "type": "string",
                        "description": "The name of the city for which population data is needed, e.g., 'San Francisco'."
                    },
                },
                "required": ["city_name"],
            },
        },
    }
]
# Create a comprehensive system prompt
prompt = f"""
You have access to the following function:
Function Name: '{tools[0]["function"]["name"]}'
Purpose: '{tools[0]["function"]["description"]}'
Parameters Schema: {json.dumps(tools[0]["function"]["parameters"], indent=4)}
Instructions for Using Functions:
1. Use the function '{tools[0]["function"]["name"]}' to retrieve population data when required.
2. If a function call is necessary, reply ONLY in the following format:
<function={tools[0]["function"]["name"]}>{{"city_name": "example_city"}}</function>
3. Adhere strictly to the parameters schema. Ensure all required fields are provided.
4. Use the function only when you cannot directly answer using general knowledge.
5. If no function is necessary, respond to the query directly without mentioning the function.
Examples:
- For a query like "What is the population of Toronto?" respond with:
<function=get_city_population>{{"city_name": "Toronto"}}</function>
- For "What is the population of the Earth?" respond with general knowledge and do NOT use the function.
"""
# Execute the tool call
messages = [
    {"role": "system", "content": prompt},
    {"role": "user", "content": "What is the population of San Francisco?"}
]
chat_completion = llm_tools.chat.completions.create(
    messages=messages,
    tools=tools,
    temperature=0.1
)
print(chat_completion.choices[0].message.model_dump_json(indent=4))
What Maxim observes in tool calls:
- The complete tool schema and definitions
- Which tools are invoked and with what parameters
- The model's decision-making process for tool selection
- Success/failure rates of tool invocations
- Parameter validation and schema compliance
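To close the loop, here's an illustrative sketch of how an application might parse the model's <function=...> reply and execute the tool. The get_city_population implementation and its return value are hypothetical placeholders, and depending on the model, Fireworks may instead populate structured tool_calls on the message, which this sketch doesn't handle:
import re
# Hypothetical tool implementation - a real app would query an actual data source
def get_city_population(city_name: str) -> dict:
    return {"city_name": city_name, "population": 800_000}  # placeholder value
content = chat_completion.choices[0].message.content or ""
match = re.search(r"<function=(\w+)>(.*?)</function>", content, re.DOTALL)
if match and match.group(1) == "get_city_population":
    args = json.loads(match.group(2))  # json was imported earlier in this step
    print("Tool result:", get_city_population(**args))
else:
    print("No inline tool call detected; model replied:", content)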
Step 8: JSON Mode Response Monitoring
Finally, let's implement structured JSON responses with full observability:
from pydantic import BaseModel, Field
# Define our response schema
class CityInfo(BaseModel):
    winner: str
# Make a structured request
chat_completion = llm.chat.completions.create(
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "Result",
            "schema": CityInfo.model_json_schema()
        }
    },
    messages=[
        {
            "role": "user",
            "content": "Who won the US presidential election in 2012? Reply just in one JSON.",
        },
    ],
)
print(repr(chat_completion.choices[0].message.content))
JSON Mode observability insights:
- Schema compliance validation
- JSON parsing success/failure rates
- Response structure consistency
- Field population accuracy
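As a quick local check of the same schema compliance Maxim reports, you can validate the raw response against the Pydantic model. This is a minimal sketch, assuming the model returned its JSON in message.content:
from pydantic import ValidationError
# Validate the structured response against the schema we requested
raw_json = chat_completion.choices[0].message.content
try:
    parsed = CityInfo.model_validate_json(raw_json)
    print("Parsed winner:", parsed.winner)
except ValidationError as err:
    print("Schema validation failed:", err)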
Check your logs from Fireworks on Maxim, showing tool calls and JSON mode responses.
Understanding Your Observability Data
With Maxim integration, you gain access to:
Performance Metrics
- Latency Distribution: Understand response time patterns
- Token Usage: Track input/output tokens for cost optimization
- Success Rates: Monitor API reliability and error patterns
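For example, a back-of-the-envelope cost estimate falls straight out of the token counts. The per-token prices below are placeholders rather than actual Fireworks pricing, and the sketch reuses the response object from Step 5:
# Rough cost estimate from token usage; substitute your model's real pricing
PROMPT_PRICE_PER_1M = 0.90      # hypothetical USD per 1M input tokens
COMPLETION_PRICE_PER_1M = 0.90  # hypothetical USD per 1M output tokens
usage = response.usage
cost = (usage.prompt_tokens * PROMPT_PRICE_PER_1M +
        usage.completion_tokens * COMPLETION_PRICE_PER_1M) / 1_000_000
print(f"Estimated cost for this call: ${cost:.6f}")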
Best Practices for AI Observability
- Log Everything: Don't be selective - comprehensive logging reveals unexpected patterns. Maxim also lets you enable session-level and node-level evaluations on your logs.
- Monitor Continuously: Set up alerts for anomalies in performance or behavior. Maxim provides real-time alerting via Slack and PagerDuty.
- Version Control: Track model versions and their performance characteristics
- Cost Tracking: Monitor token usage to optimize for both performance and cost
- Privacy Compliance: Ensure logging practices meet data protection requirements
Conclusion
Integrating Maxim with Fireworks AI provides unprecedented visibility into your AI application's behavior. From simple completions to complex tool calls and structured JSON responses, every interaction is captured and analyzed. This observability foundation enables you to:
- Build more reliable AI applications
- Optimize performance and costs
- Debug complex issues quickly
- Ensure consistent user experiences
The combination of Fireworks AI's powerful models and Maxim's comprehensive observability creates a production-ready foundation for sophisticated AI applications. Start monitoring your AI interactions today and unlock the insights hidden in your model's behavior.