How to Observe Tool Calling and JSON Mode Responses from Fireworks AI in Maxim AI

In the rapidly evolving landscape of generative AI, robust observability is fundamental for building, testing, and deploying reliable applications. As AI models become more sophisticated (capable of tool calling, structured outputs, and complex reasoning), teams require advanced instrumentation to monitor and analyze every interaction. Maxim AI provides an end-to-end observability platform that integrates seamlessly with Fireworks AI, enabling teams to capture, trace, and evaluate tool calls and JSON mode responses with precision.
This blog serves as a comprehensive guide to integrating Maxim AI with Fireworks AI for deep observability, covering technical setup, instrumentation strategies, and best practices. Drawing on Maxim’s official integration documentation and authoritative resources, we detail each step required to achieve transparent monitoring of advanced LLM features.
Table of Contents
- Why Observability Matters in GenAI
- Overview: Maxim AI and Fireworks AI
- Setting Up Your Environment
- Initializing Maxim Logger
- Instrumenting Fireworks AI for Observability
- Monitoring Basic Inference and Streaming Responses
- Advanced Tool Call Monitoring
- Structured JSON Mode Response Monitoring
- Analyzing Observability Data
- Best Practices for AI Observability
- Further Reading and Resources
- Conclusion
Why Observability Matters in GenAI
Observability is the backbone of production-grade AI systems. It enables teams to:
- Track model performance and usage patterns
- Debug complex interactions
- Ensure compliance and reliability
- Optimize cost and resource utilization
Without comprehensive logging and monitoring, teams risk deploying opaque systems that are difficult to debug and improve. As highlighted in Maxim’s guide to AI reliability, transparent observability is essential for building trustworthy and robust AI workflows.
Overview: Maxim AI and Fireworks AI
Fireworks AI is a generative AI platform designed for running, fine-tuning, and customizing large language models (LLMs) with speed and production-readiness. Fireworks AI documentation details advanced features such as tool calling and structured JSON responses.
Maxim AI is an end-to-end platform for simulating, evaluating, and observing AI agents. It enables teams to instrument their AI workflows, capturing every interaction for detailed analysis and continuous improvement. Learn more about Maxim’s capabilities in Maxim’s product documentation.
Setting Up Your Environment
Begin by installing the required packages, pinning the Fireworks SDK to a version known to work with Maxim's instrumentation:
!pip install fireworks-ai==0.17.9 maxim-py
Pinning the Fireworks version protects your build from unexpected compatibility issues between Fireworks AI and Maxim’s instrumentation layer. For more on managing dependencies, see Prompt Management in 2025.
Next, configure your environment variables to securely store API credentials. If working in Google Colab:
from google.colab import userdata
import os
MAXIM_API_KEY = userdata.get("MAXIM_API_KEY")
MAXIM_LOG_REPO_ID = userdata.get("MAXIM_REPO_ID")
FIREWORKS_API_KEY = userdata.get("FIREWORKS_API_KEY")
os.environ["MAXIM_API_KEY"] = MAXIM_API_KEY
os.environ["MAXIM_LOG_REPO_ID"] = MAXIM_LOG_REPO_ID
os.environ["FIREWORKS_API_KEY"] = FIREWORKS_API_KEY
Initializing Maxim Logger
Initialize Maxim’s logger to capture all interactions:
import os
from maxim import Config, Maxim
from maxim.logger import LoggerConfig
maxim = Maxim(Config(api_key=os.getenv("MAXIM_API_KEY")))
logger = maxim.logger(LoggerConfig(id=os.getenv("MAXIM_LOG_REPO_ID")))
The logger organizes captured data by repository ID, supporting structured storage and retrieval. For more on Maxim’s logging architecture, refer to Maxim’s documentation.
Instrumenting Fireworks AI for Observability
Maxim’s integration with Fireworks AI requires no wrapper code. Instrument Fireworks with a single function call:
from fireworks import LLM
from maxim.logger.fireworks import instrument_fireworks
instrument_fireworks(logger)
llm = LLM(
    model="qwen3-235b-a22b",
    deployment_type="serverless"
)
This step ensures all Fireworks API interactions (completions, streaming responses, tool calls) are automatically logged. No additional code changes are required, making observability transparent and robust.
Explore other instrumentation strategies in Maxim’s agent tracing guide.
Monitoring Basic Inference and Streaming Responses
Basic Inference
Test the integration with a simple completion:
response = llm.chat.completions.create(
    messages=[{
        "role": "user",
        "content": "Say this is a test",
    }],
)
print(response.choices[0].message.content)
Maxim captures:
- Request metadata (timestamp, model)
- Complete message history
- Response content
- Token usage and latency metrics
- Model reasoning (when available)
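These metrics appear automatically in the Maxim dashboard, and the token counters can also be spot-checked locally. A minimal sketch, assuming the Fireworks response exposes an OpenAI-compatible usage object (field names may vary by SDK version):

# Spot-check token usage locally (assumes an OpenAI-compatible usage
# object on the response; exact field names may vary by SDK version).
usage = response.usage
if usage is not None:
    print(f"Prompt tokens:     {usage.prompt_tokens}")
    print(f"Completion tokens: {usage.completion_tokens}")
    print(f"Total tokens:      {usage.total_tokens}")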
Streaming Responses
Streaming introduces unique observability challenges. Maxim reconstructs complete responses from individual chunks and tracks latency metrics:
llm_stream = LLM(
    model="qwen3-235b-a22b",
    deployment_type="serverless"
)

response_generator = llm_stream.chat.completions.create(
    messages=[{
        "role": "user",
        "content": "Explain the importance of city population data in urban planning",
    }],
    stream=True,
)

for chunk in response_generator:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
Maxim logs streaming interruptions, time-to-first-token, and overall streaming latency, supporting real-time debugging.
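Maxim records these timings for you; if you want to sanity-check time-to-first-token on the client side, a rough sketch around the same streaming call looks like this (illustrative only, not part of the Maxim SDK):

# Rough client-side time-to-first-token measurement, for comparison
# with the latency Maxim records (illustrative only).
import time

start = time.perf_counter()
first_token_at = None

stream = llm_stream.chat.completions.create(
    messages=[{"role": "user", "content": "Give three uses of census data."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        print(delta, end="")

if first_token_at is not None:
    print(f"\nTime to first token: {first_token_at - start:.3f}s")
print(f"Total streaming time: {time.perf_counter() - start:.3f}s")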
For more on evaluating streaming and inference, see Evaluation Workflows for AI Agents.
Advanced Tool Call Monitoring
Tool calling enables models to invoke external functions for dynamic data retrieval. Maxim’s instrumentation captures all tool call details, including schema, parameters, and success/failure rates.
Example: City Population Assistant
import json

llm_tools = LLM(
    model="llama-v3p1-405b-instruct",
    deployment_type="serverless"
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_city_population",
            "description": "Retrieve the current population data for a specified city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city_name": {
                        "type": "string",
                        "description": "The name of the city for which population data is needed, e.g., 'San Francisco'."
                    },
                },
                "required": ["city_name"],
            },
        },
    }
]

prompt = f"""
You have access to the following function:
Function Name: '{tools[0]["function"]["name"]}'
Purpose: '{tools[0]["function"]["description"]}'
Parameters Schema: {json.dumps(tools[0]["function"]["parameters"], indent=4)}
Instructions for Using Functions:
1. Use the function '{tools[0]["function"]["name"]}' to retrieve population data when required.
2. If a function call is necessary, reply ONLY in the following format:
{{"city_name": "example_city"}}
3. Adhere strictly to the parameters schema. Ensure all required fields are provided.
4. Use the function only when you cannot directly answer using general knowledge.
5. If no function is necessary, respond to the query directly without mentioning the function.
Examples:
- For a query like "What is the population of Toronto?" respond with:
{{"city_name": "Toronto"}}
- For "What is the population of the Earth?" respond with general knowledge and do NOT use the function.
"""

messages = [
    {"role": "system", "content": prompt},
    {"role": "user", "content": "What is the population of San Francisco?"}
]

chat_completion = llm_tools.chat.completions.create(
    messages=messages,
    tools=tools,
    temperature=0.1
)

print(chat_completion.choices[0].message.model_dump_json(indent=4))
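The completion above stops where the model requests the tool. In a real application you would execute the function and return its output for a final answer, a round trip Maxim also traces. A hedged sketch follows, assuming the response carries an OpenAI-compatible tool_calls shape; get_city_population and its return value are hypothetical stand-ins for your real data source:

# Hypothetical follow-up: execute the requested tool and return the result
# to the model. get_city_population stands in for a real data source, and
# the population figure below is a placeholder.
def get_city_population(city_name: str) -> dict:
    # Replace with a real API or database lookup.
    return {"city_name": city_name, "population": 808_437}

message = chat_completion.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    args = json.loads(call.function.arguments)
    result = get_city_population(**args)

    # Append the assistant turn and the tool result, then ask the model
    # for a final answer; Maxim traces this round trip as well.
    messages.append(message.model_dump(exclude_none=True))
    messages.append({
        "role": "tool",
        "tool_call_id": call.id,
        "content": json.dumps(result),
    })
    final = llm_tools.chat.completions.create(messages=messages, tools=tools)
    print(final.choices[0].message.content)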
Maxim observes:
- Tool schema and invocation parameters
- Decision-making process for tool selection
- Parameter validation and schema compliance
- Success/failure rates of tool invocations
For further technical details, refer to Fireworks AI’s function-calling documentation and Maxim’s agent evaluation metrics.
Structured JSON Mode Response Monitoring
JSON mode enforces structured outputs from LLMs, supporting downstream processing and integration. Maxim logs schema compliance, parsing success, and response consistency.
Example: Structured Response with Pydantic
from pydantic import BaseModel

class Result(BaseModel):
    winner: str

chat_completion = llm.chat.completions.create(
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "Result",
            "schema": Result.model_json_schema()
        }
    },
    messages=[
        {
            "role": "user",
            "content": "Who won the US presidential election in 2012? Reply just in one JSON.",
        },
    ],
)

print(repr(chat_completion.choices[0].message.content))
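Because the response is constrained to the schema, it can be validated straight back into the same Pydantic model; a short sketch closing the loop:

# Parse the structured output back into the Pydantic model.
# model_validate_json raises pydantic.ValidationError on schema violations,
# the kind of parsing failure Maxim surfaces in its logs.
result = Result.model_validate_json(chat_completion.choices[0].message.content)
print(result.winner)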
Maxim captures:
- Schema compliance validation
- JSON parsing success/failure rates
- Field population accuracy
- Response structure consistency
For more on structured responses, see Fireworks AI’s JSON mode documentation and LLM observability.
Analyzing Observability Data
With Maxim, teams gain access to granular performance metrics:
- Latency Distribution: Analyze response time patterns for optimization
- Token Usage: Track input/output tokens to manage costs
- Success Rates: Monitor API reliability and error patterns
Maxim supports session-level and node-level evaluations, enabling detailed analysis and continuous improvement. For real-world applications, see case studies from Clinc and Atomicwork.
Best Practices for AI Observability
- Log Everything: Comprehensive logging reveals unexpected patterns. Maxim enables both session and node-level evaluations.
- Monitor Continuously: Set up real-time alerts for anomalies using integrations with Slack and PagerDuty.
- Version Control: Track model versions and their performance characteristics.
- Cost Tracking: Monitor token usage to optimize performance and cost.
- Privacy Compliance: Ensure logging practices meet data protection requirements.
For more best practices, refer to Maxim’s guide to reliable AI and AI model monitoring strategies.
Further Reading and Resources
- Maxim Docs
- Observing Tool Calls & JSON Mode Responses from Fireworks AI
- Fireworks AI Structured Response Formatting
- Fireworks AI Function Calling
- Maxim Blog: AI Agent Quality Evaluation
- Maxim Blog: Evaluation Workflows for AI Agents
- Schedule a Maxim Demo
- Maxim vs Langsmith
Conclusion
Integrating Maxim AI with Fireworks AI unlocks comprehensive observability for advanced LLM features. From basic completions to streaming responses, tool calls, and structured JSON outputs, every interaction is captured and analyzed. This foundation empowers teams to build reliable, cost-efficient, and transparent AI applications, accelerating development cycles and ensuring consistent user experiences.
To get started, explore Maxim’s integration guide and schedule a demo to see how Maxim can elevate your AI workflows.