How to Observe Tool Calling and JSON Mode Responses from Fireworks AI in Maxim AI

In the rapidly evolving landscape of generative AI, robust observability is fundamental for building, testing, and deploying reliable applications. As AI models become more sophisticated (capable of tool calling, structured outputs, and complex reasoning), teams require advanced instrumentation to monitor and analyze every interaction. Maxim AI provides an end-to-end observability platform that integrates seamlessly with Fireworks AI, enabling teams to capture, trace, and evaluate tool calls and JSON mode responses with precision.
This blog serves as a comprehensive guide to integrating Maxim AI with Fireworks AI for deep observability, covering technical setup, instrumentation strategies, and best practices. Drawing on Maxim’s official integration documentation and authoritative resources, we detail each step required to achieve transparent monitoring of advanced LLM features.
Table of Contents
- Why Observability Matters in GenAI
- Overview: Maxim AI and Fireworks AI
- Setting Up Your Environment
- Initializing Maxim Logger
- Instrumenting Fireworks AI for Observability
- Monitoring Basic Inference and Streaming Responses
- Advanced Tool Call Monitoring
- Structured JSON Mode Response Monitoring
- Analyzing Observability Data
- Best Practices for AI Observability
- Further Reading and Resources
- Conclusion
Why Observability Matters in GenAI
Observability is the backbone of production-grade AI systems. It enables teams to:
- Track model performance and usage patterns
- Debug complex interactions
- Ensure compliance and reliability
- Optimize cost and resource utilization
Without comprehensive logging and monitoring, teams risk deploying opaque systems that are difficult to debug and improve. As highlighted in Maxim’s guide to AI reliability, transparent observability is essential for building trustworthy and robust AI workflows.
Overview: Maxim AI and Fireworks AI
Fireworks AI is a generative AI platform designed for running, fine-tuning, and customizing large language models (LLMs) with speed and production-readiness. Fireworks AI documentation details advanced features such as tool calling and structured JSON responses.
Maxim AI is an end-to-end platform for simulating, evaluating, and observing AI agents. It enables teams to instrument their AI workflows, capturing every interaction for detailed analysis and continuous improvement. Learn more about Maxim’s capabilities in Maxim’s product documentation.
Setting Up Your Environment
Begin by installing the required packages, pinning the Fireworks SDK to a version known to work with Maxim's instrumentation:
!pip install fireworks-ai==0.17.9 maxim-py
Pinning the Fireworks version protects your build from unexpected compatibility issues between Fireworks AI and Maxim’s instrumentation layer. For more on managing dependencies, see Prompt Management in 2025.
Next, configure your environment variables to securely store API credentials. If working in Google Colab:
from google.colab import userdata
import os
MAXIM_API_KEY = userdata.get("MAXIM_API_KEY")
MAXIM_LOG_REPO_ID = userdata.get("MAXIM_REPO_ID")
FIREWORKS_API_KEY = userdata.get("FIREWORKS_API_KEY")
os.environ["MAXIM_API_KEY"] = MAXIM_API_KEY
os.environ["MAXIM_LOG_REPO_ID"] = MAXIM_LOG_REPO_ID
os.environ["FIREWORKS_API_KEY"] = FIREWORKS_API_KEY
Initializing Maxim Logger
Initialize Maxim’s logger to capture all interactions:
import os
from maxim import Config, Maxim
from maxim.logger import LoggerConfig
maxim = Maxim(Config(api_key=os.getenv("MAXIM_API_KEY")))
logger = maxim.logger(LoggerConfig(id=os.getenv("MAXIM_LOG_REPO_ID")))
The logger organizes captured data by repository ID, supporting structured storage and retrieval. For more on Maxim’s logging architecture, refer to Maxim’s documentation.
Instrumenting Fireworks AI for Observability
Maxim’s integration with Fireworks AI requires no wrapper code. Instrument Fireworks with a single function call:
from fireworks import LLM
from maxim.logger.fireworks import instrument_fireworks
instrument_fireworks(logger)
llm = LLM(
    model="qwen3-235b-a22b",
    deployment_type="serverless"
)
This step ensures all Fireworks API interactions (completions, streaming responses, tool calls) are automatically logged. No additional code changes are required, making observability transparent and robust.
Explore other instrumentation strategies in Maxim’s agent tracing guide.
Monitoring Basic Inference and Streaming Responses
Basic Inference
Test the integration with a simple completion:
response = llm.chat.completions.create(
    messages=[{
        "role": "user",
        "content": "Say this is a test",
    }],
)
print(response.choices[0].message.content)
Maxim captures:
- Request metadata (timestamp, model)
- Complete message history
- Response content
- Token usage and latency metrics
- Model reasoning (when available)
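These metrics appear automatically in the Maxim dashboard, and the token counters can also be spot-checked locally. A minimal sketch, assuming the Fireworks response exposes an OpenAI-compatible usage object (field names may vary by SDK version):

# Spot-check token usage locally (assumes an OpenAI-compatible usage
# object on the response; exact field names may vary by SDK version).
usage = response.usage
if usage is not None:
    print(f"Prompt tokens:     {usage.prompt_tokens}")
    print(f"Completion tokens: {usage.completion_tokens}")
    print(f"Total tokens:      {usage.total_tokens}")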
Streaming Responses
Streaming introduces unique observability challenges. Maxim reconstructs complete responses from individual chunks and tracks latency metrics:
llm_stream = LLM(
    model="qwen3-235b-a22b",
    deployment_type="serverless"
)

response_generator = llm_stream.chat.completions.create(
    messages=[{
        "role": "user",
        "content": "Explain the importance of city population data in urban planning",
    }],
    stream=True,
)

for chunk in response_generator:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
Maxim logs streaming interruptions, time-to-first-token, and overall streaming latency, supporting real-time debugging.
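Maxim records these timings for you; if you want to sanity-check time-to-first-token on the client side, a rough sketch around the same streaming call looks like this (illustrative only, not part of the Maxim SDK):

# Rough client-side time-to-first-token measurement, for comparison
# with the latency Maxim records (illustrative only).
import time

start = time.perf_counter()
first_token_at = None

stream = llm_stream.chat.completions.create(
    messages=[{"role": "user", "content": "Give three uses of census data."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        print(delta, end="")

if first_token_at is not None:
    print(f"\nTime to first token: {first_token_at - start:.3f}s")
print(f"Total streaming time: {time.perf_counter() - start:.3f}s")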
For more on evaluating streaming and inference, see Evaluation Workflows for AI Agents.
Advanced Tool Call Monitoring
Tool calling enables models to invoke external functions for dynamic data retrieval. Maxim’s instrumentation captures all tool call details, including schema, parameters, and success/failure rates.
Example: City Population Assistant
import json

llm_tools = LLM(
    model="llama-v3p1-405b-instruct",
    deployment_type="serverless"
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_city_population",
            "description": "Retrieve the current population data for a specified city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city_name": {
                        "type": "string",
                        "description": "The name of the city for which population data is needed, e.g., 'San Francisco'."
                    },
                },
                "required": ["city_name"],
            },
        },
    }
]

prompt = f"""
You have access to the following function:
Function Name: '{tools[0]["function"]["name"]}'
Purpose: '{tools[0]["function"]["description"]}'
Parameters Schema: {json.dumps(tools[0]["function"]["parameters"], indent=4)}
Instructions for Using Functions:
1. Use the function '{tools[0]["function"]["name"]}' to retrieve population data when required.
2. If a function call is necessary, reply ONLY in the following format:
{{"city_name": "example_city"}}
3. Adhere strictly to the parameters schema. Ensure all required fields are provided.
4. Use the function only when you cannot directly answer using general knowledge.
5. If no function is necessary, respond to the query directly without mentioning the function.
Examples:
- For a query like "What is the population of Toronto?" respond with:
{{"city_name": "Toronto"}}
- For "What is the population of the Earth?" respond with general knowledge and do NOT use the function.
"""

messages = [
    {"role": "system", "content": prompt},
    {"role": "user", "content": "What is the population of San Francisco?"}
]

chat_completion = llm_tools.chat.completions.create(
    messages=messages,
    tools=tools,
    temperature=0.1
)

print(chat_completion.choices[0].message.model_dump_json(indent=4))
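The completion above stops where the model requests the tool. In a real application you would execute the function and return its output for a final answer, a round trip Maxim also traces. A hedged sketch follows, assuming the response carries an OpenAI-compatible tool_calls shape; get_city_population and its return value are hypothetical stand-ins for your real data source:

# Hypothetical follow-up: execute the requested tool and return the result
# to the model. get_city_population stands in for a real data source, and
# the population figure below is a placeholder.
def get_city_population(city_name: str) -> dict:
    # Replace with a real API or database lookup.
    return {"city_name": city_name, "population": 808_437}

message = chat_completion.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    args = json.loads(call.function.arguments)
    result = get_city_population(**args)

    # Append the assistant turn and the tool result, then ask the model
    # for a final answer; Maxim traces this round trip as well.
    messages.append(message.model_dump(exclude_none=True))
    messages.append({
        "role": "tool",
        "tool_call_id": call.id,
        "content": json.dumps(result),
    })
    final = llm_tools.chat.completions.create(messages=messages, tools=tools)
    print(final.choices[0].message.content)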
Maxim observes:
- Tool schema and invocation parameters
- Decision-making process for tool selection
- Parameter validation and schema compliance
- Success/failure rates of tool invocations
For further technical details, refer to Fireworks AI’s function-calling documentation and Maxim’s agent evaluation metrics.
Structured JSON Mode Response Monitoring
JSON mode enforces structured outputs from LLMs, supporting downstream processing and integration. Maxim logs schema compliance, parsing success, and response consistency.
Example: Structured Response with Pydantic
from pydantic import BaseModel

class Result(BaseModel):
    winner: str

chat_completion = llm.chat.completions.create(
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "Result",
            "schema": Result.model_json_schema()
        }
    },
    messages=[
        {
            "role": "user",
            "content": "Who won the US presidential election in 2012? Reply just in one JSON.",
        },
    ],
)

print(repr(chat_completion.choices[0].message.content))
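Because the response is constrained to the schema, it can be validated straight back into the same Pydantic model; a short sketch closing the loop:

# Parse the structured output back into the Pydantic model.
# model_validate_json raises pydantic.ValidationError on schema violations,
# the kind of parsing failure Maxim surfaces in its logs.
result = Result.model_validate_json(chat_completion.choices[0].message.content)
print(result.winner)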
Maxim captures:
- Schema compliance validation
- JSON parsing success/failure rates
- Field population accuracy
- Response structure consistency
For more on structured responses, see Fireworks AI’s JSON mode documentation and LLM observability.
Analyzing Observability Data
With Maxim, teams gain access to granular performance metrics:
- Latency Distribution: Analyze response time patterns for optimization
- Token Usage: Track input/output tokens to manage costs
- Success Rates: Monitor API reliability and error patterns
Maxim supports session-level and node-level evaluations, enabling detailed analysis and continuous improvement. For real-world applications, see case studies from Clinc and Atomicwork.
Best Practices for AI Observability
- Log Everything: Comprehensive logging reveals unexpected patterns. Maxim enables both session and node-level evaluations.
- Monitor Continuously: Set up real-time alerts for anomalies using integrations with Slack and PagerDuty.
- Version Control: Track model versions and their performance characteristics.
- Cost Tracking: Monitor token usage to optimize performance and cost.
- Privacy Compliance: Ensure logging practices meet data protection requirements.
For more best practices, refer to Maxim’s guide to reliable AI and AI model monitoring strategies.
Further Reading and Resources
- Maxim Docs
- Observing Tool Calls & JSON Mode Responses from Fireworks AI
- Fireworks AI Structured Response Formatting
- Fireworks AI Function Calling
- Maxim Blog: AI Agent Quality Evaluation
- Maxim Blog: Evaluation Workflows for AI Agents
- Schedule a Maxim Demo
- Maxim vs Langsmith
Conclusion
Integrating Maxim AI with Fireworks AI unlocks comprehensive observability for advanced LLM features. From basic completions to streaming responses, tool calls, and structured JSON outputs, every interaction is captured and analyzed. This foundation empowers teams to build reliable, cost-efficient, and transparent AI applications, accelerating development cycles and ensuring consistent user experiences.
To get started, explore Maxim’s integration guide and schedule a demo to see how Maxim can elevate your AI workflows.