🐞 Building an Agentic Debugging Game: Anthropic for LLM & Maxim for Observability

Welcome! In this tutorial, we'll build a fun, interactive AI agent called "Guess the Bug." The agent will use Anthropic's Claude model to generate simple Python code snippets with hidden bugs. Your job is to find the bug, and the agent will tell you if you're right!

Most importantly, we'll build this with observability in mind from day one, using Maxim to automatically log every interaction. This will give us a transparent, end-to-end view of our agent's behavior, making it easy to debug, monitor, and improve.

We'll use:

  • Anthropic (Claude): For generating buggy code and evaluating user guesses.
  • Maxim: For automatic logging and observability of all AI interactions.
  • Streamlit: For creating a simple, clean web UI.

Let's get started!

Prerequisites

Before you begin, make sure you have:

  • Python 3.8+ installed.
  • An Anthropic API Key.
  • A Maxim API Key and a Log Repo ID from your Maxim dashboard.

Resources

  1. The working code for this agent is available here - Code
  2. Sign up on Maxim to get your API key - SignUp

Step 1: Setting Up Your Project

First, let's set up our project directory and virtual environment.

Create a project folder and navigate into it:

mkdir guess-the-bug-agent
cd guess-the-bug-agent

Create and activate a virtual environment:

python3 -m venv venv
source venv/bin/activate
# On Windows, use: venv\Scripts\activate

Create a requirements.txt file with the following dependencies:

streamlit
anthropic
maxim-py

Install the packages:

pip install -r requirements.txt

Finally, set your API keys as environment variables. This keeps them out of your source code.

export ANTHROPIC_API_KEY="your_anthropic_api_key"
export MAXIM_API_KEY="your_maxim_api_key"
export MAXIM_LOG_REPO_ID="your_maxim_log_repo_id"
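
Optionally, you can confirm that all three keys are visible to Python before moving on. Here's a quick sanity-check sketch (the check_env.py filename is just a suggestion; the variable names match the exports above):

# check_env.py -- optional sanity check that the required keys are set
import os

for key in ("ANTHROPIC_API_KEY", "MAXIM_API_KEY", "MAXIM_LOG_REPO_ID"):
    print(f"{key}: {'set' if os.environ.get(key) else 'MISSING'}")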

Step 2: Building the Agent Logic (agent.py)

Now for the fun part. We'll create an agent.py file to house the core logic of our application. We will follow a clean, object-oriented structure that separates concerns: Tools, a Chat class, and an Agent class.

Create a file named agent.py and add the following code:

import os
import streamlit as st
from uuid import uuid4
from maxim import Maxim
from maxim.logger.anthropic import MaximAnthropicClient
from anthropic import Anthropic

# --- Tool Functions ---
# These are simple, stateless functions that perform a single job.

def generate_buggy_snippet(chat, logger, trace_id):
    """Asks the chat model to generate a buggy Python snippet."""
    prompt = (
        "Generate a simple Python code snippet that contains a common bug. "
        "Only output the code, no explanation."
    )
    code = chat.send(prompt, trace_id, max_tokens=150)
    return code

def evaluate_user_guess(chat, code, user_guess, logger, trace_id):
    """Asks the chat model to evaluate the user's guess."""
    prompt = (
        f"Here is a Python code snippet:\\n{code}\\n"
        f"The user thinks the bug is: {user_guess}\\n"
        "Is the user correct? If not, explain the actual bug."
    )
    feedback = chat.send(prompt, trace_id, max_tokens=200)
    return feedback

# --- Chat Class ---
# This class handles all communication with the Anthropic API.
# Notice how we wrap the Anthropic client with MaximAnthropicClient for auto-logging.

class Chat:
    def __init__(self, anthropic_api_key, logger):
        self.anthropic = Anthropic(api_key=anthropic_api_key)
        self.client = MaximAnthropicClient(client=self.anthropic, logger=logger)
        self.model = "claude-3-5-sonnet-20241022"

    def send(self, prompt, trace_id, max_tokens=200):
        response = self.client.messages.create(
            model=self.model,
            max_tokens=max_tokens,
            messages=[{"role": "user", "content": prompt}],
            extra_headers={"x-maxim-trace-id": trace_id}
        )
        return response.content[0].text.strip()

# --- Agent Class ---
# The agent manages the state and orchestrates the game logic.

class GuessTheBugAgent:
    def __init__(self, chat, logger):
        self.chat = chat
        self.logger = logger
        self.session_id = str(uuid4())
        self.trace_id = None
        self.current_snippet = None

    def new_trace(self):
        """Creates a new trace for a new game round, linked to the session."""
        self.trace_id = str(uuid4())
        self.logger.trace({
            "id": self.trace_id,
            "name": "Guess the Bug Round",
            "session_id": self.session_id
        })

    def new_game(self):
        """Starts a new game by generating a new snippet."""
        self.new_trace()
        self.current_snippet = generate_buggy_snippet(self.chat, self.logger, self.trace_id)
        return self.current_snippet

    def submit_guess(self, user_guess):
        """Submits the user's guess for evaluation."""
        if not self.current_snippet:
            return "No active game. Start a new one!"
        feedback = evaluate_user_guess(
            self.chat, self.current_snippet, user_guess, self.logger, self.trace_id
        )
        return feedback

# --- Singleton Pattern ---
# This ensures we use the same agent instance across Streamlit reruns.

def get_agent():
    if "agent" not in st.session_state:
        _anthropic_api_key = os.environ.get("ANTHROPIC_API_KEY")
        _logger = Maxim().logger()
        _chat = Chat(_anthropic_api_key, _logger)
        st.session_state.agent = GuessTheBugAgent(_chat, _logger)
    return st.session_state.agent
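
Before wiring up the UI, you can smoke-test the agent from a plain Python script. Here is a minimal sketch (not part of the tutorial code; the test_agent.py filename and the guess string are placeholders, and it assumes your API keys are exported as in Step 1):

# test_agent.py -- optional smoke test; run with: python test_agent.py
import os
from maxim import Maxim
from agent import Chat, GuessTheBugAgent

logger = Maxim().logger()
chat = Chat(os.environ["ANTHROPIC_API_KEY"], logger)
agent = GuessTheBugAgent(chat, logger)

snippet = agent.new_game()  # generates a buggy snippet and opens a new trace
print("Buggy snippet:\n", snippet)

feedback = agent.submit_guess("I think the loop is off by one")  # placeholder guess
print("\nAI feedback:\n", feedback)

If everything is configured, the script prints a snippet and feedback, and the calls should appear as a trace in your Maxim Log Repo, just as the Streamlit app's will.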

Step 3: Creating the Frontend (app.py)

With our agent logic in place, let's build the user interface with Streamlit. This will be a simple app that shows the code, takes the user's guess, and displays the AI's feedback.

Create a file named app.py:

import streamlit as st
from agent import get_agent

# --- Page Configuration ---
st.set_page_config(page_title="Guess the Bug!", page_icon="🐞")
st.title("🐞 Guess the Bug - AI Edition")

# --- Get the Agent ---
# get_agent uses Streamlit's session state to keep a single agent instance.

agent = get_agent()

# --- Game Logic ---
# Initialize a new game if one isn't already running.

if "snippet" not in st.session_state:
    with st.spinner("AI is generating a buggy snippet..."):
        st.session_state.snippet = agent.new_game()
    st.session_state.feedback = ""

# --- UI Display ---
st.subheader("Find the bug in this Python code:")
st.code(st.session_state.snippet, language="python")

# --- User Input ---
user_guess = st.text_input("What do you think the bug is?")

col1, col2 = st.columns([1, 1])

with col1:
    if st.button("Check My Guess", use_container_width=True):
        if user_guess:
            with st.spinner("AI is evaluating your guess..."):
                feedback = agent.submit_guess(user_guess)
                st.session_state.feedback = feedback
        else:
            st.warning("Please enter your guess first!")

with col2:
    if st.button("Next Bug 🐞", use_container_width=True):
        with st.spinner("AI is generating a new buggy snippet..."):
            st.session_state.snippet = agent.new_game()
        st.session_state.feedback = ""
        st.rerun()

# --- Display Feedback ---

if st.session_state.feedback:
    st.markdown("---")
    st.subheader("AI Feedback:")
    st.write(st.session_state.feedback)

Step 4: Run the App and See Traces on Maxim

You're all set! It's time to run the application.

  1. Execute the following command in your terminal:
streamlit run app.py

Your browser will open with the "Guess the Bug" game running.

  2. Play the game!

  • A buggy code snippet will appear.
  • Type your guess into the input box.
  • Click "Check My Guess" to get feedback from the AI.
  • Click "Next Bug" to try a new challenge.
  3. Check your Maxim dashboard: As you play, head over to Maxim and you will see new traces appearing in your Log Repo. Each trace corresponds to one round of the game (from generating the bug to evaluating your guess).

Click on any trace to see the full observability data:

  • The exact prompts sent to Anthropic's Claude model.
  • The full, raw responses received.
  • Latency, token usage, and other critical metadata.

This is the power of the MaximAnthropicClient wrapper: it captures everything automatically, giving you a complete audit trail for debugging and monitoring your AI agent without any extra code.

Conclusion

Congratulations! You've successfully built a fun AI agent and integrated powerful, zero-effort observability with Maxim. This project is a perfect starting point for more complex AI applications where understanding and monitoring the AI's behavior is critical.

From here, you could extend the agent by:

  • Adding a scoring system (see the sketch after this list).
  • Supporting more programming languages.
  • Building a full, multi-turn conversational interface for asking for hints.
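
For instance, a scoring system can be as small as a counter kept in Streamlit's session state. Here is a rough sketch (not part of the tutorial code; the update_score helper is hypothetical, and it assumes you adjust the evaluation prompt so the model starts its feedback with "Correct" when the guess is right):

import streamlit as st

def update_score(feedback: str) -> None:
    # Hypothetical helper: bump the score when the AI judges the guess correct.
    # Assumes the evaluation prompt asks the model to begin with "Correct".
    if "score" not in st.session_state:
        st.session_state.score = 0
    if feedback.strip().lower().startswith("correct"):
        st.session_state.score += 1
    st.sidebar.metric("Score", st.session_state.score)

You would call update_score(feedback) in app.py right after agent.submit_guess returns.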

Happy coding!