from typing import Dict

from maxim.evaluators import BaseEvaluator
from maxim.models import (
    LocalEvaluatorResultParameter,
    LocalEvaluatorReturn,
    LocalData,
    PassFailCriteria,
)
from maxim.models.evaluator import (
    PassFailCriteriaOnEachEntry,
    PassFailCriteriaForTestrunOverall,
)


class SimulationOutputsEvaluator(BaseEvaluator):
    """Checks that the simulation produced at least as many steps as expected."""

    def evaluate(
        self, result: LocalEvaluatorResultParameter, data: LocalData
    ) -> Dict[str, LocalEvaluatorReturn]:
        # Fail immediately if the simulation produced no output steps.
        if not result.simulation_outputs:
            return {
                "simulation-steps-validator": LocalEvaluatorReturn(
                    score=0, reasoning="No simulation outputs available"
                ),
            }

        # "Expected Steps" is a newline-separated list in the dataset entry;
        # count only non-empty lines.
        expected_lines = [
            line
            for line in data.get("Expected Steps", "").split("\n")
            if line.strip()
        ]
        expected_steps_count = len(expected_lines)
        actual_steps_count = len(result.simulation_outputs)

        # Pass when the simulation took at least as many steps as expected.
        steps_match = actual_steps_count >= expected_steps_count
        return {
            "simulation-steps-validator": LocalEvaluatorReturn(
                score=1 if steps_match else 0,
                reasoning=f"Simulation produced {actual_steps_count} steps, "
                f"expected {expected_steps_count}",
            ),
        }


def simulation_outputs_evaluator() -> SimulationOutputsEvaluator:
    # Each entry passes when its score is >= 1; the overall test run passes
    # only if 100% of entries pass.
    return SimulationOutputsEvaluator(
        pass_fail_criteria={
            "simulation-steps-validator": PassFailCriteria(
                on_each_entry_pass_if=PassFailCriteriaOnEachEntry(
                    score_should_be=">=", value=1
                ),
                for_testrun_overall_pass_if=PassFailCriteriaForTestrunOverall(
                    overall_should_be=">=",
                    value=100,
                    for_result="percentageOfPassedResults",
                ),
            ),
        }
    )
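

# --- Minimal setup sketch (assumptions, not from the snippet above) ---
# The test run below needs a configured Maxim client plus a dataset whose
# columns include "Expected Steps", the column the evaluator reads. Everything
# here is a placeholder: the client construction, the column names other than
# "Expected Steps", and the sample row are illustrative assumptions, and
# SimulationConfig is assumed to be importable from your Maxim SDK version.
from maxim import Maxim

maxim = Maxim()  # assumes MAXIM_API_KEY is set in the environment

workspace_id = "YOUR_WORKSPACE_ID"  # placeholder
prompt_version_id = "YOUR_PROMPT_VERSION_ID"  # placeholder

# Column names mapped to Maxim column types; "Expected Steps" is the one the
# evaluator reads, the rest are illustrative.
data_structure = {
    "Input": "INPUT",
    "Expected Steps": "EXPECTED_OUTPUT",
}

manual_data = [
    {
        "Input": "Book a table for two tomorrow at 7pm",
        "Expected Steps": "greet the user\ncollect booking details\nconfirm the reservation",
    },
]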

# Wire the evaluator into a simulated test run. `SimulationConfig` and the
# identifiers below come from your Maxim setup (see the sketch above).
result = (
    maxim.create_test_run("Simulation Test", workspace_id)
    .with_data_structure(data_structure)
    .with_simulation_config(SimulationConfig(max_turns=6))
    .with_prompt_version_id(prompt_version_id)  # or .with_workflow_id(workflow_id)
    .with_data(manual_data)
    .with_evaluators(simulation_outputs_evaluator())
    .run()
)