Datasets are collections of data used for training, testing, and evaluating AI models within workflows and evaluations. Test your prompts, HTTP agents, or no-code agents across the test cases in a dataset and view results at scale. Begin with a template and customize the column structure, then evolve your datasets over time from production logs or human annotation.
Looking to quickly generate test data? Try our Synthetic Data Generation feature to create datasets for prompt testing or agent simulation without manual data entry.

Create Datasets Using Templates

Create Datasets quickly with predefined structures using our templates:

Prompt or Workflow Testing

Choose this template to test prompts or workflows with single-turn interactions based on individual inputs. Example: an Input column with prompts like “Summarize this article about climate change” paired with an Expected Output column containing ideal responses.

Agent Simulation

Select this template for multi-turn simulations to test agent behaviors across conversation sequences. Example: Scenario column with “Customer inquiring about return policy” and Expected Steps column outlining the agent’s expected actions.

Dataset Testing

Use this template when evaluating against existing output data to compare expected and actual results. Example: Input column with “What’s the weather in New York?” and Expected Output column with “65°F and sunny” for direct evaluation.

Create Datasets Using CSV

You can also import or create datasets in Maxim using CSV files.

Create or Import a Dataset Using CSV

  1. Go to the Datasets section in the Library.
  2. Click on Upload CSV Dataset.
  3. Upload your CSV file.
  4. Map your columns with their respective types.
  5. Click on Create Dataset to complete dataset creation.
CSV-based dataset creation is useful when you already have structured data prepared in spreadsheets or logs. Ensure columns are mapped correctly to avoid mismatches.
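For illustration, a minimal CSV for prompt or workflow testing might look like the following; the column names and values are examples only, and you map each column to its type (such as Input and Expected Output) during upload:

Input,Expected Output
"Summarize this article about climate change","A concise summary covering the article's key points"
"Translate 'Hello' to Spanish","Hola"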

Update Existing Datasets Using CSV

You can add new entries to an existing dataset by uploading a CSV with compatible columns.

Update an Existing Dataset with CSV

  1. Prepare your CSV file with columns matching your dataset structure (e.g., Input, Expected_Output).
  2. In the Maxim UI, go to Library → Datasets, select your dataset.
  3. Click on Upload CSV and upload your CSV file.
  4. Map CSV columns to dataset columns or create new dataset columns.
  5. Confirm the mapped dataset columns and column types, then click on Add to dataset.
When updating a dataset, the CSV must follow the same column structure defined in the dataset to ensure consistency.

Add Images to Your Dataset

You can enhance your datasets by including images alongside other data types. This is particularly useful for:
  • Visual content evaluation
  • Image-based prompts and responses
  • Multi-modal testing scenarios

You can add images to your dataset by creating a column of type Images. We support both URLs and local file paths.
When working with images in datasets:
  • Supported formats include common image types (PNG, JPG, JPEG, GIF)
  • For URLs, ensure they are publicly accessible
  • For local files, maintain consistent file paths across your team
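As an illustration, values in an Images column could reference either a publicly accessible URL or a local file path (the values below are examples only):

https://example.com/images/product-photo.png
./assets/screenshots/login-page.png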

Column types

Input

The Input column type represents the input query used to test your application. This column contains the data that will be processed by your prompt, endpoint, or agent during test runs. Every dataset must have at least one Input column to define what data will be fed into your application. Use this column to store the questions, prompts, or data points that you want to test. Input columns are automatically used during test runs and passed to your application for processing. Examples:
  • “Summarize this article about climate change”
  • “What’s the weather in New York?”
  • “Translate the following text to French: Hello world”
The Input column serves as the primary entry point for your test cases and enables systematic evaluation across multiple scenarios.

Expected Output

The Expected Output column type represents the desired response that your application should generate for the corresponding input. This column is used to evaluate whether your application’s actual output matches the expected result during test runs. Use this column to define what you consider the correct or ideal response for each input. When evaluators are configured for your test runs, they compare the actual output against the expected output to determine accuracy and quality. Examples:
  • For an input “Translate ‘Hello’ to Spanish”, the expected output might be “Hola”
  • For a summarization task, the expected output would be the ideal summary of the original content
This column is essential for automated evaluation and helps identify cases where your application’s outputs deviate from the desired results.

Output

The Output column type is used when you have already run your queries elsewhere and have the outputs within your CSV that you want to evaluate directly. This column contains pre-generated responses that you want to assess or analyze without re-running your application. Use this column when you need to evaluate existing outputs from previous runs, external systems, or manual annotations. This is particularly useful when you want to assess the quality of responses that were generated outside of Maxim’s platform. This column type enables you to import historical evaluation data and perform comparative analysis on outputs that were generated using different models or prompts.
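As a sketch, a dataset built for direct evaluation of pre-generated responses might pair Input, Output, and Expected Output columns like this (the column names and values are illustrative):

Input,Output,Expected Output
"What's the weather in New York?","It is 65 degrees Fahrenheit and sunny in New York.","65°F and sunny"
"Translate 'Hello' to Spanish","Hola","Hola"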

File

The File column type allows you to upload a file or provide a file URL containing assets or structured data. This is useful for testing applications that need to process documents, images, or other file-based inputs. Use this column when your test cases require external files such as PDFs, images, documents, or structured data files. Files can be uploaded directly to Maxim or referenced via URLs. Supported formats: PDF, TXT, WAV, MP3, images, and other common file types. This column type enables multimodal testing scenarios where your application needs to process different types of media and structured content beyond text inputs.

Variables

The Variables column type stores values used to parameterize prompts or endpoint payloads, allowing dynamic substitution of content during testing. This enables you to create reusable test cases with flexible inputs that can be customized per row. Use this column to define contextual information that varies across test cases but can be referenced consistently in your prompts or API requests. Variables are automatically substituted during test runs using the {{variable_name}} syntax. Example: A variable like {{user_name}} can be defined in your dataset and automatically substituted in prompts or API requests during test runs. Variables provide flexibility in dataset design by allowing you to parameterize common elements like user information, preferences, or context-specific data without hardcoding values into your prompts.
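For example, a prompt that reads “Write a welcome email for {{user_name}}, who just signed up for the {{plan_name}} plan” could have its values supplied per row. A sketch of the variable values for one row, shown here as a key-value object purely for illustration:

{
    "user_name": "Priya",
    "plan_name": "Pro"
}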

Scenario

The Scenario column type allows you to define specific situations or contexts for your test cases. Use this column to describe the background, user intent, or environment in which an interaction takes place. Scenarios help guide agents or models to respond appropriately based on the described situation. Examples:
  • “A customer wants to buy an iPhone.”
  • “A user is trying to cancel their subscription.”
  • “A student asks for help with a math problem.”
Scenarios are especially useful for simulating real-world conversations, testing agent behaviors, and ensuring your models handle a variety of user intents and contexts. Use this column when you are using this dataset for agent simulation runs.

Expected Steps

The Expected Steps column type allows you to specify the sequence of actions or decisions that an agent should take in response to a given scenario. This helps users clearly outline the ideal process or workflow, making it easier for evaluators to verify whether the agent is behaving as intended. Use this column to break down the expected agent behavior into individual, logical steps. This is especially useful for multi-turn interactions or complex tasks where the agent’s reasoning and actions need to be evaluated step by step. Example:
- Greet the customer and ask how you can help.
- Look up the customer's order history.
- Provide information about the return policy.
- Offer to initiate a return if eligible.
Including expected steps in your dataset enables more granular evaluation and helps ensure that agents follow the correct procedures during simulations or tests.

Expected Tool Calls

The Expected Tool Calls column type allows you to specify which tools (such as APIs, functions, or plugins) you expect an agent to use in response to a scenario. This is especially useful when running prompt runs, where you want to evaluate whether the agent is choosing and invoking the correct tools as part of its reasoning process. Use this column to list the names of the tools or actions that should be called, optionally including parameters or expected arguments. This helps ensure that the agent’s tool usage aligns with your expectations for the task. Examples:
  • “search_web”
  • “get_weather(location=‘San Francisco’)”
  • “send_email(recipient, subject, body)”
Including expected tool calls in your dataset enables more precise evaluation of agent behavior, particularly in scenarios where tool usage is critical to task completion.
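In its simplest form, the expected tool calls for a row can be expressed as a JSON array of tool call objects, using the same name and arguments shape as the combinator examples below (the tool name and arguments here are illustrative):

[
    {
        "name": "get_weather",
        "arguments": {
            "location": "San Francisco"
        }
    }
]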

inAnyOrder

The inAnyOrder combinator indicates that all listed tool calls are mandatory, but they may be executed in any order; any ordering is considered valid. For example, in the following JSON, list_commits, list_branches, and list_tags must all be called, in any sequence:
[
    {
        "inAnyOrder": [
            {
                "name": "list_commits",
                "arguments": {
                    "owner": "facebook",
                    "repo": "react"
                }
            },
            {
                "name": "list_branches",
                "arguments": {
                    "owner": "facebook",
                    "repo": "react"
                }
            },
            {
                "name": "list_tags",
                "arguments": {
                    "owner": "facebook",
                    "repo": "react"
                }
            }
        ]
    }
]

anyOne

The anyOne combinator is used when any one of several possible tool calls is acceptable to fulfill the requirement. This is useful in scenarios where there are multiple valid ways for an agent to achieve the same outcome, and you want to allow for flexibility in the agent’s approach. For example, in the following JSON, either get_pull_request_reviews or get_pull_request_comments (with the specified arguments) will be considered a valid response; the agent only needs to make one of these tool calls to satisfy the expectation.
[
    {
        "anyOne": [
            {
                "name": "get_pull_request_reviews",
                "arguments": {
                    "owner": "facebook",
                    "repo": "react",
                    "pullNumber": 25678
                }
            },
            {
                "name": "get_pull_request_comments",
                "arguments": {
                    "owner": "facebook",
                    "repo": "react",
                    "pullNumber": 25678
                }
            }
        ]
    }
]

Conversation History

Conversation history allows you to include a chat history while running Prompt tests. The sequence of messages sent to the LLM is as follows:
  • Messages in the prompt version
  • Conversation history
  • Input column in the dataset

Format

Conversation history is always a JSON array:
[
    {
        "role": "user",
        "content" "This is string content"
    },
    {
        "role": "user",
        "content" : [
            {
                "type": "text",
                "text": "This is with image attachment"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://url-image.com",
                    "detail": "low"
                }
            }
        ]
    }
]
Similarly, you can add assistant and tool messages to the conversation history.
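For instance, a history that records an assistant tool call followed by the tool's result might look like the sketch below. This assumes the OpenAI-style message shape used in the example above; the tool_calls and tool_call_id fields follow that convention, and all values are illustrative:
[
    {
        "role": "user",
        "content": "What's the weather in New York?"
    },
    {
        "role": "assistant",
        "content": null,
        "tool_calls": [
            {
                "id": "call_1",
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "arguments": "{\"location\": \"New York\"}"
                }
            }
        ]
    },
    {
        "role": "tool",
        "tool_call_id": "call_1",
        "content": "65°F and sunny"
    }
]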