
# Running Your First Eval

> Get started with your first evaluation run in Maxim by setting up model providers, creating prompts or agent endpoints, and preparing your dataset. This page guides you step-by-step through launching and testing your first eval.

export const MaximPlayer = ({url}) => {
  return <iframe className="border-background-highlight-secondary h-full w-full rounded-md border-2 aspect-video" src={url} allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowFullScreen></iframe>;
};

## 1. Set Up Your Environment

First, configure your AI model providers:

<Steps>
  <Step title="Go to `Settings` → `Models`.">Click on the tab of the provider for which you want to add an API key.</Step>
  <Step title="Configure model provider">Click on `Add New` and fill in the required details.</Step>
</Steps>

<Note>Maxim requires at least one provider with access to GPT-3.5 and GPT-4 models. We use industry-standard encryption to securely store your API keys.</Note>

To learn more about API keys, inviting users, and managing roles, refer to our [Workspace and roles](/settings/members-and-roles) guide.

<MaximPlayer url="https://drive.google.com/file/d/1WzUCBIDewojn6r3Om0HUEmb_oI8OyM3W/preview" />

## 2. Create Your First Prompt or HTTP Endpoint

Create prompts to experiment with and evaluate a single model call, with optional context or tools attached. Use HTTP endpoints to test complex AI agents through your application's existing HTTP API, with no integration work required.

### Prompt

<Steps>
  <Step title="Create prompt">Navigate to the `Prompts` tab under the `Evaluate` section and click `Single prompts`. Click `Create prompt` or `Try sample` to get started.</Step>
  <Step title="Write your first prompt">Write your system prompt and user prompt in the respective fields.</Step>
  <Step title="Configure model and parameters.">Configure additional settings like model, temperature, and max tokens.</Step>
  <Step title="Iterate">Click `Run` to test your prompt and see the AI's response. Iterate on your prompt based on the results.</Step>
  <Step title="Save prompt and publish a version.">When satisfied, click `Save` to create a new version of your prompt.</Step>
</Steps>

To learn more about prompts, refer to our [detailed guide on prompts](/prompt-engineering/prompt-playground).

### HTTP Endpoint

<Steps>
  <Step title="Create endpoint">Navigate to the `HTTP Endpoints` option under the `Agents` tab in the `Evaluate` section. Click `Create Endpoint` or `Try sample`.</Step>
  <Step title="Configure agent endpoint">Enter your API endpoint URL in the `URL` field. Configure any necessary headers or parameters. You can reference static context in any part of your endpoint with dynamic variables written in `{}`, such as `{input}`.</Step>
  <Step title="Test your agent">Click `Run` to test your endpoint in the playground.</Step>
  <Step title="Configure endpoint for testing">In the `Output Mapping` section, select the part of the response you want to evaluate (e.g., `data.response`). Click `Save` to create your endpoint.</Step>
</Steps>

To learn more about agent endpoints, refer to our detailed guide on [Agent Endpoints](/offline-evals/via-ui/agents-via-http-endpoint/quickstart).
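
The two ideas in the endpoint steps above, filling `{}` variables and picking one field out of the response with a path such as `data.response`, can be illustrated with a minimal Python sketch. This is not Maxim's implementation: `fill_template`, `extract_by_path`, the request body, and the sample response are all hypothetical and exist only to show the concept.

```python
import re


def fill_template(template: str, variables: dict) -> str:
    """Replace each {name} placeholder with its value.

    Placeholders with no matching variable are left untouched.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return str(variables[name]) if name in variables else match.group(0)

    return re.sub(r"\{(\w+)\}", substitute, template)


def extract_by_path(payload: dict, path: str):
    """Walk a dot-separated path such as 'data.response' through nested dicts."""
    current = payload
    for key in path.split("."):
        current = current[key]
    return current


# Hypothetical request body for an agent endpoint; {input} is the variable.
request_body = {"query": "{input}", "session": "demo"}
row = {"input": "What is your refund policy?"}
filled_body = {key: fill_template(value, row) for key, value in request_body.items()}
print(filled_body)

# Hypothetical JSON response from the agent; only data.response is evaluated.
sample_response = {"data": {"response": "Refunds are available within 30 days."}}
print(extract_by_path(sample_response, "data.response"))
```

Mapping the output to a single field keeps evaluators focused on the agent's answer rather than on metadata that happens to be in the response.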

## 3. Prepare Your Dataset

Organize and manage the data you'll use for testing and evaluation:

<Steps>
  <Step title="Create dataset">Navigate to the `Datasets` tab under the `Library` section. Click `Create New`, `Upload CSV`, or `Generate synthetic Data` to get started.</Step>
  <Step title="Edit dataset">If creating a new dataset, enter a name and description for your dataset. Add columns to your dataset (e.g., `input` and `expected_output`).</Step>
  <Step title="Save">Add entries to your dataset, filling in the values for each column. Click `Save` to create your dataset.</Step>
</Steps>

<Tip>
  New to evaluation? Use [Synthetic Data Generation](/library/datasets/synthetic-data-generation) to quickly create test datasets tailored to your use case without manual data entry.
</Tip>

To learn more about datasets, refer to our detailed guide on [Datasets](/library/datasets/import-or-create-datasets).
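
If you plan to use the CSV upload route, a file with the two example columns mentioned above can be produced with Python's standard `csv` module. The sketch below is only an illustration: the `build_dataset_csv` helper and the sample rows are made up, and the exact columns Maxim expects depend on how you configure your dataset.

```python
import csv
import io

# Column names taken from the example in the steps above.
COLUMNS = ["input", "expected_output"]


def build_dataset_csv(rows: list[dict]) -> str:
    """Serialize rows to CSV text with a header line for the two columns."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample entries; replace them with cases from your own application.
sample_rows = [
    {
        "input": "What is your refund policy?",
        "expected_output": "Refunds are available within 30 days.",
    },
    {
        "input": "Do you ship internationally?",
        "expected_output": "Yes, we ship to most countries.",
    },
]

csv_text = build_dataset_csv(sample_rows)
print(csv_text)
```

Using `csv.DictWriter` rather than manual string joins means values containing commas or quotes are escaped correctly.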

## 4. Add Evaluators

Set up evaluators to assess the performance of your prompt or endpoint:

<Steps>
  <Step title="Add evaluators from store">Navigate to the `Evaluators` tab under the `Library` section. Click `Add Evaluator` to browse available evaluators.</Step>
  <Step title="Configure added evaluators">Choose an evaluator type (e.g., AI, Programmatic, API, or Human). Configure the evaluator settings as needed. Click `Save` to add the evaluator to your workspace.</Step>
</Steps>

To learn more about evaluators, refer to our detailed guide on [Evaluators](/library/evaluators/pre-built-evaluators/overview).

## 5. Run Your First Test

Execute a test run to evaluate your prompt or endpoint:

<Steps>
  <Step title="Select endpoint/prompt to test">Navigate to your saved prompt or endpoint. Click `Test` in the top right corner.</Step>
  <Step title="Configure test run">Select the dataset you created earlier. Choose the evaluators you want to use for this test run.</Step>
  <Step title="Trigger">Click `Trigger Test Run` to start the evaluation process.</Step>
</Steps>

<Note>If you've added human evaluators, you'll be prompted to set up human annotation on the report or via email.</Note>

## 6. Analyze Test Results

Review and analyze the results of your test run:

<Steps>
  <Step title="View report">Navigate to the `Runs` tab in the left navigation menu. Find your recent test run and click on it to view details.</Step>
  <Step title="Review performance">Review the overall performance metrics and scores for each evaluator. Drill down into individual queries to see specific scores and reasoning.</Step>
  <Step title="Iterate">Use these insights to identify areas for improvement in your prompt or endpoint.</Step>
</Steps>
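
The review steps above boil down to two operations: summarizing scores per evaluator, then drilling into the weakest entries. If you ever copy scores into your own script, that logic might look like the sketch below. The record shape (`entry`, `evaluator`, `score`) and the evaluator names are assumptions made for illustration and do not describe Maxim's report format.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_evaluator(records: list[dict]) -> dict:
    """Average the scores of each evaluator across all dataset entries."""
    scores = defaultdict(list)
    for record in records:
        scores[record["evaluator"]].append(record["score"])
    return {name: mean(values) for name, values in scores.items()}


def lowest_scoring(records: list[dict], evaluator: str, limit: int = 3) -> list[dict]:
    """Return the entries with the lowest scores for one evaluator."""
    matching = [r for r in records if r["evaluator"] == evaluator]
    return sorted(matching, key=lambda r: r["score"])[:limit]


# Hypothetical scores, one record per (dataset entry, evaluator) pair.
records = [
    {"entry": 1, "evaluator": "clarity", "score": 1.0},
    {"entry": 2, "evaluator": "clarity", "score": 0.0},
    {"entry": 1, "evaluator": "faithfulness", "score": 0.5},
    {"entry": 2, "evaluator": "faithfulness", "score": 1.0},
]

print(mean_score_by_evaluator(records))
print(lowest_scoring(records, "clarity", limit=1))
```

Looking at the lowest-scoring entries first usually points to the prompt or endpoint changes most worth trying in the next iteration.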

## Next Steps

Now that you've completed your first cycle on the Maxim platform, consider exploring these additional capabilities:

1. [Prompt comparisons](/prompt-engineering/prompt-playground): Evaluate different prompts side-by-side to determine which ones produce the best results for a given task.
2. [Agents via no-code builder](/offline-evals/via-ui/agents-via-no-code-builder/quickstart): Create complex, multi-step AI workflows. Learn how to connect prompts, code, and APIs to build powerful, real-world AI systems using our intuitive, no-code editor.
3. [Context sources](/library/context-sources): Integrate Retrieval-Augmented Generation (RAG) into your agent endpoints.
4. [Prompt tools](/library/prompt-tools): Enhance your prompts with custom functions and agentic behaviors.
5. [Observability](/tracing/overview): Use our stateless SDK to monitor real-time production logs and run periodic quality checks.

By following this guide, you've learned how to set up your environment, create prompts and endpoints, prepare datasets, add evaluators, run tests, and analyze results. These are the building blocks for developing and improving your AI applications with Maxim.

<Note>[Schedule a demo](https://getmaxim.ai/demo) to see how Maxim AI helps teams ship reliable agents.</Note>
