Offline Evaluation Concepts

Key Concepts of Offline Evaluation

Before using offline evaluation features, it is important to be familiar with the essential components that make up the evaluation process. A thorough understanding of these foundational concepts is critical to conducting effective offline evaluation of AI models.

Prompts

Prompts are text-based inputs provided to AI models to guide their responses and influence the behaviour of the model. The structure and complexity of prompts can vary based on the specific AI model and the intended use case. Prompts may range from simple questions to detailed instructions or multi-turn conversations. They can be optimised or fine-tuned using a range of configuration options, such as variables and other model parameters like temperature, max tokens, etc, to achieve the desired output. Here’s an example of a multi-turn prompt structure:

Turn	Content	Purpose
Initial prompt	You are a helpful AI assistant specialized in geography.	Sets the context for the interaction (optional, model-dependent)
User input	What’s the capital of France?	The first query for the AI to respond to
Model response	The capital of France is Paris.	The model’s response to the first query
User input	What’s its population?	A follow-up question, building on the previous context
Model response	As of 2023, the estimated population of Paris is about 2.2 million people in the city proper.	The model’s response to the follow-up question

You can find more about prompts here.

Prompt Comparisons

Prompt comparisons help evaluate different prompts side-by-side to determine which ones produce the best results for a given task. They allow for easy comparison of prompt structures, outputs, and performance metrics across multiple models or configurations. You can find more about prompt comparisons here.

Agents via No-Code Builder

Agents are structured sequences of AI interactions designed to tackle complex tasks through a series of interconnected steps. Agents provide a visual representation of the workflow, and allow for code-based and API configuration. You can find more about agents via no-code builder here.

Workflows

Workflows enable end-to-end testing of AI applications via HTTP endpoints. They allow seamless integration of existing AI services without code changes, featuring payload configuration with dynamic variables, playground testing, and output mapping for evaluation. You can find more about workflows here.

Test Runs

Test runs are controlled executions of prompts, no-code agents or http endpoints to evaluate their performance, accuracy, and behavior under various conditions. They can be single or comparison runs providing detailed summaries, performance metrics, and debug information for every entry to assess AI model performance. Tests can be run on prompts, no-code agents, http endpoints or datasets directly.

Evaluators

Evaluators are tools or metrics used to assess the quality, accuracy, and effectiveness of AI model outputs. We have various types of evaluators that can be customized and integrated into workflows and test runs. See below for more details. You can find more about Maxim’s pre-built evaluators or create your own

Evaluator type	Description
AI	Uses AI models to assess outputs
Programmatic	Applies predefined rules or algorithms
Statistical	Utilizes statistical methods for evaluation
Human	Involves human judgment and feedback
API-based	Leverages external APIs for assessment
You can find more about our evaluators here.

Datasets

Datasets are collections of data used for training, testing, and evaluating AI models within workflows and evaluations. They allow users to test their prompts and AI systems against their own data, and include features for data structure management, integration with AI workflows, and privacy controls. You can find more about datasets here.

Context Sources

Context sources handle and organize contextual information that AI models use to understand and respond to queries more effectively. They support Retrieval-Augmented Generation (RAG) and include API integration, sample input testing, and seamless incorporation into AI workflows. Context sources enable developers to enhance their AI models’ performance by providing relevant background information for more accurate and contextually appropriate responses. You can find more about context sources here.

Prompt Tools

Prompt tools are utilities that assist in creating, refining, and managing prompts, enhancing the efficiency of working with AI prompts. They feature custom function creation, a playground environment, and integration with workflows. You can find more about prompt tools here.

Schedule a demo to see how Maxim AI helps teams ship reliable agents.

Introduction

Prompt Engineering

Offline Evals

Online Evals

Tracing

Simulations

Library

Dashboards

Integrations

Settings

Offline Evaluation Concepts

Key Concepts of Offline Evaluation

Prompts

Prompt Comparisons

Agents via No-Code Builder

Workflows

Test Runs

Evaluators

Datasets

Context Sources

Prompt Tools

Introduction

Prompt Engineering

Offline Evals

Online Evals

Tracing

Simulations

Library

Dashboards

Integrations

Settings

​Key Concepts of Offline Evaluation

​Prompts

​Prompt Comparisons

​Agents via No-Code Builder

​Workflows

​Test Runs

​Evaluators

​Datasets

​Context Sources

​Prompt Tools

Key Concepts of Offline Evaluation

Prompts

Prompt Comparisons

Agents via No-Code Builder

Workflows

Test Runs

Evaluators

Datasets

Context Sources

Prompt Tools