Learn how to programmatically trigger test runs for your AI applications using Maxim's SDK, with custom datasets, flexible output functions, and evaluators.
While Maxim's web interface provides a powerful way to run tests, the SDK offers even more flexibility and control: you can bring custom datasets, define your own output functions, and attach evaluators, all from code.
```typescript
import { Maxim } from "@maximai/maxim-js";

const maxim = new Maxim({ apiKey: "" });

const result = await maxim
  .createTestRun("My First SDK Test", "your-workspace-id")
  .withDataStructure(/* your data structure here */)
  .withData(/* your data here */)
  .yieldsOutput(/* your output function here */)
  .withWorkflowId(/* or you can pass workflow ID from Maxim platform */)
  .withPromptVersionId(/* or you can pass prompt version ID from Maxim platform */)
  .withEvaluators(/* your evaluators here */)
  .run();
```
Copy your workspace ID from the workspace switcher in the left topbar.
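The quickstart chain above leaves evaluators as a placeholder. As a minimal sketch of filling it in, assuming evaluators named "Bias" and "Clarity" are already configured in your workspace (and with `answerWith` as a hypothetical stand-in for your model call):

```typescript
// Minimal sketch: referencing workspace evaluators by name.
// "Bias" and "Clarity" are assumed to exist in this workspace;
// answerWith() is a hypothetical stand-in for your model call.
const evaluated = await maxim
  .createTestRun("Evaluated SDK Test", "your-workspace-id")
  .withDataStructure({ question: "INPUT", answer: "EXPECTED_OUTPUT" })
  .withData([{ question: "What is 2 + 2?", answer: "4" }])
  .yieldsOutput(async (data) => ({ data: await answerWith(data.question) }))
  .withEvaluators("Bias", "Clarity")
  .run();
```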
```typescript
import { CSVFile, Maxim } from '@maximai/maxim-js';

const myCSVFile = new CSVFile('./test.csv', {
  question: 0, // column index in CSV
  answer: 1,
  context: 2,
});

const maxim = new Maxim({ apiKey: "YOUR_API_KEY" });

const result = maxim
  .createTestRun("CSV Test Run", workspaceId)
  .withDataStructure({
    question: "INPUT",
    answer: "EXPECTED_OUTPUT",
    context: "CONTEXT_TO_EVALUATE",
  })
  .withData(myCSVFile)
  // ... rest of the configuration
```
The `CSVFile` class automatically validates your CSV headers against the data structure and provides type-safe access to your data.
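For reference, a `test.csv` matching the column mapping above might look like the following (the rows are illustrative):

```csv
question,answer,context
What is the capital of France?,Paris,France is a country in Western Europe...
Who wrote Romeo and Juliet?,William Shakespeare,William Shakespeare was an English playwright...
```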
For smaller datasets or programmatically generated data:
```typescript
import { Maxim } from "@maximai/maxim-js";

const maxim = new Maxim({ apiKey: "YOUR_API_KEY" });

const manualData = [
  {
    question: "What is the capital of France?",
    answer: "Paris",
    context: "France is a country in Western Europe...",
  },
  {
    question: "Who wrote Romeo and Juliet?",
    answer: "William Shakespeare",
    context: "William Shakespeare was an English playwright...",
  },
];

const result = maxim
  .createTestRun("Manual Data Test", workspaceId)
  .withDataStructure({
    question: "INPUT",
    answer: "EXPECTED_OUTPUT",
    context: "CONTEXT_TO_EVALUATE",
  })
  .withData(manualData)
  // ... rest of the configuration
```
```typescript
import { Maxim } from "@maximai/maxim-js";

const maxim = new Maxim({ apiKey: "YOUR_API_KEY" });

const result = maxim
  .createTestRun("Custom Output Test", workspaceId)
  .withDataStructure({
    question: "INPUT",
    answer: "EXPECTED_OUTPUT",
    context: "CONTEXT_TO_EVALUATE",
  })
  .withData(myData)
  // contextToEvaluate is optional; it can be either a variable used in the
  // workflow or a column name present in the dataset
  .withWorkflowId(workflowIdFromDashboard, contextToEvaluate)
  // ... rest of the configuration
```
Find the workflow ID in the Workflows tab: open the workflow's menu and click Copy ID.
Platform Integration: Testing with Prompt Versions
```typescript
import { Maxim } from "@maximai/maxim-js";

const maxim = new Maxim({ apiKey: "YOUR_API_KEY" });

const result = maxim
  .createTestRun("Custom Output Test", workspaceId)
  .withDataStructure({
    question: "INPUT",
    answer: "EXPECTED_OUTPUT",
    context: "CONTEXT_TO_EVALUATE",
  })
  .withData(myData)
  // contextToEvaluate is optional; it can be either a variable used in the
  // prompt or a column name present in the dataset
  .withPromptVersionId(promptVersionIdFromPlatform, contextToEvaluate)
  // ... rest of the configuration
```
To get the prompt version ID, go to the Prompts tab, select the version you want to test, and click Copy version ID from the menu.
```typescript
import { Maxim } from "@maximai/maxim-js";

const maxim = new Maxim({ apiKey: "YOUR_API_KEY" });

const result = maxim
  .createTestRun("Custom Output Test", workspaceId)
  .withDataStructure({
    question: "INPUT",
    answer: "EXPECTED_OUTPUT",
    context: "CONTEXT_TO_EVALUATE",
  })
  .withData(myData)
  .yieldsOutput(async (data) => {
    // Call your model or API
    const response = await yourModel.call(data.question, data.context);

    return {
      // Required: The actual output
      data: response.text,

      // Optional: Context used for evaluation.
      // Returning a value here will utilize this context for
      // evaluation instead of the CONTEXT_TO_EVALUATE column (if provided)
      retrievedContextToEvaluate: response.relevantContext,

      // Optional: Performance metrics
      meta: {
        usage: {
          promptTokens: response.usage.prompt_tokens,
          completionTokens: response.usage.completion_tokens,
          totalTokens: response.usage.total_tokens,
          latency: response.latency,
        },
        cost: {
          input: response.cost.input,
          output: response.cost.output,
          total: response.cost.input + response.cost.output,
        },
      },
    };
  })
  // ... rest of the configuration
```
If your output function throws an error, the entry will be marked as failed and you'll receive the index in the `failed_entry_indices` array after the run completes.
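As a hedged sketch of acting on this after a run (the camel-cased property name below is an assumption based on the field mentioned above; check your SDK version's typings for the exact name):

```typescript
// Hedged sketch: surfacing entries that failed during the run.
// The property name failedEntryIndices is an assumption derived from the
// failed_entry_indices field described above; verify against the SDK typings.
const result = await maxim
  .createTestRun("Resilient Test", workspaceId)
  // ... data structure, data, output function, and evaluators
  .run();

if (result.failedEntryIndices.length > 0) {
  console.warn("Failed entry indices:", result.failedEntryIndices);
}
```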
```typescript
import { Maxim } from "@maximai/maxim-js";

const maxim = new Maxim({ apiKey: "YOUR_API_KEY" });

const result = await maxim
  .createTestRun("Long Test", workspaceId)
  // ... previous configuration
  .withConcurrency(5) // Process 5 entries at a time
  .run();
```
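Concurrency controls how many dataset entries are processed in parallel: higher values finish large runs faster, while lower values help you stay within your model provider's rate limits.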