Run a Prompt with tool calls
Ensuring your prompt selects the correct tool (function) is crucial for building reliable and efficient AI workflows. Maxim's playground lets you attach your tools (API, code, or schema) and measure tool call accuracy for agentic systems.
Tool calling is a core part of any agentic AI workflow. In the playground, you can test whether the right tools are being chosen by the LLM and whether they execute successfully.
In Maxim, you can create prompt tools within the library section of your workspace. These can be executable (API or code) or schema-only, and can then be attached to your prompt for testing.
Attach and run your tools in the playground
Create a new tool
Create a new tool in the library. Use an API or code for executable tools, or just a schema if you only want to test tool choice.
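For illustration, a schema-only tool can be defined in the common JSON-schema function format. The tool name, description, and parameters below are hypothetical examples, not required values.

```python
# Hypothetical schema-only tool definition (JSON-schema function format).
# A schema alone is enough if you only want to test which tool the model picks.
get_weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Paris'"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}
```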
Attach tools to your prompt
Select and attach tools to the prompt within the configuration section.
Send prompt with tool instructions
Send your prompt, including instructions that reference when and how the attached tools should be used.
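As a sketch, the prompt you send might pair a system instruction that tells the model when to call the tool with a user message that should trigger it. The wording and tool name below are only examples.

```python
# Hypothetical prompt messages referencing the attached get_weather tool.
messages = [
    {
        "role": "system",
        "content": "You are a weather assistant. When the user asks about "
                   "current weather, call the get_weather tool instead of guessing.",
    },
    {"role": "user", "content": "What's the weather like in Paris right now?"},
]
```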
Review assistant's tool selection
Check the assistant's response for the chosen tool and the arguments it passed.
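If tool selection works, the assistant's reply contains a tool call rather than a plain text answer. A typical shape of that message is sketched below; exact field names depend on the model provider, and the values are illustrative.

```python
# Hypothetical assistant message selecting the tool and its arguments.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {
                "name": "get_weather",
                "arguments": '{"city": "Paris", "unit": "celsius"}',
            },
        }
    ],
}
```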
Examine tool execution results
For executable tools, check the tool response message shown after execution.
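The tool's output comes back as a tool message tied to the original call. A sketch of what that might look like, with made-up values, is shown here.

```python
# Hypothetical tool message returned after the tool executes.
tool_message = {
    "role": "tool",
    "tool_call_id": "call_1",
    "content": '{"city": "Paris", "temperature_c": 18, "condition": "partly cloudy"}',
}
```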
Manually test different scenarios
Manually edit the tool messages to test how the assistant responds to different tool outputs.
By experimenting in the playground, you can verify that your prompt calls the right tools in specific scenarios and that tool execution leads to the right responses.
To test tool call accuracy at scale across all your use cases, run experiments using a dataset and evaluators as shown below.
Measure tool call accuracy across your test cases
Prepare your dataset
Set up your dataset with input and expected tool calls columns.
Define expected tool calls
For each input, add the JSON of the tool call (or calls) and arguments you expect from the assistant.
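For example, an expected tool calls cell for the weather question above might contain JSON like the following; the exact structure your dataset expects may differ, and the names and arguments here are illustrative.

```python
# Hypothetical value for the "expected tool calls" column of one dataset row.
expected_tool_calls = [
    {
        "name": "get_weather",
        "arguments": {"city": "Paris", "unit": "celsius"},
    }
]
```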
Initiate prompt testing
Trigger a test run on the prompt that has the tools attached.
Select your test dataset
Select your dataset from the dropdown.
Choose the accuracy evaluator
Select the tool call accuracy evaluator under statistical evaluators and trigger the run. If it is not available in your workspace, add it from the evaluator store.
Review accuracy scores
Once the test run completes, each entry receives a tool call accuracy score of 0 or 1 based on whether the assistant's tool calls match the expected tool calls.
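Conceptually, the binary score reflects whether the assistant's tool calls match the expected ones. Below is a minimal sketch of such a comparison, assuming exact matching on tool names and arguments; the built-in evaluator's matching rules may differ.

```python
import json

def tool_call_accuracy(expected_calls, actual_calls):
    """Return 1 if the tool calls exactly match the expected ones, else 0.

    Simplified illustration only; the actual evaluator logic may differ.
    """
    def normalize(calls):
        # Sort calls and serialize arguments so ordering does not affect the comparison.
        return sorted(
            (c["name"], json.dumps(c.get("arguments", {}), sort_keys=True))
            for c in calls
        )
    return 1 if normalize(expected_calls) == normalize(actual_calls) else 0

# Example: a matching call scores 1.
print(tool_call_accuracy(
    [{"name": "get_weather", "arguments": {"city": "Paris"}}],
    [{"name": "get_weather", "arguments": {"city": "Paris"}}],
))  # -> 1
```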
Analyze detailed message logs
To check the details of the messages, click on any entry and open the messages tab.