What is Voice Simulation?

Voice Simulation automates the testing of telephony bots and voice assistants. Instead of manually calling your agent repeatedly, Maxim places the calls using a configured “Simulation Agent” that plays the role of the user. This agent speaks, listens, and responds based on a defined persona and scenario, allowing you to test:
  • Speech Patterns: How the agent handles different accents or speaking styles.
  • Latency: The delay between the user finishing a sentence and the agent responding.
  • Turn-Taking: Whether the agent interrupts appropriately or waits for the user to finish.

How do I Set Up a Voice Simulation?

Setting up a voice test involves three main steps: connecting your telephony provider, defining the agent, and configuring the scenario.
1. Connect a Voice Provider

First, you must bridge Maxim with a telephony service to enable calling capabilities.
  • Navigate to Settings -> Voice Providers.
  • Add a provider (e.g., Twilio, VAPI).
  • Enter the required credentials, such as the Account SID and Auth Key, to establish a secure connection.
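
Before pasting credentials into the provider form, a quick local sanity check can catch copy-paste mistakes. The sketch below is not part of Maxim; `check_twilio_credentials` is a hypothetical helper. It assumes a Twilio connection and relies only on the fact that Twilio Account SIDs are "AC" followed by 32 hexadecimal characters. It does not contact Twilio, so it cannot confirm that the values are actually valid.

```python
import re

# Twilio Account SIDs are "AC" followed by 32 hexadecimal characters.
_ACCOUNT_SID = re.compile(r"AC[0-9a-fA-F]{32}")


def check_twilio_credentials(account_sid: str, auth_key: str) -> list[str]:
    """Return a list of problems found; an empty list means the values look plausible.

    Hypothetical helper for illustration only. It checks shape, not validity.
    """
    problems = []
    sid = account_sid.strip()
    if not _ACCOUNT_SID.fullmatch(sid):
        problems.append("Account SID should be 'AC' followed by 32 hex characters")
    if sid != account_sid or auth_key != auth_key.strip():
        problems.append("credentials contain leading or trailing whitespace")
    if not auth_key.strip():
        problems.append("auth key is empty")
    return problems
```

An empty result only means the values have the expected shape; the provider connection in Settings remains the real test.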
2. Configure the Voice Agent

Next, tell Maxim where to call.
  • Go to Agents -> Voice Agents.
  • Create a new agent profile with a name and description (this context helps the simulation agent understand who it is talking to).
  • Add the Phone Number and Country Code of your voice agent.
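
Because the form takes the country code and phone number separately, it can help to confirm that together they form a plausible international number. The following is a minimal sketch, assuming the combined value should follow the E.164 convention (a "+", a country code of one to three digits, and at most 15 digits overall). `to_e164` is a hypothetical helper, not a Maxim function, and it does not check that the number is actually assigned.

```python
import re


def to_e164(country_code: str, national_number: str) -> str:
    """Combine a country code and a national number into an E.164-style string.

    Hypothetical helper for illustration. Punctuation and spaces are dropped.
    Leading zeros in the national number are kept as typed, because whether a
    trunk prefix must be removed depends on the country.
    """
    cc = re.sub(r"\D", "", country_code)
    digits = re.sub(r"\D", "", national_number)
    if not 1 <= len(cc) <= 3 or cc.startswith("0"):
        raise ValueError("country code must be 1 to 3 digits and not start with 0")
    if not digits:
        raise ValueError("phone number contains no digits")
    if len(cc) + len(digits) > 15:
        raise ValueError("E.164 numbers have at most 15 digits in total")
    return f"+{cc}{digits}"
```

For example, `to_e164("+1", "(415) 555-0123")` yields `"+14155550123"`.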
3. Define Persona and Scenario

Finally, configure the test parameters.
  • Scenario: Define the conversation context and goals (e.g., “User wants to reschedule a dental appointment”).
  • Persona: Set the voice characteristics and style of the caller.
  • Initiator Settings: Configure who speaks first. By default, the simulation agent waits for your voice agent to deliver its greeting. If you set the simulation agent to speak first, ensure your voice agent is configured to wait for the caller before responding.
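
The three settings above can be pictured as a small record that you draft before entering values in the UI. This is only an illustrative sketch: `Initiator`, `Persona`, `VoiceTestCase`, and `checklist` are hypothetical names, not Maxim's schema or SDK. The one rule it encodes comes from the Initiator Settings note: if the simulation agent speaks first, the voice agent must be set up to wait.

```python
from dataclasses import dataclass
from enum import Enum


class Initiator(Enum):
    """Who speaks first on the call (hypothetical names)."""
    VOICE_AGENT = "voice_agent"            # default: the agent under test greets first
    SIMULATION_AGENT = "simulation_agent"  # the simulated caller opens the conversation


@dataclass(frozen=True)
class Persona:
    speaking_style: str       # e.g. "hurried", "soft-spoken"
    accent: str = "neutral"


@dataclass(frozen=True)
class VoiceTestCase:
    scenario: str
    persona: Persona
    initiator: Initiator = Initiator.VOICE_AGENT

    def checklist(self) -> list[str]:
        """Return reminders to resolve before running the simulation."""
        notes = []
        if not self.scenario.strip():
            notes.append("scenario is empty")
        if self.initiator is Initiator.SIMULATION_AGENT:
            notes.append("confirm the voice agent waits for the caller before speaking")
        return notes
```

Writing the scenario and persona down in this form first makes it easier to reuse the same test across several voice agents.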

What Metrics can I Evaluate for Voice Agents?

Maxim provides several pre-built statistical metrics tailored for audio interactions. They offer insights that text transcripts alone cannot:
  • Average Response Latency: The time taken by your agent to respond after the user stops speaking. Crucial for measuring “lag.”
  • Speech Rate: Measures the speed of speech, ensuring the agent isn’t speaking too fast or too slow for the user to understand.
  • Talk Ratio: The balance of speaking time between the user and the agent (e.g., is the agent lecturing the user?).
  • Average Pitch: Analyzes the tonal quality of the conversation.

During the run, you can view real-time transcripts of the dialogue and play back or download the audio recordings for detailed analysis.
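
To make the timing-based metrics concrete, the sketch below computes three of them from a list of timestamped turns. It is an illustration under stated assumptions, not Maxim's implementation: the `Turn` record, the function names, and the exact definitions (latency as the gap between a user turn ending and the next agent turn starting, speech rate in words per minute, talk ratio as agent time divided by user time) are all assumed here. Average pitch is omitted because it requires the audio signal rather than timestamps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    speaker: str   # "user" or "agent"
    start: float   # seconds from the start of the call
    end: float
    words: int     # number of words spoken in this turn

    @property
    def duration(self) -> float:
        return self.end - self.start


def average_response_latency(turns: list[Turn]) -> float:
    """Mean gap, in seconds, between a user turn ending and the next agent turn starting."""
    gaps = [
        max(cur.start - prev.end, 0.0)  # overlapping speech is counted as zero latency
        for prev, cur in zip(turns, turns[1:])
        if prev.speaker == "user" and cur.speaker == "agent"
    ]
    return sum(gaps) / len(gaps) if gaps else 0.0


def speech_rate_wpm(turns: list[Turn], speaker: str) -> float:
    """Words per minute for one speaker, measured over that speaker's own talking time."""
    own = [t for t in turns if t.speaker == speaker]
    seconds = sum(t.duration for t in own)
    return 60.0 * sum(t.words for t in own) / seconds if seconds else 0.0


def talk_ratio(turns: list[Turn]) -> float:
    """Agent speaking time divided by user speaking time."""
    agent = sum(t.duration for t in turns if t.speaker == "agent")
    user = sum(t.duration for t in turns if t.speaker == "user")
    return agent / user if user else float("inf")


# A made-up five-turn call used only to exercise the functions.
sample_call = [
    Turn("agent", 0.0, 2.0, 6),
    Turn("user", 2.5, 5.5, 9),
    Turn("agent", 6.3, 10.3, 12),
    Turn("user", 11.0, 12.0, 3),
    Turn("agent", 12.4, 14.4, 6),
]
```

On `sample_call`, the average response latency comes out at roughly 0.6 seconds, the agent's speech rate at about 180 words per minute, and the talk ratio at about 2, meaning the agent talks twice as long as the user.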