ElevenLabs enables you to build real-time voice AI applications with industry-leading speech synthesis and transcription. Maxim’s integration provides comprehensive observability for your ElevenLabs voice agents, offering real-time monitoring of conversation flows, function calls, and performance metrics.
This integration allows you to:
- Monitor voice agent conversations in real-time
- Trace STT and TTS operations with automatic span capture
- Link multiple operations under a single trace for end-to-end visibility
- Debug and optimize your voice AI applications
## Requirements

```
elevenlabs>=1.0.0
maxim-py>=3.9.0
python-dotenv>=1.1.0
```
## Environment Variables

Set up the following environment variables in your `.env` file:

```
EL_API_KEY=your_elevenlabs_api_key
MAXIM_API_KEY=your_maxim_api_key
MAXIM_LOG_REPO_ID=your_maxim_log_repo_id
```
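With python-dotenv, these variables are loaded from `.env` at startup. A minimal sanity check you can run before anything else (illustrative, assuming the `.env` file sits next to your script):

```python
import os

from dotenv import load_dotenv

# Load variables from .env into the process environment
load_dotenv()

# Fail fast if a required key is missing (illustrative check)
for key in ("EL_API_KEY", "MAXIM_API_KEY", "MAXIM_LOG_REPO_ID"):
    assert os.getenv(key), f"{key} is not set"
```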
## Initialize Logger and Instrument ElevenLabs

```python
import os

from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs
from maxim import Maxim
from maxim.logger.elevenlabs import instrument_elevenlabs

load_dotenv()

# Initialize Maxim logger
# Automatically reads MAXIM_API_KEY and MAXIM_LOG_REPO_ID from the environment
logger = Maxim().logger()

# Instrument ElevenLabs - patches SDK methods for automatic tracing
instrument_elevenlabs(logger)

# Initialize the ElevenLabs client AFTER instrumentation
elevenlabs_api_key = os.getenv("EL_API_KEY")
client = ElevenLabs(api_key=elevenlabs_api_key)
```

Always call `instrument_elevenlabs(logger)` before creating the ElevenLabs client to ensure all operations are traced.
## Text-to-Speech (TTS)

Convert text to natural-sounding speech with automatic tracing.

```python
import os

from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs
from elevenlabs.play import play
from maxim import Maxim
from maxim.logger.elevenlabs import instrument_elevenlabs

load_dotenv()

logger = Maxim().logger()
instrument_elevenlabs(logger)

elevenlabs_api_key = os.getenv("EL_API_KEY")
client = ElevenLabs(api_key=elevenlabs_api_key)

audio = client.text_to_speech.convert(
    text="The first move is what sets everything in motion.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_multilingual_v2",
    output_format="mp3_44100_128",
)

play(audio)
```
The instrumentation automatically captures:
- Input text
- Voice ID and model used
- Output audio metadata
- Latency metrics
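To persist the synthesized audio instead of playing it, you can write the returned chunks to a file. A minimal sketch, assuming `audio` is the iterable of byte chunks returned by `convert()` in recent SDK versions:

```python
# Write the synthesized audio to disk instead of playing it
with open("output.mp3", "wb") as f:
    for chunk in audio:
        f.write(chunk)
```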
## Speech-to-Text (STT)

Transcribe audio to text with automatic tracing.

```python
import os

from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs
from maxim import Maxim
from maxim.logger.elevenlabs import instrument_elevenlabs

load_dotenv()

logger = Maxim().logger()
instrument_elevenlabs(logger)

elevenlabs_api_key = os.getenv("EL_API_KEY")
client = ElevenLabs(api_key=elevenlabs_api_key)

with open("audio_file.wav", "rb") as audio_file:
    transcript = client.speech_to_text.convert(
        file=audio_file,
        model_id="scribe_v1"
    )

print(transcript.text)
```
The instrumentation automatically captures:
- Input audio attachment
- Model used for transcription
- Output transcript text
- Processing time
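The `file` argument can also take a file-like object, which is convenient when audio arrives over the network rather than from disk. A sketch, assuming the bytes form a complete audio file and that the SDK accepts any binary file-like object:

```python
import io

# Transcribe audio that is already in memory, e.g. received over a websocket
audio_bytes = ...  # complete WAV/MP3 payload as bytes
transcript = client.speech_to_text.convert(
    file=io.BytesIO(audio_bytes),
    model_id="scribe_v1",
)
print(transcript.text)
```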
## Linking Operations with Trace ID

Link multiple STT, TTS, or LLM operations under a single trace using the `x-maxim-trace-id` header.

```python
from uuid import uuid4

from elevenlabs.core import RequestOptions
from maxim.logger.components.trace import TraceConfigDict

# Create a shared trace ID
trace_id = str(uuid4())

# Create a trace
trace = logger.trace(
    TraceConfigDict(
        id=trace_id,
        name="Voice Pipeline",
        tags={"provider": "elevenlabs", "operation": "pipeline"},
    )
)

# Create request options with the trace ID header
request_options = RequestOptions(
    additional_headers={
        "x-maxim-trace-id": trace_id
    }
)

# STT operation - linked to the trace
with open("input.wav", "rb") as audio_file:
    transcript = client.speech_to_text.convert(
        file=audio_file,
        model_id="scribe_v1",
        request_options=request_options
    )

# TTS operation - linked to the same trace
audio = client.text_to_speech.convert(
    text="Response to the transcript",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_multilingual_v2",
    request_options=request_options
)

trace.end()
```
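Since every call in the pipeline needs the same header, it can help to build the `RequestOptions` once in a small helper (the `traced_options` function is illustrative, not part of either SDK):

```python
def traced_options(trace_id: str) -> RequestOptions:
    """Build RequestOptions that link an ElevenLabs call to a Maxim trace."""
    return RequestOptions(additional_headers={"x-maxim-trace-id": trace_id})

# Usage: pass the same options to every call in the pipeline
# transcript = client.speech_to_text.convert(
#     file=audio_file, model_id="scribe_v1", request_options=traced_options(trace_id)
# )
```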
## Combining with LLM Calls

Build a complete voice pipeline by combining ElevenLabs STT/TTS with an LLM for processing.

```python
import os
from uuid import uuid4

from dotenv import load_dotenv
from openai import OpenAI
from elevenlabs.client import ElevenLabs
from elevenlabs.core import RequestOptions
from maxim import Maxim
from maxim.logger.elevenlabs import instrument_elevenlabs
from maxim.logger.openai import MaximOpenAIClient
from maxim.logger.components.trace import TraceConfigDict

load_dotenv()

# Initialize logger and instrument
logger = Maxim().logger()
instrument_elevenlabs(logger)

# Initialize clients (requires OPENAI_API_KEY in your environment)
elevenlabs_client = ElevenLabs(api_key=os.getenv("EL_API_KEY"))
openai_client = MaximOpenAIClient(
    client=OpenAI(api_key=os.getenv("OPENAI_API_KEY")),
    logger=logger
)

# Create a unified trace
trace_id = str(uuid4())
trace = logger.trace(
    TraceConfigDict(
        id=trace_id,
        name="STT-LLM-TTS Pipeline",
    )
)

request_options = RequestOptions(
    additional_headers={"x-maxim-trace-id": trace_id}
)

# 1. Speech-to-Text
with open("user_audio.wav", "rb") as audio_file:
    transcript = elevenlabs_client.speech_to_text.convert(
        file=audio_file,
        model_id="scribe_v1",
        request_options=request_options
    )

# 2. LLM Processing
response = openai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": transcript.text},
    ],
    extra_headers={"x-maxim-trace-id": trace_id}
)
llm_response = response.choices[0].message.content

# 3. Text-to-Speech
audio = elevenlabs_client.text_to_speech.convert(
    text=llm_response,
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_multilingual_v2",
    request_options=request_options
)

trace.end()
logger.cleanup()
```
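In a long-running service you would typically wrap these three steps in a function so each user turn gets its own trace. A minimal sketch (the `run_voice_pipeline` name is illustrative and matches the placeholder used in the Cleanup section below):

```python
def run_voice_pipeline(audio_path: str = "user_audio.wav") -> None:
    """Illustrative wrapper: run STT -> LLM -> TTS under one trace per call."""
    trace_id = str(uuid4())
    trace = logger.trace(TraceConfigDict(id=trace_id, name="Voice Turn"))
    options = RequestOptions(additional_headers={"x-maxim-trace-id": trace_id})
    try:
        with open(audio_path, "rb") as f:
            transcript = elevenlabs_client.speech_to_text.convert(
                file=f, model_id="scribe_v1", request_options=options
            )
        response = openai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": transcript.text}],
            extra_headers={"x-maxim-trace-id": trace_id},
        )
        elevenlabs_client.text_to_speech.convert(
            text=response.choices[0].message.content,
            voice_id="JBFqnCBsd6RMkjVDRZzb",
            model_id="eleven_multilingual_v2",
            request_options=options,
        )
    finally:
        # End the trace even if a step fails
        trace.end()
```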
## What Gets Traced

| Operation | Captured Data |
|---|---|
| Text-to-Speech | Input text, voice ID, model ID, output format, audio metadata, latency |
| Speech-to-Text | Input audio attachment, model ID, output transcript, processing time |
| Linked Operations | All operations under the same trace ID with parent-child relationships |
## Debug Mode

Enable debug mode for detailed logging during development:

```python
logger = Maxim({"debug": True}).logger()
```
## Cleanup

Always call `logger.cleanup()` before your application exits to ensure all traces are flushed:

```python
if __name__ == "__main__":
    try:
        # Your application code
        run_voice_pipeline()
    finally:
        logger.cleanup()
```