ElevenLabs provider summary
ElevenLabs is an audio-focused provider. It supports text-to-speech (TTS) with streaming, speech-to-text (STT) with speaker diarization, timestamps, voice configuration (stability, similarity boost), and multiple output formats (MP3, Opus, WAV/PCM).
| Property | Details |
|---|---|
| Description | Audio-focused provider: TTS with voice control, STT with diarization. |
| Provider route | elevenlabs/<voice_id> |
| Supported operations | TTS (streaming), STT (diarization), List Models |
Supported operations
Speech (TTS)
Streaming and non-streaming synthesis via /v1/text-to-speech/{voice_id}. Timestamps are available via /v1/text-to-speech/{voice_id}/with-timestamps.
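The following is a minimal sketch of how a direct TTS call might be assembled with Python's standard library. It calls ElevenLabs directly and does not go through the gateway's elevenlabs/<voice_id> route. It rests on several assumptions:

- the public base URL `https://api.elevenlabs.io`;
- the `xi-api-key` header;
- an `output_format` query parameter such as `mp3_44100_128`;
- a JSON body with `text`, `model_id`, and `voice_settings` (`stability`, `similarity_boost`, `use_speaker_boost`).

The default model name is also an assumption. The helpers `build_tts_request` and `alignment_to_words` are hypothetical. The second one assumes that the with-timestamps response carries an `alignment` object with `characters`, `character_start_times_seconds`, and `character_end_times_seconds` arrays.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List, Tuple

# Assumed public ElevenLabs base URL; a gateway would use its own URL instead.
API_BASE = "https://api.elevenlabs.io"


def build_tts_request(
    voice_id: str,
    text: str,
    api_key: str,
    *,
    model_id: str = "eleven_multilingual_v2",  # assumed model name
    output_format: str = "mp3_44100_128",      # assumed format string
    stability: float = 0.5,
    similarity_boost: float = 0.75,
    use_speaker_boost: bool = True,
    with_timestamps: bool = False,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for text-to-speech."""
    # This sketch treats both voice settings as fractions in [0, 1].
    for name, value in (("stability", stability), ("similarity_boost", similarity_boost)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    path = f"/v1/text-to-speech/{urllib.parse.quote(voice_id, safe='')}"
    if with_timestamps:
        path += "/with-timestamps"
    query = urllib.parse.urlencode({"output_format": output_format})
    body = {
        "text": text,
        "model_id": model_id,
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity_boost,
            "use_speaker_boost": use_speaker_boost,
        },
    }
    return urllib.request.Request(
        f"{API_BASE}{path}?{query}",
        data=json.dumps(body).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def alignment_to_words(alignment: Dict[str, Any]) -> List[Tuple[str, float, float]]:
    """Group per-character timings (assumed response shape) into (word, start, end)."""
    words: List[Tuple[str, float, float]] = []
    current, start, end = "", 0.0, 0.0
    for ch, s, e in zip(
        alignment["characters"],
        alignment["character_start_times_seconds"],
        alignment["character_end_times_seconds"],
    ):
        if ch.isspace():
            # Whitespace closes the word being built, if any.
            if current:
                words.append((current, start, end))
            current = ""
            continue
        if not current:
            start = s  # first character of a new word
        current += ch
        end = e
    if current:
        words.append((current, start, end))
    return words
```

To send the request, you could pass it to `urllib.request.urlopen`. For streaming playback, read the response body in chunks instead of all at once.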
Transcriptions (STT)
Non-streaming via /v1/speech-to-text. Supports speaker diarization, word- and character-level timestamps, and audio event tagging.
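Below is a hedged sketch of the two client-side pieces of a diarized transcription: building the form fields and collapsing the returned words into speaker turns. The form field names are assumptions based on ElevenLabs' public API: `model_id` (here `scribe_v1`), `diarize`, `timestamps_granularity`, and `tag_audio_events`. The response is also assumed to carry a `words` list whose entries have `text`, `start`, `end`, `type` (`word`, `spacing`, or `audio_event`), and `speaker_id`. Both `build_stt_fields` and `speaker_turns` are hypothetical helpers.

```python
from itertools import groupby
from typing import Any, Dict, List, Tuple


def build_stt_fields(
    *,
    model_id: str = "scribe_v1",          # assumed model name
    diarize: bool = True,
    timestamps_granularity: str = "word",  # assumed values: none, word, character
    tag_audio_events: bool = True,
) -> Dict[str, str]:
    """Form fields to send alongside the audio file (multipart/form-data)."""
    if timestamps_granularity not in ("none", "word", "character"):
        raise ValueError(f"unsupported granularity: {timestamps_granularity!r}")
    return {
        "model_id": model_id,
        "diarize": str(diarize).lower(),  # form values are strings
        "timestamps_granularity": timestamps_granularity,
        "tag_audio_events": str(tag_audio_events).lower(),
    }


def speaker_turns(words: List[Dict[str, Any]]) -> List[Tuple[str, float, float, str]]:
    """Collapse consecutive spoken words from the same speaker into turns."""
    # Keep only real words; spacing and audio-event entries are dropped.
    spoken = [w for w in words if w.get("type") == "word"]
    turns = []
    for speaker, group in groupby(spoken, key=lambda w: w.get("speaker_id")):
        items = list(group)
        text = " ".join(w["text"] for w in items)
        turns.append((speaker, items[0]["start"], items[-1]["end"], text))
    return turns
```

The standard library has no multipart encoder. In practice these fields would be sent with a client such as `requests`, using `requests.post(url, data=fields, files={"file": audio_file}, headers=...)`.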
List Models
Via the /v1/models endpoint, which returns the available models along with their supported languages and capabilities.
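As an illustration of how the model list might be used, the sketch below fetches it and filters for TTS-capable models in a given language. It assumes the endpoint URL `https://api.elevenlabs.io/v1/models` and the `xi-api-key` header. It also assumes the response is a JSON array of objects with `model_id`, a boolean `can_do_text_to_speech`, and a `languages` list of `{language_id, name}` entries. `fetch_models` and `tts_models_for_language` are hypothetical helpers.

```python
import json
import urllib.request
from typing import Any, Dict, List

MODELS_URL = "https://api.elevenlabs.io/v1/models"  # assumed endpoint URL


def fetch_models(api_key: str) -> List[Dict[str, Any]]:
    """GET the model list (network call; response shape is assumed)."""
    req = urllib.request.Request(MODELS_URL, headers={"xi-api-key": api_key})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def tts_models_for_language(models: List[Dict[str, Any]], language_id: str) -> List[str]:
    """Return IDs of TTS-capable models that list the given language."""
    return [
        m["model_id"]
        for m in models
        if m.get("can_do_text_to_speech")
        and any(lang.get("language_id") == language_id for lang in m.get("languages", []))
    ]
```

Filtering is kept separate from fetching so the capability logic can be checked against a saved response without network access.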
Key features
- Voice configuration: stability, similarity boost, speaker boost
- Multiple output formats: MP3, Opus, WAV/PCM
- STT with speaker identification and diarization
- Timestamp support for detailed timing information
- Custom pronunciation dictionaries for controlling how TTS pronounces specific words
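Output formats are typically selected with a string naming the codec, sample rate, and optional bitrate, and raw PCM has no container. The sketch below parses such a string and wraps PCM bytes in a WAV header with Python's `wave` module. Two points are assumptions:

- the `codec_samplerate[_bitrate]` naming, as in `mp3_44100_128` or `pcm_16000`;
- that PCM output is headerless 16-bit signed little-endian mono.

`parse_output_format` and `pcm_to_wav` are hypothetical helpers.

```python
import io
import wave
from typing import Optional, Tuple


def parse_output_format(fmt: str) -> Tuple[str, int, Optional[int]]:
    """Split an assumed 'codec_samplerate[_bitrate]' string, e.g. 'mp3_44100_128'."""
    parts = fmt.split("_")
    if len(parts) not in (2, 3) or not all(p.isdigit() for p in parts[1:]):
        raise ValueError(f"unrecognised output format: {fmt!r}")
    bitrate_kbps = int(parts[2]) if len(parts) == 3 else None
    return parts[0], int(parts[1]), bitrate_kbps


def pcm_to_wav(pcm: bytes, sample_rate: int) -> bytes:
    """Wrap raw 16-bit mono PCM (assumed layout) in a WAV container."""
    if len(pcm) % 2:
        raise ValueError("16-bit PCM must contain an even number of bytes")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    # Closing the wave writer finalises the header but leaves buf open.
    return buf.getvalue()
```

For example, audio requested as `pcm_16000` could be saved as a playable file with `pcm_to_wav(audio, parse_output_format("pcm_16000")[1])`.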