@langwatch/scenario
    Class ElevenLabsVoiceAgent

    Composable voice agent with ElevenLabs-opinionated defaults.

    Not to be confused with ElevenLabsAgentAdapter, which talks to ElevenLabs' hosted ConvAI endpoint. This class runs locally: you compose ElevenLabsSTTProvider, any LLM, and ElevenLabs TTS yourself.

    Default stack:

    • STT: ElevenLabsSTTProvider with the same API key.
    • LLM: openai("gpt-5.4-mini") — text-only chat completion.
    • TTS: elevenlabs/EXAVITQu4vr4xnSDxMaL (Sarah — free-tier premade). Override via the ELEVENLABS_VOICE_ID env var or the voice arg.

    import { anthropic } from "@ai-sdk/anthropic";

    const apiKey = process.env.ELEVENLABS_API_KEY!;

    // Defaults: ElevenLabs STT, gpt-5.4-mini, ElevenLabs TTS
    const defaultAgent = new ElevenLabsVoiceAgent({ apiKey });

    // Override just the LLM
    const claudeAgent = new ElevenLabsVoiceAgent({ apiKey, llm: anthropic("claude-sonnet-4-6") });

    // Bring your own STT (MyCustomSTT stands for your own provider class)
    const customSttAgent = new ElevenLabsVoiceAgent({ apiKey, stt: new MyCustomSTT() });
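
The voice override described in the default stack can be pictured with a small resolver. This is a minimal sketch, not the library's code: `resolveTtsModel` is a hypothetical helper, and the precedence (explicit `voice` argument first, then `ELEVENLABS_VOICE_ID`, then the default) is an assumption, since the page only says that either can override the default.

```typescript
// Default premade voice named in the docs above (Sarah).
const DEFAULT_VOICE_ID = "EXAVITQu4vr4xnSDxMaL";

// Hypothetical helper, not part of @langwatch/scenario.
// Assumed precedence: explicit voice argument, then ELEVENLABS_VOICE_ID, then the default.
function resolveTtsModel(
  voiceArg: string | undefined,
  env: Record<string, string | undefined>,
): string {
  const voiceId = voiceArg || env.ELEVENLABS_VOICE_ID || DEFAULT_VOICE_ID;
  // The docs show the TTS identifier in "elevenlabs/<voiceId>" form.
  return `elevenlabs/${voiceId}`;
}
```

In a Node process the second argument would typically be `process.env`; passing it explicitly keeps the sketch easy to test.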

    Properties

    agentSpeakingEvent?: AgentSpeakingEvent

    Set when the adapter has emitted its first agent audio chunk for the current turn — gates timing-based barge-in. Concrete adapters expose this so scenario.interrupt can wait for real speech before firing the interruption. Optional: adapters without server-VAD-style interrupt sequencing can leave it undefined.

    capabilities: AdapterCapabilities = ...

    Declaration of what this adapter can and cannot do. Concrete subclasses MUST publish a non-default value; the base instance defaults to "nothing supported" so capability-gated steps fail safely when an adapter forgets to declare.
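
The "nothing supported" default can be illustrated as follows. This is a sketch under assumptions: the real `AdapterCapabilities` shape is not shown on this page, so only `streamingTranscripts` comes from the docs, while `bargeIn`, `BaseAdapter`, `StreamingAdapter`, and `requireCapability` are hypothetical names used for illustration.

```typescript
// Hypothetical shape: only streamingTranscripts is named in these docs;
// bargeIn is invented for illustration.
interface AdapterCapabilities {
  streamingTranscripts: boolean;
  bargeIn: boolean;
}

// Base default: nothing supported.
const NO_CAPABILITIES: AdapterCapabilities = { streamingTranscripts: false, bargeIn: false };

class BaseAdapter {
  capabilities: AdapterCapabilities = NO_CAPABILITIES;
}

class StreamingAdapter extends BaseAdapter {
  // A concrete subclass publishes what it actually supports.
  capabilities: AdapterCapabilities = { streamingTranscripts: true, bargeIn: false };
}

// A capability-gated step refuses to run rather than misbehave silently.
function requireCapability(adapter: BaseAdapter, key: keyof AdapterCapabilities): void {
  if (!adapter.capabilities[key]) {
    throw new Error(`Adapter does not support "${key}"`);
  }
}
```

An adapter that forgets to declare its capabilities inherits the all-false default, so every gated step throws instead of running against an unsupported transport.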

    history: ModelMessage[]
    lastLlmResponse: string | null = null
    lastUserTranscript: string | null = null
    llm: LanguageModel
    name?: string
    responseMaxDuration: number = 30.0

    Hard cap, in seconds, on a single agent turn's audio. Prevents runaway loops if a transport never signals end-of-stream. The 30-second default is enough for a long sentence.

    responseTailSilence: number = 0.6

    Tail silence: once the first agent chunk arrives, keep draining receiveAudio until no chunk shows up within this many seconds — that's how we detect the agent finished talking.

    responseTimeout: number = 30.0

    Seconds to wait for agent audio after sending user audio.
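
The three timing properties above interact roughly as in the loop below. This is a minimal sketch, not the library's implementation: `drainAgentTurn`, the `ReceiveAudio` signature (resolving to `null` on timeout), and the injectable `now` clock are assumptions, and the cap is measured here in wall-clock seconds from the first chunk.

```typescript
type Chunk = Uint8Array;

// Hypothetical transport read: resolves with a chunk, or null if nothing
// arrived within timeoutSeconds.
type ReceiveAudio = (timeoutSeconds: number) => Promise<Chunk | null>;

interface DrainOptions {
  responseTimeout: number; // seconds to wait for the first chunk
  responseTailSilence: number; // seconds of silence that end the turn
  responseMaxDuration: number; // hard cap in seconds, counted from the first chunk
  now?: () => number; // clock in seconds, injectable for tests
}

async function drainAgentTurn(receiveAudio: ReceiveAudio, opts: DrainOptions): Promise<Chunk[]> {
  const now = opts.now ?? (() => Date.now() / 1000);
  const chunks: Chunk[] = [];

  // Phase 1: wait up to responseTimeout for the agent to start speaking.
  const first = await receiveAudio(opts.responseTimeout);
  if (first === null) return chunks;
  chunks.push(first);

  // Phase 2: keep draining until a tail-silence gap or the hard cap.
  const startedAt = now();
  while (now() - startedAt < opts.responseMaxDuration) {
    const next = await receiveAudio(opts.responseTailSilence);
    if (next === null) break; // no chunk within the tail-silence window
    chunks.push(next);
  }
  return chunks;
}
```

The cap matters only for transports that never go quiet: without it, a stream that keeps producing chunks would keep the loop running indefinitely.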

    role: AgentRole = AgentRole.AGENT
    streamingTranscript?: string

    Incremental transcript text emitted while the agent speaks. Populated by adapters that advertise capabilities.streamingTranscripts. Read by scenario.interrupt when afterWords: N is set.
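
A word-count trigger of the kind `afterWords: N` implies could look like the sketch below. Both helpers are hypothetical, and counting whitespace-separated tokens is an assumption about what "words" means here; the page does not show how `scenario.interrupt` actually evaluates the threshold.

```typescript
// Hypothetical helper: counts whitespace-separated words seen so far.
function countWords(transcript: string | undefined): number {
  if (!transcript) return 0;
  const trimmed = transcript.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// True once the agent has spoken at least `afterWords` words this turn.
function reachedWordThreshold(
  streamingTranscript: string | undefined,
  afterWords: number,
): boolean {
  return countWords(streamingTranscript) >= afterWords;
}
```

Because `streamingTranscript` is optional, an adapter that never populates it yields a count of zero and the threshold is never reached for any positive `afterWords`.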

    tts: string
    ttsOptions: SynthesizeOptions
    turnOutputEmitted: boolean = false

    Turn-output guard. The default call() drains receiveAudio until tail silence; on this adapter, each extra receiveAudio call would otherwise trigger a second LLM call for the same user turn. sendAudio resets the flag (a new user turn allows a new LLM call), and receiveAudio sets it once the turn's output has been emitted.
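
The guard's set and reset points can be sketched with a toy adapter. This is an illustration only: `GuardedComposableAgent`, the `llmCalls` counter, and the choice to return `null` for an already-answered turn are assumptions, and the STT, LLM, and TTS pipeline is replaced by a fixed byte array.

```typescript
class GuardedComposableAgent {
  turnOutputEmitted = false;
  llmCalls = 0; // counts simulated LLM invocations

  async sendAudio(_audio: Uint8Array): Promise<void> {
    // New user turn: a new LLM call is allowed again.
    this.turnOutputEmitted = false;
  }

  async receiveAudio(): Promise<Uint8Array | null> {
    // Already answered this turn: report silence instead of calling the LLM again.
    if (this.turnOutputEmitted) return null;

    this.llmCalls += 1; // stand-in for the STT -> LLM -> TTS pipeline
    const reply = new Uint8Array([1, 2, 3]);
    this.turnOutputEmitted = true; // set at the end of receiveAudio
    return reply;
  }
}
```

With the guard in place, a drain loop that keeps polling `receiveAudio` sees one reply followed by silence, so tail-silence detection ends the turn after a single LLM call.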

    voice: string
    DEFAULT_SYSTEM_PROMPT: string = ...

    Methods

    • Whether the transport is currently open and ready to exchange audio (Gap #11). The default call flow (defaultVoiceCall) consults this BEFORE sending audio and raises PendingTransportError uniformly when it returns false — so a call() issued before the executor's connect() fails with one clear error across every transport instead of a transport-specific null-dereference or silent hang.

      Base default is true: adapters with no meaningful "not connected" state (in-process composable adapters, test doubles) never trip the gate. Leaf adapters that use a network transport override this to report their real socket/session state.

      Returns boolean
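
The readiness gate described above might be applied like this. It is a sketch under stated assumptions: `isReady` is a placeholder because this page does not show the method's real name, `PendingTransportError` is a local stand-in for the library's error class, and `gatedSend` is a hypothetical simplification of what defaultVoiceCall does before sending audio.

```typescript
// Local stand-in for the PendingTransportError named above.
class PendingTransportError extends Error {
  constructor(message = "Transport is not connected yet; call connect() first") {
    super(message);
    this.name = "PendingTransportError";
  }
}

// `isReady` is a placeholder name for the readiness check.
interface GatedAdapter {
  isReady(): boolean;
  sendAudio(audio: Uint8Array): Promise<void>;
}

// Check readiness BEFORE touching the transport, so every adapter fails
// with the same error instead of a transport-specific crash or hang.
async function gatedSend(adapter: GatedAdapter, audio: Uint8Array): Promise<void> {
  if (!adapter.isReady()) throw new PendingTransportError();
  await adapter.sendAudio(audio);
}
```

An in-process adapter would simply return `true` from the check, while a network-backed one would report whether its socket or session is currently open.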