agent (Optional): Set when the adapter has emitted its first agent audio chunk for the
current turn; gates timing-based barge-in. Concrete adapters expose
this so that scenario.interrupt can wait for real speech before
firing the interruption. Adapters without server-VAD-style
interrupt sequencing can leave it undefined.
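A minimal sketch of how a first-chunk timestamp can gate barge-in, assuming a promise-based latch. FirstChunkLatch, markChunk, reset and wait are hypothetical names used only for illustration. The real property name is truncated above, and its type may differ.

```typescript
// Hypothetical per-turn latch: records when the first agent audio chunk of a
// turn arrives and lets an interrupt step await that moment.
// None of these names come from the adapter's real API.
class FirstChunkLatch {
  firstChunkAt: number | undefined = undefined;
  private resolveFirst: ((atMs: number) => void) | null = null;
  private firstChunk: Promise<number>;

  constructor() {
    this.firstChunk = this.makePromise();
  }

  private makePromise(): Promise<number> {
    // The executor runs synchronously, so resolveFirst is set on return.
    return new Promise<number>((resolve) => {
      this.resolveFirst = resolve;
    });
  }

  // Call at the start of every agent turn.
  reset(): void {
    this.firstChunkAt = undefined;
    this.firstChunk = this.makePromise();
  }

  // Call for every agent chunk; only the first one of a turn has an effect.
  markChunk(nowMs: number): void {
    if (this.firstChunkAt !== undefined) return;
    this.firstChunkAt = nowMs;
    this.resolveFirst?.(nowMs);
  }

  // An interrupt step awaits this before sending barge-in audio, so it never
  // fires into silence.
  wait(): Promise<number> {
    return this.firstChunk;
  }
}

// Usage: the interrupt fires only once real agent speech has started.
const latch = new FirstChunkLatch();
latch.wait().then((atMs) => console.log(`agent started at ${atMs} ms; barge in now`));
latch.markChunk(1200);
latch.markChunk(1300); // ignored: not the first chunk of this turn
```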
audio (Readonly)
call (Optional)
capabilities (Readonly): Declaration of what this adapter can and cannot do. Concrete subclasses MUST publish a non-default value; the base instance defaults to "nothing supported" so that capability-gated steps fail safely when an adapter forgets to declare.
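One way to get the fail-safe default, sketched under assumptions: capabilities are modelled as a frozen record of booleans, and a gate helper throws before a step runs. The dtmf and streamingTranscripts flags are named elsewhere on this page; bargeIn, requireCapability and the local UnsupportedCapabilityError stand-in are hypothetical.

```typescript
// Hypothetical capability record. dtmf and streamingTranscripts appear in
// these docs; bargeIn is an illustrative assumption.
interface AdapterCapabilities {
  readonly dtmf: boolean;
  readonly streamingTranscripts: boolean;
  readonly bargeIn: boolean;
}

// Base default: nothing supported, so a forgotten declaration fails closed.
const NOTHING_SUPPORTED: AdapterCapabilities = Object.freeze({
  dtmf: false,
  streamingTranscripts: false,
  bargeIn: false,
});

// Local stand-in for the library's error type; the real one may differ.
class UnsupportedCapabilityError extends Error {
  constructor(capability: string) {
    super(`adapter does not declare capability "${capability}"`);
    this.name = "UnsupportedCapabilityError";
  }
}

class BaseAdapter {
  readonly capabilities: AdapterCapabilities = NOTHING_SUPPORTED;
}

// A concrete subclass publishes its own non-default declaration.
class DtmfCapableAdapter extends BaseAdapter {
  readonly capabilities: AdapterCapabilities = { ...NOTHING_SUPPORTED, dtmf: true };
}

// Capability-gated steps call this before doing any work.
function requireCapability(adapter: BaseAdapter, cap: keyof AdapterCapabilities): void {
  if (!adapter.capabilities[cap]) throw new UnsupportedCapabilityError(cap);
}

requireCapability(new DtmfCapableAdapter(), "dtmf"); // passes
try {
  requireCapability(new BaseAdapter(), "dtmf");
} catch (e) {
  console.log((e as Error).message); // the undeclared capability is rejected
}
```

Because the base value is frozen and every flag is false, an adapter that never overrides capabilities is rejected by every gated step rather than silently attempting unsupported work.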
name (Optional)
Hard cap on a single agent turn's audio. Prevents runaway loops if a transport never signals end-of-stream. A 30 s cap accommodates even a long sentence.
Tail silence: once the first agent chunk arrives, keep draining receiveAudio until no chunk shows up within this many seconds. That gap is how the adapter detects that the agent has finished talking.
Seconds to wait for agent audio after sending user audio.
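The three timeouts above combine into one stopping rule for the drain loop. The sketch below replays that rule over a list of chunk arrival times, which keeps it deterministic and easy to test. The field names responseTimeoutS, tailSilenceS and maxTurnS are hypothetical (the real property names are truncated above). Measuring the hard cap from the first chunk's arrival, and checking it only when a chunk arrives, is an assumption.

```typescript
// Hypothetical names for the three timeouts described above (seconds).
interface DrainTimeouts {
  responseTimeoutS: number; // wait for the first agent chunk after sending user audio
  tailSilenceS: number; // a gap longer than this means the agent finished talking
  maxTurnS: number; // hard cap on one agent turn, measured from the first chunk
}

type DrainOutcome = "no_response" | "tail_silence" | "hard_cap";

interface DrainResult {
  kept: number[]; // arrival times of the chunks the drain loop would keep
  outcome: DrainOutcome;
}

// Replays the stopping rule over sorted chunk arrival times, given in seconds
// since the user audio was sent.
function simulateDrain(arrivalsS: readonly number[], t: DrainTimeouts): DrainResult {
  const kept: number[] = [];
  const first = arrivalsS.length > 0 ? arrivalsS[0] : undefined;
  if (first === undefined || first > t.responseTimeoutS) {
    return { kept, outcome: "no_response" };
  }
  let last = first;
  for (const at of arrivalsS) {
    if (at - first > t.maxTurnS) return { kept, outcome: "hard_cap" };
    if (at - last > t.tailSilenceS) return { kept, outcome: "tail_silence" };
    kept.push(at);
    last = at;
  }
  // No further chunk arrives, so the tail-silence wait after the last one expires.
  return { kept, outcome: "tail_silence" };
}

const timeouts: DrainTimeouts = { responseTimeoutS: 5, tailSilenceS: 1, maxTurnS: 30 };
// The 1.7 s gap before 2.5 exceeds tailSilenceS, so only the first three are kept.
console.log(simulateDrain([0.4, 0.6, 0.8, 2.5], timeouts));
```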
sample (Readonly)
signaling (Optional, Readonly)
streaming (Optional): Incremental transcript text emitted while the agent speaks. Populated
by adapters that advertise capabilities.streamingTranscripts. Read
by scenario.interrupt when afterWords: N is set.
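A minimal sketch of how an afterWords: N interrupt could consume the streaming transcript, assuming the adapter delivers text as appended deltas that may split a word across chunks. AfterWordsTrigger and countWords are hypothetical helpers, not part of the documented API.

```typescript
// Counts whitespace-separated tokens; a trailing partial word counts too.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Hypothetical trigger for `afterWords: N`: feed it each incremental
// transcript delta and it reports true exactly once, when N words are heard.
class AfterWordsTrigger {
  private text = "";
  private fired = false;
  private readonly afterWords: number;

  constructor(afterWords: number) {
    this.afterWords = afterWords;
  }

  push(delta: string): boolean {
    if (this.fired) return false;
    // Re-count the accumulated text so a word split across deltas
    // ("hel" + "lo") is not counted twice.
    this.text += delta;
    if (countWords(this.text) >= this.afterWords) {
      this.fired = true;
      return true;
    }
    return false;
  }
}

const trigger = new AfterWordsTrigger(3);
console.log(trigger.push("Sure, I")); // false: 2 words so far
console.log(trigger.push(" can help")); // true: 4 words, fire the interrupt
console.log(trigger.push(" you")); // false: already fired
```

Counting the trailing partial token makes the trigger fire slightly early; a stricter variant could ignore the last token until whitespace follows it.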
stream (Optional)
transport (Readonly)
url (Optional, Readonly)
Convenience: "<audioFormat>/<sampleRate>". Used in tests and matrix docs.
Default call() body, ported from Python VoiceAgentAdapter.call.
Threads the latest user-message audio through sendAudio, drains the agent response until tail silence, records one user segment and one agent segment in the executor state, and returns the merged assistant audio message. Subclasses may override it for specialised flows but will usually inherit it.
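The flow of the default call() can be sketched as follows, assuming a transport whose receiveAudio() resolves with an empty chunk once the tail-silence window passes. AudioTransport, Segment, concatChunks and defaultCall are hypothetical stand-ins. Extracting the latest user-message audio from the conversation is omitted, and segments keep only a role and a byte count as an illustrative simplification.

```typescript
// Hypothetical transport contract; the real adapter's types may differ.
interface AudioTransport {
  sendAudio(pcm: Uint8Array): Promise<void>;
  // Resolves with the next agent chunk, or with an empty chunk once the
  // tail-silence window passes without audio (assumed contract).
  receiveAudio(): Promise<Uint8Array>;
}

// Simplified stand-in for a recorded executor-state segment.
interface Segment {
  role: "user" | "agent";
  bytes: number;
}

function concatChunks(chunks: readonly Uint8Array[]): Uint8Array {
  const total = chunks.reduce((n, c) => n + c.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.length;
  }
  return out;
}

async function defaultCall(
  transport: AudioTransport,
  userAudio: Uint8Array,
  segments: Segment[],
): Promise<{ role: "assistant"; audio: Uint8Array }> {
  // 1. Thread the user audio through sendAudio and record the user segment.
  await transport.sendAudio(userAudio);
  segments.push({ role: "user", bytes: userAudio.length });

  // 2. Drain agent chunks until the transport reports tail silence.
  const chunks: Uint8Array[] = [];
  for (;;) {
    const chunk = await transport.receiveAudio();
    if (chunk.length === 0) break;
    chunks.push(chunk);
  }

  // 3. Record one agent segment and return the merged audio as one message.
  const audio = concatChunks(chunks);
  segments.push({ role: "agent", bytes: audio.length });
  return { role: "assistant", audio };
}

// Usage with an in-memory fake transport that replies with two chunks.
const replies = [new Uint8Array([1, 2]), new Uint8Array([3])];
const fake: AudioTransport = {
  async sendAudio() {},
  async receiveAudio() {
    return replies.shift() ?? new Uint8Array(0);
  },
};
const segments: Segment[] = [];
defaultCall(fake, new Uint8Array([9, 9, 9]), segments).then((msg) => console.log(msg.audio, segments));
```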
Open the transport and prepare to exchange audio.
Close the transport and release resources.
Send a Twilio clear frame and truncate the JS-side receive buffer.
Side effects:
- inbox.queue cleared: discards buffered PCM chunks not yet consumed by drainAgentResponse.
- mulawChunks cleared: discards partially accumulated µ-law that hasn't been resampled yet.
- The receiveAudio waiter (if any) is woken with an empty chunk, so drainAgentResponse breaks its loop immediately instead of waiting for a timeout.
- interruptPhase set to "interrupted": the NEXT receiveAudio call returns an empty sentinel (stopping the drain loop after the waiter wakes), and bufferMulaw discards late-arriving WS frames until the recovery turn's sendAudio resets the phase to "idle".
Whether the Media Streams WebSocket is open (Gap #11).
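The interrupt side effects listed above can be modelled with a small receive buffer. This is a sketch under stated assumptions: the class and its method names are hypothetical, µ-law decoding and resampling are replaced by a pass-through once 160 bytes (20 ms at 8 kHz) have accumulated, and sendJson stands in for writing to the Media Streams WebSocket. The clear message shape, {"event": "clear", "streamSid": ...}, follows Twilio's Media Streams protocol.

```typescript
type InterruptPhase = "idle" | "interrupted";

// 20 ms of 8 kHz mono µ-law is 160 bytes (one byte per sample).
const FRAME_BYTES = 160;

// Hypothetical model of the JS-side receive path and its interrupt handling.
class ReceiveBuffer {
  queue: Uint8Array[] = []; // ready chunks not yet consumed by the drain loop
  mulawChunks: Uint8Array[] = []; // partial µ-law awaiting decode/resample
  interruptPhase: InterruptPhase = "idle";
  private waiter: ((chunk: Uint8Array) => void) | null = null;
  private readonly sendJson: (msg: object) => void;
  private readonly streamSid: string;

  constructor(sendJson: (msg: object) => void, streamSid: string) {
    this.sendJson = sendJson;
    this.streamSid = streamSid;
  }

  // Called for each inbound media frame (payload already base64-decoded).
  bufferMulaw(frame: Uint8Array): void {
    if (this.interruptPhase === "interrupted") return; // drop late frames
    this.mulawChunks.push(frame);
    const total = this.mulawChunks.reduce((n, c) => n + c.length, 0);
    if (total < FRAME_BYTES) return;
    // Decoding/resampling omitted: concatenate and pass the bytes through.
    const out = new Uint8Array(total);
    let offset = 0;
    for (const c of this.mulawChunks) {
      out.set(c, offset);
      offset += c.length;
    }
    this.mulawChunks = [];
    this.deliver(out);
  }

  private deliver(chunk: Uint8Array): void {
    const w = this.waiter;
    if (w) {
      this.waiter = null;
      w(chunk);
    } else {
      this.queue.push(chunk);
    }
  }

  // An empty chunk tells the drain loop to stop.
  receiveAudio(): Promise<Uint8Array> {
    if (this.interruptPhase === "interrupted") return Promise.resolve(new Uint8Array(0));
    const next = this.queue.shift();
    if (next) return Promise.resolve(next);
    return new Promise<Uint8Array>((resolve) => {
      this.waiter = resolve;
    });
  }

  interrupt(): void {
    this.sendJson({ event: "clear", streamSid: this.streamSid });
    this.queue = [];
    this.mulawChunks = [];
    const w = this.waiter;
    this.waiter = null;
    w?.(new Uint8Array(0)); // wake a pending receiveAudio immediately
    this.interruptPhase = "interrupted";
  }

  // What the recovery turn's sendAudio would do before sending user audio.
  onSendAudio(): void {
    this.interruptPhase = "idle";
  }
}
```

Every step of interrupt() runs synchronously, so by the time the woken drain loop resumes, the buffers are already empty and late frames are already being dropped.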
Transmit DTMF tones to the telephony peer. Adapters that advertise
capabilities.dtmf MUST implement this; the default raises
UnsupportedCapabilityError, so an adapter that claims the capability
but forgot to ship sendDtmf fails loudly instead of silently routing
through a PCM fallback.
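A minimal sketch of the fail-loudly default, assuming an abstract base class with an async sendDtmf. VoiceAdapterBase, FakeDtmfAdapter, ForgetfulAdapter and the local UnsupportedCapabilityError are hypothetical; the fake adapter records digits instead of reaching a real telephony peer.

```typescript
// Local stand-in for the error named above; its real constructor may differ.
class UnsupportedCapabilityError extends Error {
  constructor(capability: string, adapter: string) {
    super(`${adapter} does not implement capability "${capability}"`);
    this.name = "UnsupportedCapabilityError";
  }
}

// Hypothetical base class: the default sendDtmf fails loudly.
abstract class VoiceAdapterBase {
  abstract readonly capabilities: { readonly dtmf: boolean };

  async sendDtmf(_digits: string): Promise<void> {
    throw new UnsupportedCapabilityError("dtmf", this.constructor.name);
  }
}

// Valid DTMF symbols: 0-9, *, # and the A-D column.
const DTMF_DIGITS = /^[0-9*#A-D]+$/;

class FakeDtmfAdapter extends VoiceAdapterBase {
  readonly capabilities = { dtmf: true };
  readonly sentDigits: string[] = [];

  async sendDtmf(digits: string): Promise<void> {
    if (!DTMF_DIGITS.test(digits)) throw new RangeError(`invalid DTMF digits: ${digits}`);
    // Stand-in for the real telephony call.
    this.sentDigits.push(...digits.split(""));
  }
}

// Claims the capability but forgot to override sendDtmf.
class ForgetfulAdapter extends VoiceAdapterBase {
  readonly capabilities = { dtmf: true };
}

new ForgetfulAdapter().sendDtmf("1").catch((e: Error) => console.log(e.name));
```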
Adapter that drives a running Pipecat bot over the Twilio Media Streams WebSocket protocol. Default audio format: µ-law, 8 kHz, mono, which is what Pipecat's
TwilioFrameSerializer expects.
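Because the default wire format is 8 kHz mono µ-law, each byte is one sample, so a chunk's duration is its byte length divided by 8000 seconds, and converting it to 16-bit linear PCM takes only a few bit operations. The sketch below uses the standard G.711 µ-law expansion; mulawToPcm16, decodeMulaw and chunkDurationMs are hypothetical helper names, not functions from Pipecat or this adapter.

```typescript
// Standard G.711 µ-law expansion: one 8-bit code to one 16-bit linear sample.
function mulawToPcm16(code: number): number {
  const u = ~code & 0xff; // µ-law bytes are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  // Add the bias (0x84 = 132) to the mantissa, scale by the segment
  // exponent, then subtract the bias again.
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}

// Decode a whole chunk of µ-law bytes into 16-bit PCM samples.
function decodeMulaw(bytes: Uint8Array): Int16Array {
  const out = new Int16Array(bytes.length);
  bytes.forEach((b, i) => {
    out[i] = mulawToPcm16(b);
  });
  return out;
}

// At 8 kHz mono with one byte per sample, duration in ms is bytes / 8.
function chunkDurationMs(byteLength: number, sampleRate = 8000): number {
  return (byteLength * 1000) / sampleRate;
}

console.log(mulawToPcm16(0xff)); // 0 (silence)
console.log(mulawToPcm16(0x80)); // 32124 (largest positive sample)
console.log(chunkDurationMs(160)); // 20
```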