Class GeminiLiveAgentAdapter

Gemini Live native-audio adapter.

Connects directly to the Gemini Live API via the @google/genai SDK. Audio flows in both directions as raw PCM16: the adapter uses 24kHz as its canonical internal rate and resamples to and from 16kHz at the wire boundary.

Remarks

The @google/genai package is declared as an optional peer dependency so the SDK ships without a hard Gemini coupling. Users who import this adapter must install @google/genai themselves.

Hierarchy

VoiceAgentAdapter
- GeminiLiveAgentAdapter

Constructors

constructor

new GeminiLiveAgentAdapter(
init?: GeminiLiveAgentAdapterInit,
): GeminiLiveAgentAdapter
Parameters
- init: GeminiLiveAgentAdapterInit = {}
Returns GeminiLiveAgentAdapter
Overrides VoiceAgentAdapter.constructor
- Defined in work/scenario/scenario/javascript/src/voice/adapters/gemini-live.ts:177

Properties

`Optional` agentSpeakingEvent

agentSpeakingEvent?: AgentSpeakingEvent

Set once the adapter has emitted its first agent audio chunk for the current turn; gates timing-based barge-in. Concrete adapters expose this so that scenario.interrupt can wait for real speech before firing the interruption. Optional: adapters without server-VAD-style interrupt sequencing can leave it undefined.

`Readonly` capabilities

capabilities: AdapterCapabilities = ...

Declaration of what this adapter can and cannot do. Concrete subclasses MUST publish a non-default value; the base instance defaults to "nothing supported" so capability-gated steps fail safely when an adapter forgets to declare.

lastAgentTranscript

lastAgentTranscript: string | null = null

Most-recent output transcript received from the server, for observability.

`Readonly` model

model: string

`Optional` name

name?: string

responseMaxDuration

responseMaxDuration: number = 30.0

Hard cap, in seconds, on a single agent turn's audio. Prevents runaway loops if a transport never signals end-of-stream. The default of 30 seconds leaves room for a long spoken reply.

responseTailSilence

responseTailSilence: number = 0.6

Tail silence, in seconds: once the first agent chunk arrives, keep draining receiveAudio until no chunk shows up within this window. That gap is how the adapter detects that the agent has finished talking.

responseTimeout

responseTimeout: number = 30.0

Seconds to wait for agent audio after sending user audio.
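
The three timing properties (responseTimeout, responseTailSilence, responseMaxDuration) can be read together as one drain loop. The following is a minimal sketch and not the adapter's actual implementation: `Chunk`, `Receive`, and `drainUntilSilence` are hypothetical names, the receiver is assumed to reject on timeout, and the cap is assumed to apply to the duration of collected audio.

```typescript
// Hypothetical chunk shape for illustration; the real AudioChunk type may differ.
interface Chunk {
  samples: Int16Array;
  sampleRate: number;
}

// Hypothetical receiver: resolves with the next chunk or rejects on timeout.
type Receive = (timeoutSeconds: number) => Promise<Chunk>;

interface DrainOptions {
  responseTimeout: number; // seconds to wait for the first chunk
  responseTailSilence: number; // seconds of silence that end the turn
  responseMaxDuration: number; // cap on collected audio, in seconds (assumed)
}

async function drainUntilSilence(receive: Receive, opts: DrainOptions): Promise<Chunk[]> {
  const chunks: Chunk[] = [];
  let collectedSeconds = 0;
  while (collectedSeconds < opts.responseMaxDuration) {
    // Wait longer for the first chunk, then switch to the tail-silence window.
    const timeout = chunks.length === 0 ? opts.responseTimeout : opts.responseTailSilence;
    let chunk: Chunk;
    try {
      chunk = await receive(timeout);
    } catch {
      break; // no chunk within the window: treat the turn as finished
    }
    chunks.push(chunk);
    collectedSeconds += chunk.samples.length / chunk.sampleRate;
  }
  return chunks;
}
```

In this sketch a timeout before the first chunk simply yields an empty result; a real adapter may prefer to surface that case as an error.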

role

role: AgentRole = AgentRole.AGENT

`Optional` streamingTranscript

streamingTranscript?: string

Incremental transcript text emitted while the agent speaks. Populated by adapters that advertise capabilities.streamingTranscripts. Read by scenario.interrupt when afterWords: N is set.
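
As an illustration of how a word threshold such as afterWords: N might be checked against the streaming transcript, here is a small sketch. `hasSpokenWords` is a hypothetical helper, and whitespace-based word splitting is an assumption; the actual check in scenario.interrupt may differ.

```typescript
// Hypothetical helper: true once the streaming transcript holds at least
// `afterWords` whitespace-separated words.
function hasSpokenWords(streamingTranscript: string | undefined, afterWords: number): boolean {
  if (!streamingTranscript) return afterWords <= 0;
  const words = streamingTranscript
    .trim()
    .split(/\s+/)
    .filter((w) => w.length > 0);
  return words.length >= afterWords;
}
```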

`Readonly` systemInstruction

systemInstruction: string

`Readonly` voice

voice: string

Methods

call

call(input: AgentInput): Promise<AgentReturnTypes>
Default call() body, ported from Python VoiceAgentAdapter.call.

Threads the latest user-message audio through sendAudio, drains the agent response on tail silence, records one user and one agent segment into the executor state, and returns the merged assistant audio message. Subclasses may override for specialised flows but will usually inherit it.
Parameters
- input: AgentInput
Returns Promise<AgentReturnTypes>
Inherited from VoiceAgentAdapter.call
- Defined in work/scenario/scenario/javascript/src/voice/adapter.ts:67

connect

connect(): Promise<void>
Open a Gemini Live session.

Lazy-imports @google/genai so the SDK only loads when this adapter is actually used. Registers an onmessage callback that pushes LiveServerMessage instances onto an internal queue, which receiveAudio drains.

Returns Promise<void>
Overrides VoiceAgentAdapter.connect
- Defined in work/scenario/scenario/javascript/src/voice/adapters/gemini-live.ts:199
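
The lazy import described for connect can be sketched as a cached loader. This is an illustrative pattern under stated assumptions, not the adapter's code: `lazy` is a hypothetical helper, and the loader is passed in as a function so the example runs without the optional dependency installed.

```typescript
// Wraps a loader so the underlying import runs at most once, on first use.
function lazy<T>(loader: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = loader().catch((err) => {
        cached = undefined; // allow a retry after a failed load
        throw new Error(`Optional dependency failed to load; is it installed? (${String(err)})`);
      });
    }
    return cached;
  };
}

// In real code the loader would typically be a dynamic import, for example:
//   const loadGenAi = lazy(() => import("@google/genai"));
```

Deferring the import this way means a missing optional peer dependency only surfaces when the adapter is actually connected, not when the surrounding package is loaded.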

disconnect

disconnect(): Promise<void>
Close the Gemini Live session and release the WebSocket.

Returns Promise<void>
Overrides VoiceAgentAdapter.disconnect
- Defined in work/scenario/scenario/javascript/src/voice/adapters/gemini-live.ts:263

interrupt

interrupt(): Promise<void>
Signal an in-flight receiveAudio() to return the cut-off sentinel immediately, so the interrupted turn doesn't replay stale agent audio into the next turn's drain loop.

Abort-sentinel pattern (fixes the single-consumer concurrency race):

The original implementation called dequeue() concurrently with an in-flight receiveAudio(). Since dequeue() has a single resolveNext slot, the second caller (interrupt) overwrote the first caller's (receiveAudio's) resolver. When a message arrived it resolved the interrupt's dequeue, leaving receiveAudio's resolver orphaned — its timer eventually fired with a TimeoutError, causing drainAgentResponse to catch and break prematurely.

Fix: interrupt() no longer calls dequeue(). Instead it:
1. Sets _interruptPending = true.
2. Wakes any in-flight dequeue() by calling the current resolveNext directly with an abort sentinel ({ interrupted: true }).
3. receiveAudio()'s loop checks _interruptPending (and item.interrupted) and returns the cut-off sentinel immediately on seeing it.
Best-effort: if nothing is in-flight (resolveNext is null), the flag stays set and receiveAudio() catches it at the top of its next iteration.
Returns Promise<void>
Overrides VoiceAgentAdapter.interrupt
- Defined in work/scenario/scenario/javascript/src/voice/adapters/gemini-live.ts:511
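
The abort-sentinel pattern described for interrupt can be sketched with a single-consumer queue. This is a simplified illustration, not the adapter's source: `SingleConsumerQueue`, `Item`, and the millisecond timeout are hypothetical, and the sketch clears the pending flag as soon as a consumer observes the sentinel.

```typescript
// Hypothetical item type: either a real value or the abort sentinel.
type Item<T> = { interrupted: false; value: T } | { interrupted: true };

class SingleConsumerQueue<T> {
  private items: T[] = [];
  private resolveNext: ((item: Item<T>) => void) | null = null;
  private interruptPending = false;

  // Producer side, for example called from an onmessage callback.
  push(value: T): void {
    const resolve = this.resolveNext;
    if (resolve) {
      this.resolveNext = null;
      resolve({ interrupted: false, value });
    } else {
      this.items.push(value);
    }
  }

  // Consumer side: only one dequeue may be in flight at a time.
  dequeue(timeoutMs: number): Promise<Item<T>> {
    if (this.interruptPending) {
      this.interruptPending = false; // flag consumed by this call
      return Promise.resolve({ interrupted: true });
    }
    if (this.items.length > 0) {
      return Promise.resolve({ interrupted: false, value: this.items.shift() as T });
    }
    return new Promise<Item<T>>((resolve, reject) => {
      const timer = setTimeout(() => {
        this.resolveNext = null;
        reject(new Error("dequeue timed out"));
      }, timeoutMs);
      this.resolveNext = (item) => {
        clearTimeout(timer);
        if (item.interrupted) this.interruptPending = false; // consumed by this waiter
        resolve(item);
      };
    });
  }

  // Never calls dequeue(): wakes the waiting consumer through its own resolver.
  interrupt(): void {
    this.interruptPending = true;
    const resolve = this.resolveNext;
    this.resolveNext = null;
    if (resolve) resolve({ interrupted: true });
  }
}
```

Because interrupt() reuses the consumer's resolver instead of registering its own, the single resolveNext slot is never overwritten, which is the race the description above attributes to the original implementation.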

isConnected

isConnected(): boolean
Whether the Gemini Live session is open (Gap #11).

Returns boolean
Overrides VoiceAgentAdapter.isConnected
- Defined in work/scenario/scenario/javascript/src/voice/adapters/gemini-live.ts:256

receiveAudio

receiveAudio(timeout: number): Promise<AudioChunk>
Receive the next AudioChunk from the agent.
Parameters
- timeout: number
Returns Promise<AudioChunk>
Overrides VoiceAgentAdapter.receiveAudio
- Defined in work/scenario/scenario/javascript/src/voice/adapters/gemini-live.ts:344

sendAudio

sendAudio(chunk: AudioChunk): Promise<void>
Send a canonical 24kHz AudioChunk to Gemini Live as a complete turn.

Resamples 24kHz → 16kHz at the wire boundary and wraps the audio in explicit activityStart / activityEnd markers. With Automatic Activity Detection disabled (see connect), each sendAudio call is a complete user turn from Gemini's perspective: the model replies the moment we close the turn.
Parameters
- chunk: AudioChunk
Returns Promise<void>
Overrides VoiceAgentAdapter.sendAudio
- Defined in work/scenario/scenario/javascript/src/voice/adapters/gemini-live.ts:295
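
To make the 24kHz → 16kHz step concrete, here is a minimal resampling sketch for mono PCM16 using linear interpolation. `resamplePcm16` is a hypothetical function and linear interpolation is an assumption chosen for brevity; the adapter's actual resampler is not shown in this reference and may use a different method.

```typescript
// Hypothetical linear-interpolation resampler for mono PCM16 samples.
function resamplePcm16(input: Int16Array, fromRate: number, toRate: number): Int16Array {
  if (fromRate === toRate || input.length === 0) return input.slice();
  const outLength = Math.floor((input.length * toRate) / fromRate);
  const output = new Int16Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    // Blend the two neighbouring input samples by their distance to `pos`.
    output[i] = Math.round(input[left] * (1 - frac) + input[right] * frac);
  }
  return output;
}

// One second of 24kHz audio becomes one second of 16kHz audio.
const wire = resamplePcm16(new Int16Array(24000), 24000, 16000);
console.log(wire.length); // 16000
```

Plain linear interpolation does no low-pass filtering, so a production downsampler would usually filter first to limit aliasing.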

sendDtmf

sendDtmf(_tones: string): Promise<void>
Transmit DTMF tones to the telephony peer. Adapters that advertise capabilities.dtmf MUST implement this; the default raises UnsupportedCapabilityError so an adapter that forgot to ship sendDtmf while claiming the capability fails loudly instead of silently routing through a PCM fallback.
Parameters
- _tones: string
Returns Promise<void>
Inherited from VoiceAgentAdapter.sendDtmf
- Defined in work/scenario/scenario/javascript/src/voice/adapter.ts:138
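
The fail-loudly default described for sendDtmf can be sketched as follows. All names here (`Capabilities`, `BaseAdapter`, `ForgetfulAdapter`, and this local `UnsupportedCapabilityError`) are hypothetical stand-ins that mirror the description; the real types in the SDK may be shaped differently.

```typescript
// Hypothetical error type mirroring the one named in the description.
class UnsupportedCapabilityError extends Error {
  constructor(capability: string) {
    super(`Adapter does not support capability: ${capability}`);
    this.name = "UnsupportedCapabilityError";
  }
}

// Hypothetical capability declaration with the two flags mentioned on this page.
interface Capabilities {
  dtmf: boolean;
  streamingTranscripts: boolean;
}

const NOTHING_SUPPORTED: Capabilities = { dtmf: false, streamingTranscripts: false };

abstract class BaseAdapter {
  // Base default: nothing supported, so capability-gated steps fail safely.
  readonly capabilities: Capabilities = NOTHING_SUPPORTED;

  // Default implementation rejects instead of silently falling back.
  async sendDtmf(_tones: string): Promise<void> {
    throw new UnsupportedCapabilityError("dtmf");
  }
}

// Claims DTMF support but forgets to override sendDtmf.
class ForgetfulAdapter extends BaseAdapter {
  readonly capabilities: Capabilities = { dtmf: true, streamingTranscripts: false };
}
```

Calling sendDtmf on `ForgetfulAdapter` rejects with the error, which makes the mismatch between the declared capability and the missing implementation visible at the first use.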
