Class OpenAIRealtimeAgentAdapter

Exercise OpenAI's Realtime API as either the agent under test (role=AGENT, default) or as the voice-enabled user simulator (role=USER, per §7.2 L1164-1171).

When role=USER, scripted user("text") steps route text through the realtime session's text-input channel rather than triggering TTS.

Transcript observability:

- lastUserTranscript: set from conversation.item.input_audio_transcription.completed
- lastAgentTranscript: accumulated from response.output_audio_transcript.delta (the Beta name response.audio_transcript.delta is also accepted) and reset on done
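
The transcript bookkeeping above can be pictured with a minimal sketch. `TranscriptTracker` and the `TranscriptEvent` type are hypothetical stand-ins, not part of the adapter, and the "publish the accumulated text on done, then clear the buffer" step is an assumption about how accumulation and reset fit together.

```typescript
// Hypothetical stand-in for the subset of Realtime server events used here.
type TranscriptEvent =
  | { type: "conversation.item.input_audio_transcription.completed"; transcript: string }
  | { type: "response.output_audio_transcript.delta"; delta: string }
  | { type: "response.output_audio_transcript.done" };

class TranscriptTracker {
  lastUserTranscript: string | null = null;
  lastAgentTranscript: string | null = null;
  // Text accumulated from delta events for the turn in progress.
  private pending = "";

  handle(event: TranscriptEvent): void {
    switch (event.type) {
      case "conversation.item.input_audio_transcription.completed":
        this.lastUserTranscript = event.transcript;
        break;
      case "response.output_audio_transcript.delta":
        this.pending += event.delta;
        break;
      case "response.output_audio_transcript.done":
        // Assumption: publish the accumulated text, then reset the buffer.
        this.lastAgentTranscript = this.pending;
        this.pending = "";
        break;
    }
  }
}
```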

Hierarchy

VoiceAgentAdapter
- OpenAIRealtimeAgentAdapter

Constructors

constructor

new OpenAIRealtimeAgentAdapter(
init?: OpenAIRealtimeAgentAdapterInit,
): OpenAIRealtimeAgentAdapter
Parameters
- init: OpenAIRealtimeAgentAdapterInit = {}
Returns OpenAIRealtimeAgentAdapter
Overrides VoiceAgentAdapter.constructor
- Defined in work/scenario/scenario/javascript/src/voice/adapters/openai-realtime.ts:170

Properties

`Optional` agentSpeakingEvent

agentSpeakingEvent?: AgentSpeakingEvent

Set when the adapter has emitted its first agent audio chunk for the current turn — gates timing-based barge-in. Concrete adapters expose this so scenario.interrupt can wait for real speech before firing the interruption. Optional: adapters without server-VAD-style interrupt sequencing can leave it undefined.

`Readonly` capabilities

capabilities: AdapterCapabilities = ...

Declaration of what this adapter can and cannot do. Concrete subclasses MUST publish a non-default value; the base instance defaults to "nothing supported" so capability-gated steps fail safely when an adapter forgets to declare.
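
As an illustration of the "fail safely" default, here is a minimal sketch of a capability gate. The `SketchCapabilities` interface lists only the two flags named on this page (`streamingTranscripts` and `dtmf`); the real `AdapterCapabilities` shape may differ, and `requireCapability` and `SketchUnsupportedCapabilityError` are hypothetical names.

```typescript
// Hypothetical subset of AdapterCapabilities: only the flags named on this page.
interface SketchCapabilities {
  streamingTranscripts: boolean;
  dtmf: boolean;
}

// Stand-in for the base default: nothing supported.
const NOTHING_SUPPORTED: SketchCapabilities = {
  streamingTranscripts: false,
  dtmf: false,
};

class SketchUnsupportedCapabilityError extends Error {}

// A capability-gated step checks the declaration before doing any work.
function requireCapability(
  caps: SketchCapabilities,
  name: keyof SketchCapabilities,
): void {
  if (!caps[name]) {
    throw new SketchUnsupportedCapabilityError(
      `adapter does not declare capability "${name}"`,
    );
  }
}
```

An adapter that forgets to override the default is rejected by the gate instead of silently running a step it cannot honour.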

`Readonly` instructions

instructions: string

lastAgentTranscript

lastAgentTranscript: string | null = null

Most recent finalized agent transcript (post audio_transcript.done).

lastUserTranscript

lastUserTranscript: string | null = null

Most recent user-side transcript from the Whisper input pipeline.

`Readonly` model

model: string

`Optional` name

name?: string

responseMaxDuration

responseMaxDuration: number = 30.0

Hard cap, in seconds, on a single agent turn's audio. Prevents runaway loops if a transport never signals end-of-stream. The 30 s default is sized for a long utterance.

responseTailSilence

responseTailSilence: number = 0.6

Tail silence: once the first agent chunk arrives, keep draining receiveAudio until no chunk shows up within this many seconds — that's how we detect the agent finished talking.

responseTimeout

responseTimeout: number = 30.0

Seconds to wait for agent audio after sending user audio.
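
To show how the three timing knobs (responseTimeout, responseTailSilence, responseMaxDuration) might interact, here is a minimal simulation over chunk arrival times. It is a sketch, not the adapter's drain loop: `simulateDrain` is a hypothetical helper, and measuring the hard cap as wall-clock time from the first chunk is an assumption.

```typescript
interface DrainOptions {
  responseTimeout: number;     // seconds to wait for the first chunk
  responseTailSilence: number; // max gap between chunks before the turn ends
  responseMaxDuration: number; // hard cap, assumed to run from the first chunk
}

type StopReason = "timeout" | "tail-silence" | "max-duration";

// arrivals: chunk arrival times in seconds since the user audio was sent, ascending.
function simulateDrain(
  arrivals: number[],
  opts: DrainOptions,
): { kept: number; reason: StopReason } {
  let first: number | null = null;
  let last = 0;
  let kept = 0;
  for (const t of arrivals) {
    if (first === null) {
      if (t > opts.responseTimeout) break; // nothing arrived in time
      first = t;
      last = t;
      kept = 1;
      continue;
    }
    if (t - last > opts.responseTailSilence) return { kept, reason: "tail-silence" };
    if (t - first > opts.responseMaxDuration) return { kept, reason: "max-duration" };
    kept++;
    last = t;
  }
  if (first === null) return { kept: 0, reason: "timeout" };
  // The stream went quiet after the last chunk, so the tail-silence window closes the turn.
  return { kept, reason: "tail-silence" };
}
```

With the defaults shown on this page (30.0, 0.6, 30.0), a gap longer than 0.6 s between chunks ends the turn, while a stream that never pauses is cut off by the hard cap.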

role

role: AgentRole

`Optional` streamingTranscript

streamingTranscript?: string

Incremental transcript text emitted while the agent speaks. Populated by adapters that advertise capabilities.streamingTranscripts. Read by scenario.interrupt when afterWords: N is set.
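
A word-count gate over the streaming transcript could look like the sketch below. Both helpers are hypothetical, and counting words by splitting on whitespace is an assumption; the actual check performed by scenario.interrupt is not documented on this page.

```typescript
// Count whitespace-separated words; an undefined or blank transcript counts as zero.
function countWords(text: string | undefined): number {
  if (!text) return 0;
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// True once the agent has spoken at least `afterWords` words this turn.
function reachedWordThreshold(
  streamingTranscript: string | undefined,
  afterWords: number,
): boolean {
  return countWords(streamingTranscript) >= afterWords;
}
```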

`Readonly` tools

tools: RealtimeToolDef[]

`Readonly` voice

voice: string

Accessors

url

get url(): string
Returns string
- Defined in work/scenario/scenario/javascript/src/voice/adapters/openai-realtime.ts:181

Methods

call

call(input: AgentInput): Promise<AgentReturnTypes>
Surface realtime tool calls alongside the spoken audio turn (#630).

The base call() (defaultVoiceCall) returns a single assistant audio message and does all the recording bookkeeping. We keep that intact and, when the agent called any tools this turn, append ONE extra role:"tool" message carrying every call as AI-SDK tool-result parts — the shape state.hasToolCall / state.lastToolCall consume (AC4).

Returns:
- the single audio message when no tools were called, byte-identical to the base behaviour (AC8 regression), or
- [audioMessage, toolMessage] when ≥1 tool was called (AC4/AC10). convertAgentReturnTypesToMessages passes a list through verbatim into the run's messages.

Per-turn tool state is reset here, at turn start, so tool calls never leak across turns; the function-call events for this turn are consumed inside super.call()'s drain and finalized onto _completedToolCalls.
Parameters
- input: AgentInput
Returns Promise<AgentReturnTypes>
Overrides VoiceAgentAdapter.call
- Defined in work/scenario/scenario/javascript/src/voice/adapters/openai-realtime.ts:604
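
The two return shapes described for call() can be sketched as follows. All `Sketch*` types and `buildTurnResult` are simplified, hypothetical stand-ins; the real message types come from the library and the AI SDK, and the field names on the tool-result parts here are assumptions.

```typescript
// Simplified stand-ins, not the library's or the AI SDK's actual types.
interface SketchToolCall {
  id: string;
  name: string;
  result: unknown;
}

interface SketchAudioMessage {
  role: "assistant";
  audio: string; // placeholder for the recorded audio payload
}

interface SketchToolMessage {
  role: "tool";
  content: {
    type: "tool-result";
    toolCallId: string;
    toolName: string;
    result: unknown;
  }[];
}

type SketchReturn = SketchAudioMessage | [SketchAudioMessage, SketchToolMessage];

function buildTurnResult(
  audio: SketchAudioMessage,
  calls: SketchToolCall[],
): SketchReturn {
  // No tools this turn: hand back the audio message untouched.
  if (calls.length === 0) return audio;
  // One extra tool message carrying every call made during the turn.
  const toolMessage: SketchToolMessage = {
    role: "tool",
    content: calls.map((c) => ({
      type: "tool-result" as const,
      toolCallId: c.id,
      toolName: c.name,
      result: c.result,
    })),
  };
  return [audio, toolMessage];
}
```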

connect

connect(): Promise<void>
Open the Realtime WebSocket and send the initial session.update.

Returns Promise<void>
Overrides VoiceAgentAdapter.connect
- Defined in work/scenario/scenario/javascript/src/voice/adapters/openai-realtime.ts:202

disconnect

disconnect(): Promise<void>
Close the WebSocket if open.

Returns Promise<void>
Overrides VoiceAgentAdapter.disconnect
- Defined in work/scenario/scenario/javascript/src/voice/adapters/openai-realtime.ts:296

interrupt

interrupt(): Promise<void>
Send response.cancel — the OpenAI Realtime API's first-class interrupt. The model stops generating audio and text immediately. No timing race against VAD: deterministic stop, then the next user turn flows normally through sendAudio + receiveAudio.

Returns Promise<void>
Overrides VoiceAgentAdapter.interrupt
- Defined in work/scenario/scenario/javascript/src/voice/adapters/openai-realtime.ts:349

isConnected

isConnected(): boolean
Whether the Realtime WebSocket is open (Gap #11).

Returns boolean
Overrides VoiceAgentAdapter.isConnected
- Defined in work/scenario/scenario/javascript/src/voice/adapters/openai-realtime.ts:291

receiveAudio

receiveAudio(timeout: number): Promise<AudioChunk>
Commit any pending audio, request a response, and return the first audio chunk the model produces.

Loops over incoming events until a response.output_audio.delta event arrives, then returns decoded PCM16. Transcript events update lastUserTranscript / lastAgentTranscript. An error event throws.

GA event names are response.output_audio[_transcript].{delta,done}; the Beta response.audio[_transcript].* names are retired. Both are accepted so that back-porting to a Beta endpoint stays trivial; production hits the GA path.
Parameters
- timeout: number
Returns Promise<AudioChunk>
Overrides VoiceAgentAdapter.receiveAudio
- Defined in work/scenario/scenario/javascript/src/voice/adapters/openai-realtime.ts:369
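
The event handling described for receiveAudio can be sketched as a scan over already-received events. This is an illustration only: `ServerEvent`, `normalizeType`, and `firstAudioDelta` are hypothetical, the real method reads from the WebSocket with a timeout, and base64 decoding to PCM16 is left out.

```typescript
// Hypothetical minimal view of a Realtime server event.
interface ServerEvent {
  type: string;
  delta?: string; // base64 audio for audio delta events
  error?: { message?: string };
}

// Map the Beta event names onto their GA equivalents so one code path handles both.
const BETA_TO_GA: Record<string, string> = {
  "response.audio.delta": "response.output_audio.delta",
  "response.audio.done": "response.output_audio.done",
  "response.audio_transcript.delta": "response.output_audio_transcript.delta",
  "response.audio_transcript.done": "response.output_audio_transcript.done",
};

function normalizeType(type: string): string {
  return BETA_TO_GA[type] ?? type;
}

// Return the first audio delta payload, or null if none is present.
function firstAudioDelta(events: ServerEvent[]): string | null {
  for (const event of events) {
    const type = normalizeType(event.type);
    if (type === "error") {
      throw new Error(event.error?.message ?? "realtime error");
    }
    if (type === "response.output_audio.delta" && event.delta !== undefined) {
      return event.delta;
    }
  }
  return null;
}
```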

sendAudio

sendAudio(chunk: AudioChunk): Promise<void>
Append a PCM16 audio chunk to the model's input audio buffer.

Only emits input_audio_buffer.append — commit + response are deferred to the next receiveAudio call. The executor may call sendAudio many times for a single user turn (TTS streams audio as chunks); committing per-chunk would confuse the server with sub-second turn boundaries.
Parameters
- chunk: AudioChunk
Returns Promise<void>
Overrides VoiceAgentAdapter.sendAudio
- Defined in work/scenario/scenario/javascript/src/voice/adapters/openai-realtime.ts:332
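
The deferred-commit behaviour described for sendAudio can be sketched as follows. `DeferredCommitSketch` is hypothetical: it records outgoing client events in an array instead of writing to a WebSocket, takes audio that is assumed to be base64-encoded already, and assumes nothing is committed when no audio is pending.

```typescript
// The three Realtime client events involved in one user turn.
type ClientEvent =
  | { type: "input_audio_buffer.append"; audio: string }
  | { type: "input_audio_buffer.commit" }
  | { type: "response.create" };

class DeferredCommitSketch {
  // Stand-in for the socket: every event that would be sent, in order.
  readonly sent: ClientEvent[] = [];
  private pendingAudio = false;

  // Called once per chunk; only appends, never commits.
  sendAudio(base64Pcm16: string): void {
    this.sent.push({ type: "input_audio_buffer.append", audio: base64Pcm16 });
    this.pendingAudio = true;
  }

  // Called at the start of the receive step: one commit and one response request per turn.
  beginReceive(): void {
    if (!this.pendingAudio) return; // assumption: nothing to commit, nothing to request
    this.sent.push({ type: "input_audio_buffer.commit" });
    this.sent.push({ type: "response.create" });
    this.pendingAudio = false;
  }
}
```

However many chunks the TTS stream produces, the server sees a single turn boundary.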

sendDtmf

sendDtmf(_tones: string): Promise<void>
Transmit DTMF tones to the telephony peer. Adapters that advertise capabilities.dtmf MUST implement this; the default raises UnsupportedCapabilityError so an adapter that forgot to ship sendDtmf while claiming the capability fails loudly instead of silently routing through a PCM fallback.
Parameters
- _tones: string
Returns Promise<void>
Inherited from VoiceAgentAdapter.sendDtmf
- Defined in work/scenario/scenario/javascript/src/voice/adapter.ts:138

sendText

sendText(text: string): Promise<void>
Inject scripted text into the realtime session as a user message.

Used when this adapter is the user simulator (role=USER): scripted user("text") steps route through here instead of spawning TTS. The model synthesizes the text into spoken audio with natural prosody, which is then delivered via receiveAudio.

Per §7.2, OpenAI Realtime cannot populate assistant audio messages retroactively; the downstream transcript reflects what the model actually emitted, not what was scripted.
Parameters
- text: string
Returns Promise<void>
- Defined in work/scenario/scenario/javascript/src/voice/adapters/openai-realtime.ts:666

toString

toString(): string
Hide the API key when this object lands in error messages or logs.

Returns string
- Defined in work/scenario/scenario/javascript/src/voice/adapters/openai-realtime.ts:188
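
A redacting toString can be as small as the sketch below. `RedactedConfigSketch` and its output format are hypothetical; the adapter's actual string form is not shown on this page.

```typescript
class RedactedConfigSketch {
  private readonly apiKey: string;
  readonly model: string;

  constructor(apiKey: string, model: string) {
    this.apiKey = apiKey;
    this.model = model;
  }

  // The key is still available where it is needed, for example an auth header.
  authHeader(): string {
    return `Bearer ${this.apiKey}`;
  }

  // String conversion (template literals, string concatenation) never exposes the key.
  toString(): string {
    return `RedactedConfigSketch(model=${this.model}, apiKey=***)`;
  }
}
```

Note that toString only covers string conversion. In Node.js, console.log(obj) formats objects with util.inspect rather than calling toString, so fields can still appear there unless they are hidden by other means.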
