@langwatch/scenario

    Class OpenAIRealtimeAgentAdapter

    Exercises OpenAI's Realtime API either as the agent under test (role=AGENT, the default) or as the voice-enabled user simulator (role=USER, per §7.2 L1164-1171).

    When role=USER, scripted user("text") steps route text through the realtime session's text-input channel rather than triggering TTS.

    Transcript observability:

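    • lastUserTranscript: set from conversation.item.input_audio_transcription.completed
    • lastAgentTranscript: accumulated from the agent transcript delta events (response.output_audio_transcript.delta on GA, response.audio_transcript.delta on Beta) and finalized, with the accumulator reset, on the matching done event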
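A minimal sketch of the transcript bookkeeping described above, assuming each server event is a plain object with a type plus a transcript or delta string field. TranscriptTracker and RealtimeServerEvent are hypothetical names for illustration, not the adapter's actual internals; both the GA and Beta transcript event names are handled, matching the note on receiveAudio further down.

```typescript
// Hypothetical event shape: only the fields this sketch reads.
interface RealtimeServerEvent {
  type: string;
  transcript?: string;
  delta?: string;
}

// Sketch of the transcript tracking described above; not the adapter's code.
class TranscriptTracker {
  lastUserTranscript: string | null = null;
  lastAgentTranscript: string | null = null;
  private agentBuffer = "";

  handle(event: RealtimeServerEvent): void {
    switch (event.type) {
      case "conversation.item.input_audio_transcription.completed":
        // User side: the Whisper pipeline delivers one finished transcript.
        this.lastUserTranscript = event.transcript ?? "";
        break;
      case "response.output_audio_transcript.delta": // GA name
      case "response.audio_transcript.delta": // Beta name
        // Agent side: accumulate partial text while the agent speaks.
        this.agentBuffer += event.delta ?? "";
        break;
      case "response.output_audio_transcript.done":
      case "response.audio_transcript.done":
        // Publish the finished turn, then reset the accumulator for the next one.
        this.lastAgentTranscript = this.agentBuffer;
        this.agentBuffer = "";
        break;
    }
  }
}
```

Keeping the partial text in a private buffer means lastAgentTranscript only ever exposes a finished turn, never a half-spoken one.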

    Properties

    agentSpeakingEvent?: AgentSpeakingEvent

    Set when the adapter has emitted its first agent audio chunk for the current turn — gates timing-based barge-in. Concrete adapters expose this so scenario.interrupt can wait for real speech before firing the interruption. Optional: adapters without server-VAD-style interrupt sequencing can leave it undefined.
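The gating described above can be sketched as a one-shot signal: the adapter sets it when the first agent audio chunk arrives, and the interrupt step waits on it before firing. OneShotEvent is a hypothetical stand-in; the real AgentSpeakingEvent type may expose a different interface.

```typescript
// Hypothetical one-shot signal, roughly analogous to Python's asyncio.Event.
class OneShotEvent {
  private fired = false;
  private waiters: Array<() => void> = [];

  isSet(): boolean {
    return this.fired;
  }

  // Called by the adapter when the first agent audio chunk of the turn arrives.
  set(): void {
    if (this.fired) return;
    this.fired = true;
    for (const wake of this.waiters) wake();
    this.waiters = [];
  }

  // Resolves true once set() has run, or false if timeoutMs elapses first.
  wait(timeoutMs: number): Promise<boolean> {
    if (this.fired) return Promise.resolve(true);
    return new Promise<boolean>((resolve) => {
      const timer = setTimeout(() => resolve(false), timeoutMs);
      this.waiters.push(() => {
        clearTimeout(timer);
        resolve(true);
      });
    });
  }
}
```

An interrupt step would await wait(...) and only send its barge-in audio when the result is true, so it never interrupts silence.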

    capabilities: AdapterCapabilities = ...

    Declaration of what this adapter can and cannot do. Concrete subclasses MUST publish a non-default value; the base instance defaults to "nothing supported" so capability-gated steps fail safely when an adapter forgets to declare.
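A sketch of the fail-safe default, assuming capabilities are boolean flags. Only streamingTranscripts is named on this page; audioInput, audioOutput, AdapterCapabilitiesSketch, NOTHING_SUPPORTED, and requireCapability are hypothetical names for illustration.

```typescript
// Hypothetical capability shape; only streamingTranscripts appears on this page.
interface AdapterCapabilitiesSketch {
  audioInput: boolean;
  audioOutput: boolean;
  streamingTranscripts: boolean;
}

// Base default: nothing supported, so an adapter that forgets to declare fails closed.
const NOTHING_SUPPORTED: Readonly<AdapterCapabilitiesSketch> = Object.freeze({
  audioInput: false,
  audioOutput: false,
  streamingTranscripts: false,
});

// A capability-gated step checks the flag up front and throws a descriptive error.
function requireCapability(
  adapterName: string,
  caps: AdapterCapabilitiesSketch,
  capability: keyof AdapterCapabilitiesSketch,
): void {
  if (!caps[capability]) {
    throw new Error(`${adapterName} does not support "${capability}"`);
  }
}
```

Defaulting every flag to false turns a forgotten declaration into an explicit error at the gated step, rather than a silent misbehaviour later in the run.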

    instructions: string
    lastAgentTranscript: string | null = null

    Most recent finalized agent transcript (post audio_transcript.done).

    lastUserTranscript: string | null = null

    Most recent user-side transcript from the Whisper input pipeline.

    model: string
    name?: string
    responseMaxDuration: number = 30.0

    Hard cap, in seconds, on a single agent turn's audio. Prevents runaway loops if a transport never signals end-of-stream; 30 s is enough for even a long spoken sentence.

    responseTailSilence: number = 0.6

    Tail silence, in seconds: once the first agent chunk arrives, keep draining receiveAudio until no new chunk shows up within this many seconds. That gap is how we detect that the agent has finished talking.
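The three timing properties (responseTimeout, responseTailSilence, responseMaxDuration) combine into a drain loop along the following lines. This is a sketch under assumptions: receive is a hypothetical callback that resolves with a chunk or with null when nothing arrives within the given number of seconds, the max-duration cap is measured as wall-clock time from the first chunk, and a missing first chunk throws.

```typescript
// Hypothetical receiver: resolves with a chunk, or null if nothing arrives within timeoutSec.
type ReceiveAudio = (timeoutSec: number) => Promise<Uint8Array | null>;

interface DrainOptions {
  responseTimeout: number; // seconds to wait for the FIRST chunk
  responseTailSilence: number; // seconds of silence that ends the turn
  responseMaxDuration: number; // hard cap on the whole turn, in seconds
}

async function drainAgentTurn(receive: ReceiveAudio, opts: DrainOptions): Promise<Uint8Array[]> {
  const first = await receive(opts.responseTimeout);
  if (first === null) {
    // Assumption: no audio at all within responseTimeout is treated as an error.
    throw new Error(`no agent audio within ${opts.responseTimeout}s`);
  }
  const chunks: Uint8Array[] = [first];
  const startedAt = Date.now();
  // Keep draining until a tail-silence gap, or until the hard cap is reached.
  while ((Date.now() - startedAt) / 1000 < opts.responseMaxDuration) {
    const next = await receive(opts.responseTailSilence);
    if (next === null) break; // tail silence elapsed: the agent finished talking
    chunks.push(next);
  }
  return chunks;
}
```

The first wait uses the long responseTimeout because the model may take a while to start speaking; every later wait uses the short tail-silence window, so the turn ends soon after the agent stops.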

    responseTimeout: number = 30.0

    Seconds to wait for agent audio after sending user audio.

    role: AgentRole
    streamingTranscript?: string

    Incremental transcript text emitted while the agent speaks. Populated by adapters that advertise capabilities.streamingTranscripts. Read by scenario.interrupt when afterWords: N is set.
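As a sketch of how an afterWords: N condition could be evaluated against streamingTranscript, the helper below polls the live text until it holds at least N whitespace-separated words. waitForWords and countWords are hypothetical helpers; the polling interval and word-splitting rule are assumptions, not scenario.interrupt's actual logic.

```typescript
// Hypothetical word counter: whitespace-separated tokens, ignoring leading/trailing space.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Poll a live transcript until it contains at least n words, or give up after timeoutMs.
async function waitForWords(
  readTranscript: () => string | undefined,
  n: number,
  timeoutMs: number,
  pollMs = 20,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (countWords(readTranscript() ?? "") >= n) return true;
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
  // One last check so a transcript that filled in during the final sleep still counts.
  return countWords(readTranscript() ?? "") >= n;
}
```

Reading the transcript through a callback keeps the helper independent of any particular adapter; an adapter that never populates streamingTranscript simply times out.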

    tools: RealtimeToolDef[]
    voice: string

    Methods

    • Surface realtime tool calls alongside the spoken audio turn (#630).

      The base call() (defaultVoiceCall) returns a single assistant audio message and does all the recording bookkeeping. We keep that intact and, when the agent called any tools this turn, append ONE extra role:"tool" message carrying every call as AI-SDK tool-result parts — the shape state.hasToolCall / state.lastToolCall consume (AC4).

      Returns:

      • the single audio message when no tools were called — byte-identical to the base behaviour (AC8 regression), OR
      • [audioMessage, toolMessage] when ≥1 tool was called (AC4/AC10). convertAgentReturnTypesToMessages passes a list through verbatim into the run's messages.

      Per-turn tool state is reset HERE (turn start) so tool calls never leak across turns; the function-call events for THIS turn are consumed inside super.call()'s drain and finalized onto _completedToolCalls.
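The return contract above (audio message alone, or audio message plus one tool message) can be sketched as follows. The message and part shapes here are simplified, hypothetical stand-ins modelled loosely on AI SDK tool-result parts; the exact field names in the real types may differ, and withToolCalls is an illustrative name.

```typescript
// Hypothetical, simplified shapes; the real ones come from the AI SDK message types.
interface AudioMessage {
  role: "assistant";
  content: unknown;
}

interface CompletedToolCall {
  callId: string;
  name: string;
  output: unknown;
}

interface ToolResultPart {
  type: "tool-result";
  toolCallId: string;
  toolName: string;
  result: unknown;
}

interface ToolMessage {
  role: "tool";
  content: ToolResultPart[];
}

// No tools this turn: return the audio message untouched (same object as before).
// Otherwise append ONE tool message carrying every call from this turn.
function withToolCalls(
  audio: AudioMessage,
  calls: CompletedToolCall[],
): AudioMessage | [AudioMessage, ToolMessage] {
  if (calls.length === 0) return audio;
  const toolMessage: ToolMessage = {
    role: "tool",
    content: calls.map((c): ToolResultPart => ({
      type: "tool-result",
      toolCallId: c.callId,
      toolName: c.name,
      result: c.output,
    })),
  };
  return [audio, toolMessage];
}
```

Returning the original object unchanged when no tools ran is what keeps the no-tool path identical to the base behaviour.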

      Parameters

      Returns Promise<AgentReturnTypes>

    • Commit any pending audio, request a response, and return the first audio chunk the model produces.

      Loops over incoming events until a response.output_audio.delta event arrives, then returns the decoded PCM16 audio. Transcript events update lastUserTranscript / lastAgentTranscript along the way. An error event causes the call to throw.

      The GA event names are response.output_audio[_transcript].{delta,done}; the Beta response.audio[_transcript].* names are dead. We accept both so that back-porting to a Beta endpoint stays trivial; production traffic takes the GA path.
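A minimal sketch of the receive loop described above, assuming events arrive as an async iterable of plain objects and that audio deltas carry base64-encoded little-endian PCM16 in a delta field. firstAudioChunk, decodePcm16, ServerEvent, and the onOther callback are hypothetical names; Buffer is Node's built-in.

```typescript
// Hypothetical event shape: only the fields this sketch reads.
interface ServerEvent {
  type: string;
  delta?: string; // base64-encoded PCM16 for audio deltas
  error?: { message?: string };
}

const AUDIO_DELTA_TYPES = new Set([
  "response.output_audio.delta", // GA
  "response.audio.delta", // Beta, accepted for back-porting
]);

// Decode little-endian signed 16-bit PCM from base64. DataView avoids any
// alignment assumptions about the Buffer's underlying byte offset.
function decodePcm16(b64: string): Int16Array {
  const bytes = Buffer.from(b64, "base64");
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const samples = new Int16Array(Math.floor(bytes.byteLength / 2));
  for (let i = 0; i < samples.length; i++) {
    samples[i] = view.getInt16(i * 2, true);
  }
  return samples;
}

async function firstAudioChunk(
  events: AsyncIterable<ServerEvent>,
  onOther: (event: ServerEvent) => void,
): Promise<Int16Array> {
  for await (const event of events) {
    if (event.type === "error") {
      throw new Error(event.error?.message ?? "realtime error event");
    }
    if (AUDIO_DELTA_TYPES.has(event.type) && event.delta !== undefined) {
      return decodePcm16(event.delta);
    }
    // Everything else (e.g. transcript events) goes to the caller's bookkeeping.
    onOther(event);
  }
  throw new Error("event stream ended before any audio arrived");
}
```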

      Parameters

      • timeout: number

      Returns Promise<AudioChunk>

    • Inject scripted text into the realtime session as a user message.

      Used when this adapter is the user simulator (role=USER): scripted user("text") steps route through here instead of spawning TTS. The model synthesizes the text into spoken audio with natural prosody, which is then delivered via receiveAudio.

      Per §7.2, OpenAI Realtime cannot populate assistant audio messages retroactively; the downstream transcript reflects what the model actually emitted, not what was scripted.
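One plausible way to inject scripted text, sketched below, is to send the Realtime API's conversation.item.create client event with an input_text content part, followed by response.create so the model speaks. This is an assumption about the mechanism, not a description of the adapter's code; injectUserText is a hypothetical name, and send is a hypothetical stand-in for writing a JSON event to the session's WebSocket.

```typescript
// Client event shapes for conversation.item.create and response.create,
// reduced to the fields this sketch sets.
type ClientEvent =
  | {
      type: "conversation.item.create";
      item: {
        type: "message";
        role: "user";
        content: Array<{ type: "input_text"; text: string }>;
      };
    }
  | { type: "response.create" };

async function injectUserText(
  send: (event: ClientEvent) => Promise<void>,
  text: string,
): Promise<void> {
  // 1. Add the scripted text to the conversation as a user message (no TTS involved).
  await send({
    type: "conversation.item.create",
    item: { type: "message", role: "user", content: [{ type: "input_text", text }] },
  });
  // 2. Request a response; the resulting spoken audio is then read via receiveAudio.
  await send({ type: "response.create" });
}
```

Passing send in as a parameter keeps the sketch transport-agnostic and easy to exercise with a recording stub.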

      Parameters

      • text: string

      Returns Promise<void>