Creates a new ScenarioExecution instance.
The scenario configuration containing agents, settings, and metadata
The ordered sequence of script steps that define the test flow
Batch run ID for grouping scenario runs
Optional runId: string
Optional pre-assigned run ID. When provided, the execution uses this ID instead of generating a new one. This prevents duplicate entries when the platform pre-creates placeholder rows with a known ID.
Optional audio
Live local-speaker playback sink. Constructed at run start when
audioPlayback === true (per-run wins over global per ADR-002). Each
audio chunk is fanned out here via fireAudioChunk alongside the recording.
undefined when audioPlayback is disabled (the common case).
Readonly events$
An observable stream of events that occur during the scenario execution. Subscribe to this to monitor the progress of the scenario in real time.
Events include:
Optional Internal interrupt
Optional delay (ms) applied AFTER the agent starts speaking in
fireUserInterrupt. Set by prepareAndFireBargeIn from
InterruptionConfig.sampleDelay. Consumed (reset to undefined) on each
barge-in. See also interruptOverrides.bargeInDelayMs.
Set by prepareAndFireBargeIn; consumed by fireUserInterrupt.
Optional Internal interrupt
Single override bag for all test-injectable interrupt seams.
Consolidates the three formerly scattered @internal public fields into
one named gateway (issue #575). Tests assign this directly, with no
as unknown as cast needed:
exec.interruptOverrides = { rng: () => 0 };
Fields:
rng: RNG for interruption decisions (defaults to Math.random).
waitForSpeechMs: per-barge-in wait bound in fireUserInterrupt (overrides DEFAULT_WAIT_FOR_SPEECH_MS). Same value that the interrupt() step threads through waitForSpeechTimeout.
bargeInDelayMs: post-speech delay in fireUserInterrupt (set by prepareAndFireBargeIn from InterruptionConfig.sampleDelay).
Optional Internal interrupt
Optional per-barge-in wait override (ms) for fireUserInterrupt.
Threaded by the interrupt() step from waitForSpeechTimeout so the
step and the executor agree on ONE timeout. Consumed (reset to
undefined) on each barge-in. See also interruptOverrides.waitForSpeechMs.
Set by the interrupt() script step; consumed by fireUserInterrupt.
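The rng seam in interruptOverrides exists so tests can make interruption decisions deterministic. A minimal sketch of that idea follows. The InterruptOverrides shape here is reduced to the one field, and shouldBargeIn with its probability comparison is a hypothetical helper for illustration, not the library's implementation.

```typescript
// Reduced, hypothetical view of the override bag: only the rng field is modeled.
interface InterruptOverrides {
  rng?: () => number;
}

// Hypothetical decision helper: barge in when the drawn number falls below
// the configured probability. Falls back to Math.random when no rng is injected.
function shouldBargeIn(probability: number, overrides: InterruptOverrides = {}): boolean {
  const rng = overrides.rng ?? Math.random;
  return rng() < probability;
}

// An rng that always returns 0 forces the barge-in branch in a test,
// while one that returns a value near 1 suppresses it.
const forced = shouldBargeIn(0.3, { rng: () => 0 });
const suppressed = shouldBargeIn(0.3, { rng: () => 0.99 });
```

Injecting the generator rather than stubbing Math.random globally keeps the override local to one execution.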
Optional on
Per-chunk hook from ScenarioConfig.onAudioChunk.
Optional on
Per-event hook from ScenarioConfig.onVoiceEvent.
Byte-accurate audio cursor (seconds) — cumulative PCM byte-duration of all
segments laid so far. Drives segment start/end so voiceRecording.duration
tracks the full.wav byte-duration, not wall-clock send latency (M1).
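A byte-accurate cursor can be sketched as a running byte count divided by the PCM byte rate. The example below is only an illustration of that arithmetic: ByteCursor, PcmFormat, and the 16 kHz mono PCM16 format are assumptions, not names or values taken from the library.

```typescript
// Hypothetical PCM format descriptor; the values used below are assumptions.
interface PcmFormat {
  sampleRate: number;     // samples per second, per channel
  channels: number;       // 1 = mono
  bytesPerSample: number; // 2 for PCM16
}

interface SegmentSpan {
  start: number; // seconds
  end: number;   // seconds
}

class ByteCursor {
  private totalBytes = 0;
  private readonly bytesPerSecond: number;

  constructor(format: PcmFormat) {
    this.bytesPerSecond = format.sampleRate * format.channels * format.bytesPerSample;
  }

  // Lay a segment of `byteLength` PCM bytes and return its span in seconds.
  // The span depends only on bytes laid so far, never on wall-clock time.
  lay(byteLength: number): SegmentSpan {
    const start = this.totalBytes / this.bytesPerSecond;
    this.totalBytes += byteLength;
    return { start, end: this.totalBytes / this.bytesPerSecond };
  }

  // Cumulative duration of everything laid so far, in seconds.
  get seconds(): number {
    return this.totalBytes / this.bytesPerSecond;
  }
}

const cursor = new ByteCursor({ sampleRate: 16000, channels: 1, bytesPerSample: 2 });
const first = cursor.lay(32000);  // 32000 bytes at 32000 bytes/s
const second = cursor.lay(16000); // starts where the first segment ended
```

Because each segment starts where the previous one ended in byte terms, the final cursor value matches the duration of the concatenated audio regardless of how long each send took.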
Optional voice
Background ambience recorded by backgroundNoise(source, volume), read
by the user-simulator audio path when mixing turns (Gap #8).
Resolved per-run voice config (ADR-002 / Gap #7). Set at run start from
cfg.voice when voice adapters are present; the consumer agents read
the provider/knobs here instead of a module global.
Optional voice
Interruption config recorded by voiceProceed({ interruptions }). Read
at the top of each proceed() iteration to decide barge-ins (Gap #8).
Response-time measurements from agent_start_speaking events.
PCM16 segments + timeline accumulated during the run.
Monotonic clock anchor (performance.now() / 1000) for offsets.
Mirror of voiceRecording.timeline for direct subscribers.
Gets the complete conversation history as an array of messages.
Array of ModelMessage objects representing the full conversation
Gets the result of the scenario execution if it has been set.
The scenario result or undefined if not yet set
Gets the unique identifier for the conversation thread. This ID is used to maintain conversation context across multiple runs.
The thread identifier string
Adds execution time for a specific agent to the performance tracking.
This method is used internally to track how long each agent takes to respond, which is included in the final scenario result for performance analysis. The accumulated time for each agent is used to calculate total agent response times in the scenario result.
The index of the agent in the agents array
The execution time in milliseconds to add to the agent's total
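The accumulation described above can be pictured as a per-index running total. The sketch below assumes a Map keyed by agent index; AgentTimeTracker and its method names are hypothetical and only illustrate the bookkeeping, not the class's actual internals.

```typescript
// Hypothetical tracker: accumulates milliseconds per agent index.
class AgentTimeTracker {
  private readonly totals = new Map<number, number>();

  // Add `timeMs` to the running total for the agent at `agentIdx`.
  add(agentIdx: number, timeMs: number): void {
    this.totals.set(agentIdx, (this.totals.get(agentIdx) ?? 0) + timeMs);
  }

  // Total time recorded for one agent (0 if it never ran).
  totalFor(agentIdx: number): number {
    return this.totals.get(agentIdx) ?? 0;
  }

  // Sum across all agents, as a final result might report it.
  grandTotal(): number {
    let sum = 0;
    this.totals.forEach((ms) => {
      sum += ms;
    });
    return sum;
  }
}

const tracker = new AgentTimeTracker();
tracker.add(0, 120); // agent 0, first response
tracker.add(0, 80);  // agent 0, second response
tracker.add(1, 50);  // agent 1, single response
```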
Executes an agent turn in the conversation.
If content is provided, it's used directly as the agent's response. If not provided, the agent under test is called to generate a response based on the current conversation context and any pending messages.
This method is part of the ScenarioExecutionLike interface used by script steps.
Optional content: string | ModelMessage
Optional content for the agent's response. Can be a string or ModelMessage. If not provided, the agent under test will generate the response.
Fire an agent turn WITHOUT awaiting it (PRD §4.4 agent({ wait: false })).
The in-flight promise is recorded on pendingAgentTask so the next
user call can detect it and fire a mid-stream barge-in. Mirrors
Python's agent(wait=False) setting _pending_agent_task.
Errors from the background turn are swallowed here (they surface via the
recorded segments / the recovery turn) — exactly as the previous
void executor.agent().catch() call sites did.
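The fire-and-forget pattern described above can be sketched as follows. Only the pendingAgentTask name comes from the text. BackgroundTurnHolder, fireAgentTurn, and hasAgentInFlight are hypothetical, and clearing the field once the turn settles is an assumption made so the example can distinguish an in-flight turn from a finished one.

```typescript
// Hypothetical holder illustrating a non-awaited agent turn.
class BackgroundTurnHolder {
  pendingAgentTask: Promise<void> | undefined = undefined;

  // Start the turn without awaiting it and remember the in-flight promise.
  fireAgentTurn(turn: () => Promise<void>): void {
    const task: Promise<void> = turn()
      .catch(() => {
        // Errors from the background turn are swallowed here on purpose.
      })
      .then(() => {
        // Assumption: clear the field once this specific turn has settled.
        if (this.pendingAgentTask === task) {
          this.pendingAgentTask = undefined;
        }
      });
    this.pendingAgentTask = task;
  }

  // A later user step could check this to decide on a mid-stream barge-in.
  hasAgentInFlight(): boolean {
    return this.pendingAgentTask !== undefined;
  }
}

const holder = new BackgroundTurnHolder();
const inFlightBefore = holder.hasAgentInFlight();
// Even a failing turn does not surface an unhandled rejection.
holder.fireAgentTurn(() => Promise.reject(new Error("agent failed mid-turn")));
const inFlightAfter = holder.hasAgentInFlight();
```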
Optional content: string | ModelMessage
Executes the entire scenario from start to finish.
This method runs through all script steps sequentially until a final result (success, failure, or error) is determined. Each script step can trigger one or more agent interactions depending on the step type:
user() and agent() steps typically trigger one agent interaction each
proceed() steps can trigger multiple agent interactions across multiple turns
judge() steps trigger the judge agent to evaluate the conversation
succeed() and fail() steps immediately end the scenario
The execution will stop early if:
A promise that resolves with the final result of the scenario
Immediately ends the scenario with a failure verdict.
This method forces the scenario to end with failure, regardless of the current conversation state. It's useful for scenarios where you want to explicitly mark failure based on specific conditions or external factors.
This method is part of the ScenarioExecutionLike interface used by script steps.
Optional reasoning: string
Optional explanation for why the scenario is being marked as failed
A promise that resolves with the final failed scenario result
Invokes the judge agent to evaluate the current state of the conversation.
The judge agent analyzes the conversation history and determines whether the scenario criteria have been met. This can result in either:
This method is part of the ScenarioExecutionLike interface used by script steps.
Optional options: { context?: string; criteria?: string[] }
Optional options with inline criteria to evaluate as a checkpoint.
A promise that resolves with:
// Let judge evaluate with its configured criteria
const result = await execution.judge();
// Evaluate inline criteria as a checkpoint
const result = await execution.judge({ criteria: ["Agent responded helpfully"] });
// Provide additional context for tool-call-heavy conversations
const result = await execution.judge({
  criteria: ["Agent installed the dependency"],
  context: "The agent ran `npm install -g git-orchard` which exited 0.",
});
Adds a message to the conversation history.
This method is part of the ScenarioExecutionLike interface used by script steps. It automatically routes the message to the appropriate agent based on the message role:
The ModelMessage to add to the conversation
Lets the scenario proceed automatically for a specified number of turns.
This method is a script step that simulates natural conversation flow by allowing agents to interact automatically without explicit script steps. It can trigger multiple agent interactions across multiple turns, making it useful for testing scenarios where you want to see how agents behave in extended conversations.
Unlike other script steps that typically trigger one agent interaction each, this step can trigger many agent interactions depending on the number of turns and the agents' behavior.
The method will continue until:
Optional turns: number
The number of turns to proceed. If undefined, runs until a conclusion or max turns is reached
Optional onTurn: (state: ScenarioExecutionStateLike) => void | Promise<void>
Optional callback executed at the end of each turn. Receives the current execution state
Optional onStep: (state: ScenarioExecutionStateLike) => void | Promise<void>
Optional callback executed after each agent interaction. Receives the current execution state
A promise that resolves with:
// Proceed for 5 turns
const result = await execution.proceed(5);
// Proceed until conclusion with callbacks
const result = await execution.proceed(
  undefined,
  (state) => console.log(`Turn ${state.currentTurn} completed`),
  (state) => console.log(`Agent interaction completed, ${state.messages.length} messages`)
);
Executes a single agent interaction in the scenario.
This method is for manual step-by-step execution of the scenario, where each call
represents one agent taking their turn. This is different from script steps (like
user(), agent(), proceed(), etc.) which are functions in the scenario script.
Each call to this method will:
Note: This method is primarily for debugging or custom execution flows. Most users
will use execute() to run the entire scenario automatically.
After calling this method, check this.result to see if the scenario has concluded.
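A manual driving loop built on this contract might look like the sketch below. SteppableExecution is a reduced, hypothetical interface (only step() and result are modeled, and the result shape is simplified), and FakeExecution stands in for a real execution so the example is self-contained.

```typescript
// Reduced, hypothetical view of an execution that can be stepped manually.
interface SteppableExecution {
  step(): Promise<unknown>;
  readonly result: { success: boolean } | undefined;
}

// Call step() until a result is set or the step budget runs out.
// Returns the number of steps actually taken.
async function runStepByStep(execution: SteppableExecution, maxSteps: number): Promise<number> {
  let steps = 0;
  while (execution.result === undefined && steps < maxSteps) {
    await execution.step();
    steps += 1;
  }
  return steps;
}

// Stand-in execution that concludes successfully on its third step.
class FakeExecution implements SteppableExecution {
  result: { success: boolean } | undefined = undefined;
  private calls = 0;

  async step(): Promise<void> {
    this.calls += 1;
    if (this.calls === 3) {
      this.result = { success: true };
    }
  }
}

const fake = new FakeExecution();
const stepsTaken = runStepByStep(fake, 10);
```

The maxSteps bound is a design choice for the sketch: it keeps a custom loop from running forever if the scenario never reaches a conclusion.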
Immediately ends the scenario with a success verdict.
This method forces the scenario to end successfully, regardless of the current conversation state. It's useful for scenarios where you want to explicitly mark success based on specific conditions or external factors.
This method is part of the ScenarioExecutionLike interface used by script steps.
Optional reasoning: string
Optional explanation for why the scenario is being marked as successful
A promise that resolves with the final successful scenario result
Executes a user turn in the conversation.
If content is provided, it's used directly as the user's message. If not provided, the user simulator agent is called to generate an appropriate response based on the current conversation context.
This method is part of the ScenarioExecutionLike interface used by script steps.
Optional content: string | ModelMessage
Optional content for the user's message. Can be a string or ModelMessage. If not provided, the user simulator agent will generate the content.
Manages the execution of a single scenario test.
This class orchestrates the interaction between agents (user simulator, agent under test, and judge), executes the test script step-by-step, and manages the scenario's state throughout execution. It also emits events that can be subscribed to for real-time monitoring of the scenario's progress.
Execution Flow Overview
The execution follows a turn-based system where agents take turns responding. The key concepts are:
user(), agent(), proceed(), etc.
Message Broadcasting System
The class implements a sophisticated message broadcasting system that ensures all agents can "hear" each other's messages:
broadcastMessage()
pendingMessages) that stores messages from other agents
This creates a realistic conversation environment where agents can respond contextually to the full conversation history and any new messages from other agents.
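One way to picture the broadcasting idea is a per-agent queue that receives every message except the agent's own. The sketch below borrows the broadcastMessage and pendingMessages names from the text. The Broadcaster class, the Message shape, and the drain method are hypothetical simplifications, not the class's real internals.

```typescript
// Simplified, hypothetical message shape.
interface Message {
  role: "user" | "assistant";
  content: string;
}

class Broadcaster {
  // One queue per agent index, holding messages that agent has not seen yet.
  private readonly pendingMessages = new Map<number, Message[]>();

  constructor(agentCount: number) {
    for (let i = 0; i < agentCount; i++) {
      this.pendingMessages.set(i, []);
    }
  }

  // Queue the message for every agent except the one that produced it.
  broadcastMessage(message: Message, fromAgentIdx: number): void {
    this.pendingMessages.forEach((queue, idx) => {
      if (idx !== fromAgentIdx) {
        queue.push(message);
      }
    });
  }

  // Hand an agent its unseen messages and reset its queue.
  drain(agentIdx: number): Message[] {
    const queued = this.pendingMessages.get(agentIdx) ?? [];
    this.pendingMessages.set(agentIdx, []);
    return queued;
  }
}

const broadcaster = new Broadcaster(3); // e.g. user simulator, agent under test, judge
broadcaster.broadcastMessage({ role: "user", content: "Hello" }, 0);
const forSender = broadcaster.drain(0);
const forAgent = broadcaster.drain(1);
const forAgentAgain = broadcaster.drain(1);
```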
Example Message Flow
Each script step can trigger one or more agent interactions depending on the step type. For example, a proceed(5) step might trigger 10 agent interactions across 5 turns.
Note: This is an internal class. Most users will interact with the higher-level scenario.run() function instead of instantiating this class directly.
Example