Class ScenarioExecution

Manages the execution of a single scenario test.

This class orchestrates the interaction between agents (user simulator, agent under test, and judge), executes the test script step-by-step, and manages the scenario's state throughout execution. It also emits events that can be subscribed to for real-time monitoring of the scenario's progress.

Execution Flow Overview

The execution follows a turn-based system where agents take turns responding. The key concepts are:

- Script Steps: Functions in the scenario script like user(), agent(), proceed(), etc.
- Agent Interactions: Individual agent responses that occur when an agent takes its turn
- Turns: Groups of agent interactions that happen in sequence

Message Broadcasting System

The class implements a message broadcasting system so that every agent can "hear" the other agents' messages:

1. Message Creation: When an agent sends a message, it's added to the conversation history
2. Broadcasting: The message is immediately broadcast to all other agents via broadcastMessage()
3. Queue Management: Each agent has a pending message queue (pendingMessages) that stores messages from other agents
4. Agent Input: When an agent is called, it receives both the full conversation history and any new pending messages that have been broadcast to it
5. Queue Clearing: After an agent processes its pending messages, its queue is cleared

This creates a realistic conversation environment where agents can respond contextually to the full conversation history and any new messages from other agents.

Example Message Flow

Turn 1:
1. User Agent sends: "Hello"
   - Added to conversation history
   - Broadcast to Agent and Judge (pendingMessages[1] = ["Hello"], pendingMessages[2] = ["Hello"])

2. Agent is called:
   - Receives: full conversation + pendingMessages[1] = ["Hello"]
   - Sends: "Hi there! How can I help you?"
   - Added to conversation history
   - Broadcast to User and Judge (pendingMessages[0] = ["Hi there!..."], pendingMessages[2] = ["Hello", "Hi there!..."])
   - pendingMessages[1] is cleared

3. Judge is called:
   - Receives: full conversation + pendingMessages[2] = ["Hello", "Hi there!..."]
   - Evaluates and decides to continue
   - pendingMessages[2] is cleared
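
The flow above can be replayed with a small stand-alone model. This is only a sketch of the queueing idea, not the class's actual code: `BroadcastSketch`, `SketchMessage`, `send` and `callAgent` are hypothetical names, and agent indices 0, 1 and 2 are assumed to be the user simulator, the agent under test and the judge.

```typescript
// Hypothetical stand-in for the internal queues; not the library's implementation.
type SketchMessage = { role: "user" | "assistant"; content: string };

class BroadcastSketch {
  readonly history: SketchMessage[] = [];
  // One pending queue per agent index (assumed: 0 = user, 1 = agent, 2 = judge).
  readonly pendingMessages: SketchMessage[][] = [];

  constructor(agentCount: number) {
    for (let idx = 0; idx < agentCount; idx++) {
      this.pendingMessages.push([]);
    }
  }

  // Record the message, then queue it for every agent except the sender.
  send(fromIdx: number, message: SketchMessage): void {
    this.history.push(message);
    this.pendingMessages.forEach((queue, idx) => {
      if (idx !== fromIdx) queue.push(message);
    });
  }

  // Hand an agent the full history plus its unread messages, then clear its queue.
  callAgent(idx: number): { history: SketchMessage[]; pending: SketchMessage[] } {
    const pending = this.pendingMessages[idx];
    this.pendingMessages[idx] = [];
    return { history: this.history.slice(), pending };
  }
}

// Replay Turn 1 from the flow above.
const flow = new BroadcastSketch(3);
flow.send(0, { role: "user", content: "Hello" });
const agentInput = flow.callAgent(1); // pending: ["Hello"]
flow.send(1, { role: "assistant", content: "Hi there! How can I help you?" });
const judgeInput = flow.callAgent(2); // pending: ["Hello", "Hi there!..."]
console.log(agentInput.pending.length, judgeInput.pending.length);
```

After the replay, the user's queue still holds the agent's reply, because the user simulator has not been called again yet.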

Each script step can trigger one or more agent interactions depending on the step type. For example, a proceed(5) step might trigger 10 agent interactions across 5 turns.

Note: This is an internal class. Most users will interact with the higher-level scenario.run() function instead of instantiating this class directly.

Example

import scenario from "@langwatch/scenario";

// Most users call `scenario.run`, which creates and drives a ScenarioExecution internally.
const result = await scenario.run({
  name: "My First Scenario",
  description: "A simple test of the agent's greeting.",
  agents: [
    // The agent under test is omitted here; a real scenario lists it alongside these two.
    scenario.userSimulatorAgent(),
    scenario.judgeAgent({
      criteria: ["Agent should respond with a greeting"],
    }),
  ],
  script: [
    scenario.user("Hello"),     // Script step 1: triggers 1 agent interaction
    scenario.agent(),           // Script step 2: triggers 1 agent interaction
    scenario.proceed(3),        // Script step 3: triggers multiple agent interactions
    scenario.judge(),           // Script step 4: triggers 1 agent interaction
  ]
});

console.log("Scenario result:", result.success);

Implements

ScenarioExecutionLike

Constructors

constructor

new ScenarioExecution(
    config: ScenarioConfig,
    script: ScriptStep[],
    batchRunId: string,
    runId?: string,
): ScenarioExecution
Creates a new ScenarioExecution instance.
Parameters
- config: ScenarioConfig
  The scenario configuration containing agents, settings, and metadata
- script: ScriptStep[]
  The ordered sequence of script steps that define the test flow
- batchRunId: string
  Batch run ID for grouping scenario runs
- Optional runId: string
  Optional pre-assigned run ID. When provided, the execution uses this ID instead of generating a new one. This prevents duplicate entries when the platform pre-creates placeholder rows with a known ID.
Returns ScenarioExecution
- Defined in work/scenario/scenario/javascript/src/execution/scenario-execution.ts:326

Properties

`Optional` audioPlaybackSink

audioPlaybackSink?: AudioPlaybackSink | null

Live local-speaker playback sink. Constructed at run start when audioPlayback === true (per-run wins over global per ADR-002). Each audio chunk is fanned out here via fireAudioChunk alongside the recording. undefined when audioPlayback is disabled (the common case).

`Readonly` events$

events$: Observable<
    | {
        batchRunId: string;
        metadata: { description?: string; name?: string } & {
            [k: string]: unknown;
        };
        rawEvent?: any;
        scenarioId: string;
        scenarioRunId: string;
        scenarioSetId: string;
        timestamp: number;
        type: RUN_STARTED;
    }
    | {
        batchRunId: string;
        rawEvent?: any;
        results?: | {
            error?: string;
            metCriteria: string[];
            reasoning?: string;
            unmetCriteria: string[];
            verdict: Verdict;
        }
        | null;
        scenarioId: string;
        scenarioRunId: string;
        scenarioSetId: string;
        status: ScenarioRunStatus;
        timestamp: number;
        type: RUN_FINISHED;
    }
    | {
        batchRunId: string;
        messages: (
            | { content: string; id: string; name?: string; role: "developer" }
            | { content: string; id: string; name?: string; role: "system" }
            | {
                content?: string;
                id: string;
                name?: string;
                role: "assistant";
                toolCalls?: {
                    function: { arguments: string; name: string };
                    id: string;
                    type: "function";
                }[];
            }
            | { content: string; id: string; name?: string; role: "user" }
            | { content: string; id: string; role: "tool"; toolCallId: string }
        )[];
        rawEvent?: any;
        scenarioId: string;
        scenarioRunId: string;
        scenarioSetId: string;
        timestamp: number;
        type: MESSAGE_SNAPSHOT;
    },
> = ...

An observable stream of events that occur during the scenario execution. Subscribe to this to monitor the progress of the scenario in real-time.

Events include:

- RUN_STARTED: When scenario execution begins
- MESSAGE_SNAPSHOT: After each message is added to the conversation
- RUN_FINISHED: When scenario execution completes (success/failure/error)
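
A handler for these three event kinds might look like the sketch below. The event shapes are trimmed-down, hypothetical copies of the union above, and the string literals used for `type` are an assumption: the real `type` field is an enum member whose underlying value is not shown in this reference.

```typescript
// Hypothetical, reduced event shapes; the real union carries more fields.
type SketchEvent =
  | { type: "RUN_STARTED"; scenarioRunId: string; timestamp: number }
  | {
      type: "MESSAGE_SNAPSHOT";
      scenarioRunId: string;
      timestamp: number;
      messages: { role: string; content?: string }[];
    }
  | { type: "RUN_FINISHED"; scenarioRunId: string; timestamp: number; status: string };

// Narrow on the discriminant so each branch sees only its own fields.
function describeEvent(event: SketchEvent): string {
  switch (event.type) {
    case "RUN_STARTED":
      return `run ${event.scenarioRunId} started`;
    case "MESSAGE_SNAPSHOT":
      return `run ${event.scenarioRunId} has ${event.messages.length} message(s)`;
    case "RUN_FINISHED":
      return `run ${event.scenarioRunId} finished with status ${event.status}`;
  }
}

// Sample events in the order a run would emit them.
const sampleEvents: SketchEvent[] = [
  { type: "RUN_STARTED", scenarioRunId: "run-1", timestamp: 1 },
  {
    type: "MESSAGE_SNAPSHOT",
    scenarioRunId: "run-1",
    timestamp: 2,
    messages: [{ role: "user", content: "Hello" }],
  },
  { type: "RUN_FINISHED", scenarioRunId: "run-1", timestamp: 3, status: "SUCCESS" },
];
const eventLog = sampleEvents.map(describeEvent);
console.log(eventLog.join("\n"));
```

Since events$ is typed as an Observable, a handler of this kind would presumably be attached with `execution.events$.subscribe(...)`, adapted to the real event types.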

`Optional` `Internal` interruptBargeInDelayMs

interruptBargeInDelayMs?: number

Optional delay (ms) applied AFTER the agent starts speaking in fireUserInterrupt. Set by prepareAndFireBargeIn from InterruptionConfig.sampleDelay. Consumed (reset to undefined) on each barge-in. See also interruptOverrides.bargeInDelayMs.

Set by prepareAndFireBargeIn; consumed by fireUserInterrupt.

`Optional` `Internal` interruptOverrides

interruptOverrides?: {
    bargeInDelayMs?: number;
    rng?: () => number;
    waitForSpeechMs?: number;
}

Single override bag for all test-injectable interrupt seams.

Consolidates the three formerly scattered @internal public fields into one named gateway (issue #575). Tests assign this directly, with no `as unknown as` cast needed:

exec.interruptOverrides = { rng: () => 0 };

Fields:

- rng: RNG for interruption decisions (defaults to Math.random).
- waitForSpeechMs: per-barge-in wait bound in fireUserInterrupt (overrides DEFAULT_WAIT_FOR_SPEECH_MS). This is the same value that the interrupt() step threads through waitForSpeechTimeout.
- bargeInDelayMs: post-speech delay in fireUserInterrupt (set by prepareAndFireBargeIn from InterruptionConfig.sampleDelay).
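
To illustrate why an injectable rng makes interruption tests deterministic, here is a minimal sketch. The `shouldBargeIn` helper and its `probability` parameter are hypothetical; how the library actually turns a random draw into a barge-in decision is not described in this reference.

```typescript
// Hypothetical override bag with only the rng seam; the real one has more fields.
type InterruptOverridesSketch = { rng?: () => number };

// Assumed decision rule for illustration: barge in when the draw falls below `probability`.
function shouldBargeIn(
  probability: number,
  overrides: InterruptOverridesSketch = {},
): boolean {
  // Fall back to Math.random, the default the field list names.
  const rng = overrides.rng ?? Math.random;
  return rng() < probability;
}

// With a fixed rng the outcome no longer depends on chance.
const firesAlways = shouldBargeIn(0.25, { rng: () => 0 });
const firesNever = shouldBargeIn(0.25, { rng: () => 0.99 });
console.log(firesAlways, firesNever);
```

A test that sets `rng: () => 0` therefore takes the barge-in branch on every draw under this assumed rule.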

`Optional` `Internal` interruptWaitForSpeechMs

interruptWaitForSpeechMs?: number

Optional per-barge-in wait override (ms) for fireUserInterrupt. Threaded by the interrupt() step from waitForSpeechTimeout so the step and the executor agree on ONE timeout. Consumed (reset to undefined) on each barge-in. See also interruptOverrides.waitForSpeechMs.

Set by the interrupt() script step; consumed by fireUserInterrupt.

`Optional` onAudioChunk

onAudioChunk?: (chunk: AudioChunk) => void

Per-chunk hook from ScenarioConfig.onAudioChunk.

`Optional` onVoiceEvent

onVoiceEvent?: (event: VoiceEvent) => void

Per-event hook from ScenarioConfig.onVoiceEvent.

voiceAudioCursor

voiceAudioCursor: number | null = null

Byte-accurate audio cursor (seconds) — cumulative PCM byte-duration of all segments laid so far. Drives segment start/end so voiceRecording.duration tracks the full.wav byte-duration, not wall-clock send latency (M1).
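
The idea of a byte-derived cursor can be sketched as follows. The arithmetic is the standard PCM16 relation (2 bytes per sample per channel); the 24 kHz mono format and the names `pcmDurationSeconds`, `laySegments` and `SegmentSketch` are assumptions made for this example only.

```typescript
// 16-bit PCM stores each sample in 2 bytes.
const BYTES_PER_SAMPLE = 2;

// Duration of a raw PCM16 buffer, derived purely from its size.
function pcmDurationSeconds(byteLength: number, sampleRate: number, channels: number): number {
  return byteLength / (sampleRate * channels * BYTES_PER_SAMPLE);
}

type SegmentSketch = { start: number; end: number };

// Lay segments back to back on a cursor measured in seconds of audio,
// independent of when each chunk happened to be sent.
function laySegments(
  byteLengths: number[],
  sampleRate: number,
  channels = 1,
): { segments: SegmentSketch[]; cursor: number } {
  let cursor = 0;
  const segments = byteLengths.map((bytes) => {
    const start = cursor;
    cursor += pcmDurationSeconds(bytes, sampleRate, channels);
    return { start, end: cursor };
  });
  return { segments, cursor };
}

// Assumed format: 24 kHz mono. 48000 bytes is 1 s of audio, 24000 bytes is 0.5 s.
const laid = laySegments([48000, 24000], 24000);
console.log(laid.segments, laid.cursor);
```

Because each segment starts where the previous one ended, the final cursor equals the total audio duration regardless of network or processing delays.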

`Optional` voiceBackgroundNoise

voiceBackgroundNoise?: { source: string; volume: number }

Background ambience recorded by backgroundNoise(source, volume) — read by the user-simulator audio path when mixing turns (Gap #8).

voiceConfig

voiceConfig: ResolvedVoiceConfig | null = null

Resolved per-run voice config (ADR-002 / Gap #7). Set at run start from cfg.voice when voice adapters are present; the consumer agents read the provider/knobs here instead of a module global.

`Optional` voiceInterruptions

voiceInterruptions?: InterruptionConfig

Interruption config recorded by voiceProceed({ interruptions }). Read at the top of each proceed() iteration to decide barge-ins (Gap #8).

voiceLatency

voiceLatency: LatencyMetrics | null = null

Response-time measurements from agent_start_speaking events.

voiceRecording

voiceRecording: VoiceRecording | null = null

PCM16 segments + timeline accumulated during the run.

voiceRecordingStartedAt

voiceRecordingStartedAt: number | null = null

Monotonic clock anchor (performance.now() / 1000) for offsets.

voiceTimeline

voiceTimeline: VoiceEvent[] | null = null

Mirror of voiceRecording.timeline for direct subscribers.

Accessors

messages

get messages(): ModelMessage[]
Gets the complete conversation history as an array of messages.

Returns ModelMessage[]
Array of ModelMessage objects representing the full conversation
Implementation of ScenarioExecutionLike.messages
- Defined in work/scenario/scenario/javascript/src/execution/scenario-execution.ts:385

result

get result(): ScenarioResult | undefined
Gets the result of the scenario execution if it has been set.

Returns ScenarioResult | undefined
The scenario result or undefined if not yet set
- Defined in work/scenario/scenario/javascript/src/execution/scenario-execution.ts:404

threadId

get threadId(): string
Gets the unique identifier for the conversation thread. This ID is used to maintain conversation context across multiple runs.

Returns string
The thread identifier string
Implementation of ScenarioExecutionLike.threadId
- Defined in work/scenario/scenario/javascript/src/execution/scenario-execution.ts:395

Methods

addAgentTime

addAgentTime(agentIdx: number, time: number): void
Adds execution time for a specific agent to the performance tracking.

This method is used internally to track how long each agent takes to respond, which is included in the final scenario result for performance analysis. The accumulated time for each agent is used to calculate total agent response times in the scenario result.
Parameters
- agentIdx: number
  The index of the agent in the agents array
- time: number
  The execution time in milliseconds to add to the agent's total
Returns void
Example
```
// This is typically called internally by the execution engine
execution.addAgentTime(0, 1500); // Agent at index 0 took 1.5 seconds
```
- Defined in work/scenario/scenario/javascript/src/execution/scenario-execution.ts:2003
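
The accumulation described above can be modelled in a few lines. `AgentTimeSketch`, `totalFor` and `totalAll` are hypothetical; only the "add to the agent's running total" behaviour comes from the description, and how the real class stores or reports the totals is not shown here.

```typescript
// Hypothetical tracker keyed by agent index.
class AgentTimeSketch {
  private readonly totals = new Map<number, number>();

  // Add `time` (milliseconds) to the running total for one agent.
  addAgentTime(agentIdx: number, time: number): void {
    this.totals.set(agentIdx, (this.totals.get(agentIdx) ?? 0) + time);
  }

  // Accumulated time for one agent, or 0 if it never ran.
  totalFor(agentIdx: number): number {
    return this.totals.get(agentIdx) ?? 0;
  }

  // Sum over all agents.
  totalAll(): number {
    let sum = 0;
    this.totals.forEach((ms) => {
      sum += ms;
    });
    return sum;
  }
}

const timing = new AgentTimeSketch();
timing.addAgentTime(0, 1500);
timing.addAgentTime(0, 500);
timing.addAgentTime(1, 250);
console.log(timing.totalFor(0), timing.totalAll());
```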

agent

agent(content?: string | ModelMessage): Promise<void>
Executes an agent turn in the conversation.

If content is provided, it's used directly as the agent's response. If not provided, the agent under test is called to generate a response based on the current conversation context and any pending messages.

This method is part of the ScenarioExecutionLike interface used by script steps.
Parameters
- Optional content: string | ModelMessage
  Optional content for the agent's response. Can be a string or ModelMessage. If not provided, the agent under test will generate the response.
Returns Promise<void>
Example
```
// Let agent generate response
await execution.agent();

// Use provided content
await execution.agent("The weather is sunny today!");

// Use a ModelMessage object
await execution.agent({
  role: "assistant",
  content: "I'm here to help you with weather information."
});
```
Implementation of ScenarioExecutionLike.agent
- Defined in work/scenario/scenario/javascript/src/execution/scenario-execution.ts:1220

agentNonBlocking

agentNonBlocking(content?: string | ModelMessage): void
Fire an agent turn WITHOUT awaiting it (PRD §4.4 agent({ wait: false })). The in-flight promise is recorded on pendingAgentTask so the next user call can detect it and fire a mid-stream barge-in. Mirrors Python's agent(wait=False) setting _pending_agent_task.

Errors from the background turn are swallowed here (they surface via the recorded segments / the recovery turn) — exactly as the previous void executor.agent().catch() call sites did.
Parameters
- Optional content: string | ModelMessage
Returns void
Implementation of ScenarioExecutionLike.agentNonBlocking
- Defined in work/scenario/scenario/javascript/src/execution/scenario-execution.ts:1234
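
The fire-and-forget pattern described above can be sketched like this. `NonBlockingSketch`, `hasTurnInFlight` and the `runTurn` callback are hypothetical; the field name follows the pendingAgentTask mentioned in the description, and the sketch leaves out whatever the real class does when the task settles.

```typescript
// Hypothetical holder for the in-flight agent turn.
class NonBlockingSketch {
  pendingAgentTask: Promise<void> | undefined;

  // Start the turn without awaiting it, keep a handle on it,
  // and swallow its errors so the rejection never goes unhandled.
  agentNonBlocking(runTurn: () => Promise<void>): void {
    this.pendingAgentTask = runTurn().catch((): void => {
      // Intentionally ignored in this sketch.
    });
  }

  // What a following user turn could check before deciding to barge in.
  hasTurnInFlight(): boolean {
    return this.pendingAgentTask !== undefined;
  }
}

const nonBlocking = new NonBlockingSketch();
const idleBefore = nonBlocking.hasTurnInFlight();
// Even a failing turn does not throw at the call site.
nonBlocking.agentNonBlocking(() => Promise.reject(new Error("agent crashed")));
const inFlightAfter = nonBlocking.hasTurnInFlight();
console.log(idleBefore, inFlightAfter);
```

The handle is recorded synchronously, so the caller can observe the in-flight turn immediately after the call returns.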

execute

execute(): Promise<ScenarioResult>
Executes the entire scenario from start to finish.

This method runs through all script steps sequentially until a final result (success, failure, or error) is determined. Each script step can trigger one or more agent interactions depending on the step type:
- user() and agent() steps typically trigger one agent interaction each
- proceed() steps can trigger multiple agent interactions across multiple turns
- judge() steps trigger the judge agent to evaluate the conversation
- succeed() and fail() steps immediately end the scenario
The execution will stop early if:
- A script step returns a ScenarioResult
- The maximum number of turns is reached
- An error occurs during execution
Returns Promise<ScenarioResult>
A promise that resolves with the final result of the scenario
Throws
Error if an unhandled exception occurs during execution
Example
```
const execution = new ScenarioExecution(config, script, batchRunId); // batchRunId is required
const result = await execution.execute();
console.log(`Scenario ${result.success ? 'passed' : 'failed'}`);
```
- Defined in work/scenario/scenario/javascript/src/execution/scenario-execution.ts:533

fail

fail(reasoning?: string): Promise<ScenarioResult>
Immediately ends the scenario with a failure verdict.

This method forces the scenario to end with failure, regardless of the current conversation state. It's useful for scenarios where you want to explicitly mark failure based on specific conditions or external factors.

This method is part of the ScenarioExecutionLike interface used by script steps.
Parameters
- Optional reasoning: string
  Optional explanation for why the scenario is being marked as failed
Returns Promise<ScenarioResult>
A promise that resolves with the final failed scenario result
Example
```
// Mark failure with default reasoning
const result = await execution.fail();

// Mark failure with custom reasoning
const resultWithReason = await execution.fail(
  "Agent failed to provide accurate weather information"
);
```
Implementation of ScenarioExecutionLike.fail
- Defined in work/scenario/scenario/javascript/src/execution/scenario-execution.ts:1977

judge

judge(
options?: { context?: string; criteria?: string[] },
): Promise<ScenarioResult | null>
Invokes the judge agent to evaluate the current state of the conversation.

The judge agent analyzes the conversation history and determines whether the scenario criteria have been met. This can result in either:
- A final scenario result (success/failure) if the judge makes a decision
- null if the judge needs more conversation before it can decide
This method is part of the ScenarioExecutionLike interface used by script steps.
Parameters
- Optional options: { context?: string; criteria?: string[] }
  Optional options with inline criteria to evaluate as a checkpoint.
Returns Promise<ScenarioResult | null>
A promise that resolves with:
- ScenarioResult if the judge makes a final decision, or
- null if the conversation should continue
Example
```
// Let judge evaluate with its configured criteria
const result = await execution.judge();

// Evaluate inline criteria as a checkpoint
const checkpointResult = await execution.judge({ criteria: ["Agent responded helpfully"] });

// Provide additional context for tool-call-heavy conversations
const contextResult = await execution.judge({
  criteria: ["Agent installed the dependency"],
  context: "The agent ran `npm install -g git-orchard` which exited 0.",
});
```
Implementation of ScenarioExecutionLike.judge
- Defined in work/scenario/scenario/javascript/src/execution/scenario-execution.ts:1284

message

message(message: ModelMessage): Promise<void>
Adds a message to the conversation history.

This method is part of the ScenarioExecutionLike interface used by script steps. It automatically routes the message to the appropriate agent based on the message role:
- "user" messages are routed to USER role agents
- "assistant" messages are routed to AGENT role agents
- Other message types are added directly to the conversation
Parameters
- message: ModelMessage
  The ModelMessage to add to the conversation
Returns Promise<void>
Example
```
await execution.message({
  role: "user",
  content: "Hello, how are you?"
});
```
Implementation of ScenarioExecutionLike.message
- Defined in work/scenario/scenario/javascript/src/execution/scenario-execution.ts:1069
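
The routing rule above amounts to a small lookup on the message role. The sketch below is illustrative only: `routeForRole` and the `"DIRECT"` label are hypothetical names, while `"USER"` and `"AGENT"` mirror the role names used in the list.

```typescript
// Hypothetical labels: two agent roles plus "DIRECT" for messages
// that are appended to the conversation without calling an agent.
type RouteSketch = "USER" | "AGENT" | "DIRECT";

function routeForRole(role: string): RouteSketch {
  switch (role) {
    case "user":
      return "USER";
    case "assistant":
      return "AGENT";
    default:
      // e.g. "system" or "tool" messages
      return "DIRECT";
  }
}

const routes = ["user", "assistant", "tool", "system"].map(routeForRole);
console.log(routes.join(", "));
```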

proceed

proceed(
    turns?: number,
    onTurn?: (state: ScenarioExecutionStateLike) => void | Promise<void>,
    onStep?: (state: ScenarioExecutionStateLike) => void | Promise<void>,
): Promise<ScenarioResult | null>
Lets the scenario proceed automatically for a specified number of turns.

This method is a script step that simulates natural conversation flow by allowing agents to interact automatically without explicit script steps. It can trigger multiple agent interactions across multiple turns, making it useful for testing scenarios where you want to see how agents behave in extended conversations.

Unlike other script steps that typically trigger one agent interaction each, this step can trigger many agent interactions depending on the number of turns and the agents' behavior.

The method will continue until:
- The specified number of turns is reached
- A final scenario result is determined
- The maximum turns limit is reached
Parameters
- Optional turns: number
  The number of turns to proceed. If undefined, runs until a conclusion or max turns is reached
- Optional onTurn: (state: ScenarioExecutionStateLike) => void | Promise<void>
  Optional callback executed at the end of each turn. Receives the current execution state
- Optional onStep: (state: ScenarioExecutionStateLike) => void | Promise<void>
  Optional callback executed after each agent interaction. Receives the current execution state
Returns Promise<ScenarioResult | null>
A promise that resolves with:
- ScenarioResult if a conclusion is reached during the proceeding, or
- null if the specified turns complete without conclusion
Example
```
// Proceed for 5 turns
const result = await execution.proceed(5);

// Proceed until conclusion with callbacks
const finalResult = await execution.proceed(
  undefined,
  (state) => console.log(`Turn ${state.currentTurn} completed`),
  (state) => console.log(`Agent interaction completed, ${state.messages.length} messages`)
);
```
Implementation of ScenarioExecutionLike.proceed
- Defined in work/scenario/scenario/javascript/src/execution/scenario-execution.ts:1332
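
The three stop conditions can be seen in a simplified, synchronous model of the loop. Everything here is hypothetical (`proceedSketch`, `ProceedOutcome`, the `runTurn` callback); the real method is asynchronous, and what it returns when the maximum-turns limit is hit is not stated in this reference, so the sketch simply stops and returns null in that case.

```typescript
type ProceedOutcome = { success: boolean } | null;

// `runTurn` stands in for one full turn and returns a result once the scenario concludes.
function proceedSketch(
  turns: number | undefined,
  maxTurns: number,
  runTurn: (turn: number) => ProceedOutcome,
): { outcome: ProceedOutcome; turnsRun: number } {
  let turnsRun = 0;
  // Stop at the global limit, or once the requested number of turns has run.
  while (turnsRun < maxTurns && (turns === undefined || turnsRun < turns)) {
    const outcome = runTurn(turnsRun);
    turnsRun++;
    // Stop early as soon as a final result is determined.
    if (outcome !== null) return { outcome, turnsRun };
  }
  return { outcome: null, turnsRun };
}

// Requested turns reached without a conclusion.
const fiveTurns = proceedSketch(5, 10, () => null);
// Open-ended run that concludes on the third turn.
const concluded = proceedSketch(undefined, 10, (turn) => (turn === 2 ? { success: true } : null));
// Open-ended run that hits the assumed limit of 4 turns.
const hitLimit = proceedSketch(undefined, 4, () => null);
console.log(fiveTurns.turnsRun, concluded.turnsRun, hitLimit.turnsRun);
```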

step

step(): Promise<void>
Executes a single agent interaction in the scenario.

This method is for manual step-by-step execution of the scenario, where each call represents one agent taking their turn. This is different from script steps (like user(), agent(), proceed(), etc.) which are functions in the scenario script.

Each call to this method will:
- Progress to the next turn if needed
- Find the next agent that should act
- Execute that agent's response
- Set the result if the scenario concludes
Note: This method is primarily for debugging or custom execution flows. Most users will use execute() to run the entire scenario automatically.

After calling this method, check this.result to see if the scenario has concluded.
Returns Promise<void>
Example
```
const execution = new ScenarioExecution(config, script, batchRunId); // batchRunId is required

// Execute one agent interaction at a time
await execution.step();
if (execution.result) {
  console.log('Scenario finished:', execution.result.success);
}
```
- Defined in work/scenario/scenario/javascript/src/execution/scenario-execution.ts:808

succeed

succeed(reasoning?: string): Promise<ScenarioResult>
Immediately ends the scenario with a success verdict.

This method forces the scenario to end successfully, regardless of the current conversation state. It's useful for scenarios where you want to explicitly mark success based on specific conditions or external factors.

This method is part of the ScenarioExecutionLike interface used by script steps.
Parameters
- Optional reasoning: string
  Optional explanation for why the scenario is being marked as successful
Returns Promise<ScenarioResult>
A promise that resolves with the final successful scenario result
Example
```
// Mark success with default reasoning
const result = await execution.succeed();

// Mark success with custom reasoning
const resultWithReason = await execution.succeed(
  "User successfully completed the onboarding flow"
);
```
Implementation of ScenarioExecutionLike.succeed
- Defined in work/scenario/scenario/javascript/src/execution/scenario-execution.ts:1944

user

user(content?: string | ModelMessage): Promise<void>
Executes a user turn in the conversation.

If content is provided, it's used directly as the user's message. If not provided, the user simulator agent is called to generate an appropriate response based on the current conversation context.

This method is part of the ScenarioExecutionLike interface used by script steps.
Parameters
- Optional content: string | ModelMessage
  Optional content for the user's message. Can be a string or ModelMessage. If not provided, the user simulator agent will generate the content.
Returns Promise<void>
Example
```
// Use provided content
await execution.user("What's the weather like?");

// Let user simulator generate content
await execution.user();

// Use a ModelMessage object
await execution.user({
  role: "user",
  content: "Tell me a joke"
});
```
Implementation of ScenarioExecutionLike.user
- Defined in work/scenario/scenario/javascript/src/execution/scenario-execution.ts:1107
