Declarative interruption step. Equivalent to:

Three trigger modes (PRD §4.4, layered):

- `after: <seconds>`: time-based. Let the agent speak for N seconds, then interrupt. This is exactly the `interrupt(after=2.0, content)` form.
- `afterWords: N`: wait until the agent's streaming transcript has emitted N words. Requires `capabilities.streamingTranscripts`; raises `UnsupportedCapabilityError` otherwise.
- Neither: a bounded wait for the agent to start speaking, then interrupt at the first chunk.

The wait matters most on transports without a client-side cancel signal, because the interrupt must overlap real agent audio for the server's VAD to fire. Without the wait, the user's TTS would finish generating in ~600 ms while the model still hasn't started speaking. The "interrupt" then lands during silence, and there is no agent speech to barge in on.

`content` routing:

- A string that does NOT end with an audio extension → user text (TTS).
- A string ending with `.wav`/`.mp3`/`.ogg`/`.flac` → audio file.
- A `Uint8Array` → raw audio bytes (routed through `audio`).
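The trigger selection can be sketched as a small resolver. This is a minimal illustration, not the library's implementation. `InterruptStep`, `Capabilities`, `Trigger`, `resolveTrigger`, and the 5-second `DEFAULT_START_WAIT_MS` are hypothetical names and values. Checking `after` before `afterWords` is one assumed reading of "layered", because the text does not spell out precedence.

```typescript
// Hypothetical shapes: illustrative names, not the library's actual API.
interface Capabilities {
  streamingTranscripts: boolean;
}

interface InterruptStep {
  after?: number; // seconds of agent speech before interrupting
  afterWords?: number; // transcript words before interrupting
  content: string | Uint8Array;
}

class UnsupportedCapabilityError extends Error {
  constructor(capability: string) {
    super(`transport does not support capability: ${capability}`);
    this.name = "UnsupportedCapabilityError";
  }
}

type Trigger =
  | { kind: "time"; seconds: number }
  | { kind: "words"; count: number }
  | { kind: "firstChunk"; maxWaitMs: number };

// Assumed bound for the "wait for the agent to start speaking" mode.
const DEFAULT_START_WAIT_MS = 5000;

function resolveTrigger(step: InterruptStep, caps: Capabilities): Trigger {
  // Assumed precedence: `after` wins if both options are set.
  if (step.after !== undefined) {
    return { kind: "time", seconds: step.after };
  }
  if (step.afterWords !== undefined) {
    // Word counting needs a streaming transcript from the transport.
    if (!caps.streamingTranscripts) {
      throw new UnsupportedCapabilityError("streamingTranscripts");
    }
    return { kind: "words", count: step.afterWords };
  }
  // Neither option: bounded wait, then interrupt at the first agent chunk.
  return { kind: "firstChunk", maxWaitMs: DEFAULT_START_WAIT_MS };
}
```

In this sketch, resolving the trigger up front makes a misconfigured `afterWords` step fail before any audio is sent, rather than partway through the conversation.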
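For the `afterWords: N` mode, the word count has to be tracked across streaming transcript chunks. The following is a minimal sketch that assumes transcript deltas arrive as plain strings. `WordCountTrigger` is a hypothetical helper. Chunks are concatenated before counting, so a word split across two deltas is counted only once.

```typescript
// Hypothetical helper, not part of the library: fires once the accumulated
// streaming transcript contains at least `target` words.
class WordCountTrigger {
  private transcript = "";
  private readonly target: number;

  constructor(target: number) {
    this.target = target;
  }

  // Feed one transcript delta; returns true once the threshold is reached.
  push(chunk: string): boolean {
    // Concatenate first so "hel" + "lo" counts as one word, not two.
    this.transcript += chunk;
    return this.wordCount() >= this.target;
  }

  // Whitespace-separated tokens; a trailing partial word counts as a word.
  wordCount(): number {
    const trimmed = this.transcript.trim();
    return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
  }
}

// Usage: interrupt once the agent has said three words.
const trigger = new WordCountTrigger(3);
trigger.push("Sure, I "); // false: 2 words so far
trigger.push("can hel"); // true: "Sure, I can hel" has 4 tokens
```

Because this sketch counts a trailing partial word, it can fire one fragment early. A stricter variant could count only whitespace-terminated tokens.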
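The `content` routing rules can be sketched as a single dispatch function. `RoutedContent` and `routeContent` are hypothetical names. Only the extension list and the three outcomes come from the routing rules.

```typescript
// Hypothetical routing result; the real step may use different names.
type RoutedContent =
  | { kind: "text"; text: string } // synthesized via user TTS
  | { kind: "audioFile"; path: string } // audio loaded from a file
  | { kind: "audioBytes"; bytes: Uint8Array }; // raw audio bytes

const AUDIO_EXTENSIONS = [".wav", ".mp3", ".ogg", ".flac"];

function routeContent(content: string | Uint8Array): RoutedContent {
  // Raw bytes go straight to the audio path.
  if (content instanceof Uint8Array) {
    return { kind: "audioBytes", bytes: content };
  }
  // A string ending in a known audio extension is treated as a file path.
  if (AUDIO_EXTENSIONS.some((ext) => content.endsWith(ext))) {
    return { kind: "audioFile", path: content };
  }
  // Anything else is user text to be spoken via TTS.
  return { kind: "text", text: content };
}
```

The match here is case-sensitive, so `clip.WAV` would fall through to text. The rules do not say whether the real step normalizes case. In Node.js, `Buffer` is a `Uint8Array` subclass, so it also takes the raw-bytes path.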