OptionalneedsOptionalphaseDescribe what getPhaseName actually returns.
"staged" — phases carry semantic meaning (e.g. Crescendo's
warmup / probing / escalation / direct) and are emitted
as red_team.phase in telemetry.
"progress" — the label is a coarse progress bucket with no
semantic meaning (e.g. GOAT's early / mid / late) and is
emitted as red_team.progress_bucket so dashboards don't mistake
it for a staged-strategy phase.
Defaults to "staged" when omitted (backward-compatible).
Build a turn-aware system prompt for the attacker.
Score feedback, adaptation hints, and backtrack markers are communicated via the attacker's private conversation history (H_attacker) as system messages — not embedded in this prompt.
OptionalchosenExtract typed technique identifiers from the attacker's strategy
field for telemetry. Strategies that define a technique catalogue
override this to return the IDs of techniques actually used on a given
turn — powering the red_team.chosen_technique_ids span attribute.
Default (omitted) contributes nothing.
Turn the attacker LLM's raw output into an AttackerOutput.
Strategies like Crescendo (no JSON contract) return
{reply: raw, observation: "", strategy: "", parseFailed: false}.
Strategies like GOAT that instruct the attacker to emit structured
output override this to parse the JSON and populate observation /
strategy, setting parseFailed if the output was malformed.
OptionalphaseReturn phase boundary turn numbers to inject into the metaprompt template.
Override this to inject strategy-specific template variables. Strategies
that don't need extra template vars (e.g. GOAT) can omit this method —
the orchestrator treats undefined as "no extra vars".
Whether this strategy needs a pre-generated attack plan via the metaprompt LLM call.
Crescendo-style staged strategies depend on one; GOAT (paper fidelity) does not — the attacker reasons turn-by-turn from catalogue + history. When
false, the orchestrator skips_generateAttackPlanand passes an empty string asmetapromptPlantobuildSystemPrompt.Defaults to
truewhen omitted (backward-compatible).