TOOL-003 — Sales Play Composer
Tier 3 Specialist Tool · Stateless · Composes structured sales play definitions from a hypothesis + Tier 1 brain-ready views · Closes Domain 1's "create plays, not just execute" gap
Tier 3 · Tool
Specced · v29
Domain 1 · Pipeline Generation
Sonnet
Purpose
Given a hypothesis (e.g., "MM healthcare under-covered, vertical-specific play needed") and the relevant Tier 1 brain-ready views, TOOL-003 composes a fully structured play definition: target_definition, suggested_cadence, success_criteria. Output lands in SalesPlayLibrary with state draft, ready for human pickup into the co-definition workflow. This is the lever that turns a brain proposal from "we should do something here" into "here's the candidate play, refine and approve."
Closes the Domain 1 gap from v26 eval ("create plays, not just execute"). The Brain Agents identify gaps and frame hypotheses; TOOL-003 turns hypotheses into structured drafts that match SalesPlayLibrary's schema and AGT-302's execution contract.
Input schema
{
"hypothesis": "string", // 2-3 sentence thesis from the calling brain
"scope": "segment" | "account_specific",
"segment": "string", // required when scope=segment
"account_id": "uuid", // required when scope=account_specific
"context_views": { // brain-ready view snapshots from the caller
"icp_profile_summary": "string", // current ICP profile for the segment
"win_loss_pattern": "string", // trailing pattern from WinLossLog
"account_priority_distribution": "string",
"current_active_plays": [ // existing plays to avoid duplication
{ "play_id": "uuid", "name": "string", "hypothesis": "string" }
],
"voc_themes": ["string"] // optional VoC themes if relevant
},
"constraints": {
"must_use_existing_lever_only": true, // hard: no inventing new agent capabilities
"max_touch_count": 10, // hard: cadence cannot exceed AGT-302's existing limits
"anchor_to_tool_001_output": "uuid" // optional — chain from a TOOL-001 candidate
},
"originating_proposal_id": "uuid" // from calling brain's BrainAnalysisLog
}
The must_use_existing_lever_only = true constraint is non-negotiable. TOOL-003 never proposes a play that requires new agent capabilities, new schema, or new approval flows. Plays must compose from existing AGT-201 / 203 / 301 / 302 / 303 / 305 levers.
Output schema
{
"tool_call_id": "uuid",
"draft_play": {
"name": "string", // human-readable play name (under 80 chars)
"hypothesis": "string", // refined from input, may be edited
"scope": "segment" | "account_specific",
"segment": "string", // populated per scope
"account_id": "uuid", // populated per scope
"target_definition": {
"icp_signals": ["string"],
"lifecycle_stage_fit": "string",
"exclusion_criteria": ["string"] // who's out of scope despite matching primary
},
"suggested_cadence": {
"cadence_type": "abm" | "standard" | "nurture" | "vertical-specific",
"channel_mix": ["email","linkedin","phone","event","content"],
"touch_count": 0, // bounded by max_touch_count
"step_outline": [ // ordered cadence outline
{ "step": 1, "channel": "string", "trigger": "string", "asset_needed": "string" }
],
"agt_302_template_reference": "string" // existing template to extend, if applicable
},
"success_criteria": {
"primary_metric": "string", // single primary success metric
"primary_metric_target": "string",
"secondary_metrics": ["string"],
"qualifying_signal": "string", // what tells us early it's working
"retire_signal": "string" // what would cause us to retire it
},
"originating_proposal_id": "uuid" // inherited from input
},
"lever_grounding": [ // every play element must trace to existing levers
{ "play_element": "string", "agent_lever": "string", "spec_section": "string" }
],
"duplication_check": {
"checked_against_count": 0, // active plays compared
"overlap_flagged": false,
"overlapping_play_id": "uuid", // if overlap_flagged
"differentiation_rationale": "string" // if overlap exists, why this is still distinct
},
"ungrounded_assumptions": ["string"], // explicit, parallels TOOL-001 and TOOL-002
"self_assessed_promotion_likelihood": "high" | "medium" | "low",
"tool_metadata": {
"model": "claude-sonnet-4-6",
"input_tokens": 0, "output_tokens": 0,
"cost_usd_estimate": 0.0,
"latency_ms": 0
}
}
Hard rule: lever_grounding must cover every element of the play. No element of the suggested cadence, target definition, or success criteria can exist without a citation to an existing agent's spec section. If the tool can't ground an element, it omits the element rather than fabricating it.
Called by
| Caller | Invocation context |
| AGT-901 Pipeline Brain | Most common caller. Brain identifies a coverage gap or play hypothesis, calls TOOL-003 to compose the structured play definition, draft lands in SalesPlayLibrary inheriting the brain's proposal_id. |
| AGT-902 Account Brain | For account-specific plays (scope=account_specific). Brain identifies an account-level expansion or retention play, calls TOOL-003 to compose the structured definition. |
| Chained from TOOL-001 | When a brain runs the full chain: TOOL-001 reads API docs and proposes 1-3 candidate plays at outline level → brain selects the most promising → calls TOOL-003 with the candidate as anchor_to_tool_001_output to fully compose it. |
| RevOps direct (workspace UI) | RevOps writes a hypothesis manually, calls TOOL-003 with explicit context views. Used when leadership wants to pressure-test a hypothesis they generated, not the brain. |
Design principles
- Compose, don't invent. Every cadence step, every target signal, every success metric must exist in the existing service vocabulary. If the play requires a capability AGT-302 doesn't support, the tool either omits that element or returns the play with explicit "requires_new_capability" flag (which causes the calling brain to route to
recommend_human_query instead of drafting).
- Honor the volume cap upstream. SalesPlayLibrary enforces the volume cap on activation, not draft. TOOL-003 may produce drafts that would exceed the cap if all activated; that's by design — the workspace lets humans pick the best subset.
- Refuse over-specification. The output is a draft for co-definition, not a finished play. The tool intentionally leaves
step_outline at outline-level rather than full-asset content; humans add the polish in the workspace.
- Self-assessed promotion likelihood is honest, not optimistic. The tool flags drafts as "low" likelihood when the hypothesis is exploratory or the lever grounding is thin. Calibrating the brain to draft fewer-but-better plays is preferable to draft inflation.
- De-duplicate at draft time. The
current_active_plays input is checked; overlapping drafts are flagged with a differentiation rationale so RevOps can choose to retire the original or take the new one as a replacement.
Cost ceiling
| Constraint | Value |
| Per-call input budget | 30K tokens (hypothesis + multiple brain-ready view summaries + active plays context) |
| Per-call output budget | 4K tokens (structured play + lever grounding + metadata) |
| Default model | Sonnet — composition task with structural constraints; Haiku tested but lever-grounding accuracy below threshold |
| Per-call cost estimate | ~$0.15 per call at Sonnet pricing |
| Monthly cap (default) | $300/mo — bounds usage to ~2,000 calls/month |
| Frequency expectation | Moderate — brain-driven invocations during planning cycles + ad-hoc draft compositions during operational queries. |
Eval criteria
| Criterion | Measurement | Pass threshold |
| Schema compliance | Output validates against SalesPlayLibrary draft schema | 100% (hard) |
| Lever grounding completeness | % of play elements with valid lever_grounding citation | 100% (hard) — ungrounded elements should be omitted, not fabricated |
| Citation validity | % of lever_grounding citations that point to a real agent spec section (manual reviewer check against current spec set) | ≥ 95% |
| Promotion rate (operational) | % of TOOL-003-generated drafts that survive co-definition to active in SalesPlayLibrary | ≥ 30% (matches the AGT-901 eval threshold; TOOL-003 is the mechanism by which AGT-901's threshold gets met) |
| Promotion likelihood calibration | Drafts the tool self-rated "high" vs "medium" vs "low" should differ in actual promotion rate by > 20pp between tiers | Calibration check; if ratings don't predict outcomes, ratings are noise |
| Duplication detection | % of overlapping drafts correctly flagged in duplication_check when active plays cover similar ground | ≥ 90% |
| P95 latency | End-to-end tool call | ≤ 6s |
Failure modes
| Symptom | Cause | Action |
| Tool drafts plays that require capabilities AGT-302 doesn't support | Lever-grounding constraint not enforced strongly enough | Hard fail in eval. Tighten prompt with explicit list of supported AGT-302 capabilities. Tool should return play with "requires_new_capability" flag rather than fabricate. |
| Drafts uniformly self-rated "high" likelihood, but actual promotion rate is much lower | Optimism bias in self-assessment | Recalibrate via eval. Add few-shot examples of plays that the brain rated "high" but failed co-definition, so the tool learns to be more discriminating. |
| High volume of drafts, low promotion rate | Brain (caller) producing too many hypotheses; tool doing its job by drafting them all | Issue is upstream, not in TOOL-003. AGT-901 calibration should produce fewer hypotheses; tool will draft fewer plays as a result. |
| Drafts duplicate active plays despite duplication check | Active play context not being passed in input | Caller responsibility. Check current_active_plays is populated at invocation; if the field is empty, tool can't deduplicate. |
| Cost spike | Brain invoking TOOL-003 many times during a single planning session | Enable prompt caching at workspace level. Per-call cap blocks runaway. Audit BrainAnalysisLog for invocation patterns. |
Interaction with SalesPlayLibrary
TOOL-003 output is mapped into the SalesPlayLibrary schema by the calling brain. The brain writes the draft row with state=draft; the brain does not write directly through TOOL-003. This preserves the architectural invariant that the tool is stateless. The brain owns the write because the brain owns the BrainAnalysisLog row that lineage-anchors the draft.
Promotion gate is unchanged: draft → under_review requires human pickup, under_review → active requires SLM + RevOps joint approval, hard volume cap enforced at activation. TOOL-003 has zero role in promotion — it only produces the initial draft.