TOOL-003 — Sales Play Composer

Tier 3 Specialist Tool · Stateless · Composes structured sales play definitions from a hypothesis + Tier 1 brain-ready views · Closes Domain 1's "create plays, not just execute" gap

Tier 3 · Tool Specced · v29 Domain 1 · Pipeline Generation Sonnet

Purpose

Given a hypothesis (e.g., "MM healthcare under-covered, vertical-specific play needed") and the relevant Tier 1 brain-ready views, TOOL-003 composes a fully structured play definition: target_definition, suggested_cadence, success_criteria. Output lands in SalesPlayLibrary with state draft, ready for human pickup into the co-definition workflow. This is the lever that turns a brain proposal from "we should do something here" into "here's the candidate play, refine and approve."

Closes the Domain 1 gap from v26 eval ("create plays, not just execute"). The Brain Agents identify gaps and frame hypotheses; TOOL-003 turns hypotheses into structured drafts that match SalesPlayLibrary's schema and AGT-302's execution contract.

Input schema

{
  "hypothesis": "string",                  // 2-3 sentence thesis from the calling brain
  "scope": "segment" | "account_specific",
  "segment": "string",                     // required when scope=segment
  "account_id": "uuid",                    // required when scope=account_specific
  "context_views": {                       // brain-ready view snapshots from the caller
    "icp_profile_summary": "string",       // current ICP profile for the segment
    "win_loss_pattern": "string",          // trailing pattern from WinLossLog
    "account_priority_distribution": "string",
    "current_active_plays": [               // existing plays to avoid duplication
      { "play_id": "uuid", "name": "string", "hypothesis": "string" }
    ],
    "voc_themes": ["string"]                // optional VoC themes if relevant
  },
  "constraints": {
    "must_use_existing_lever_only": true,  // hard: no inventing new agent capabilities
    "max_touch_count": 10,                 // hard: cadence cannot exceed AGT-302's existing limits
    "anchor_to_tool_001_output": "uuid"    // optional — chain from a TOOL-001 candidate
  },
  "originating_proposal_id": "uuid"        // from calling brain's BrainAnalysisLog
}

The must_use_existing_lever_only = true constraint is non-negotiable. TOOL-003 never proposes a play that requires new agent capabilities, new schema, or new approval flows. Plays must compose from existing AGT-201 / 203 / 301 / 302 / 303 / 305 levers.

Output schema

{
  "tool_call_id": "uuid",
  "draft_play": {
    "name": "string",                      // human-readable play name (under 80 chars)
    "hypothesis": "string",                // refined from input, may be edited
    "scope": "segment" | "account_specific",
    "segment": "string",                   // populated per scope
    "account_id": "uuid",                  // populated per scope
    "target_definition": {
      "icp_signals": ["string"],
      "lifecycle_stage_fit": "string",
      "exclusion_criteria": ["string"]     // who's out of scope despite matching primary
    },
    "suggested_cadence": {
      "cadence_type": "abm" | "standard" | "nurture" | "vertical-specific",
      "channel_mix": ["email","linkedin","phone","event","content"],
      "touch_count": 0,                    // bounded by max_touch_count
      "step_outline": [                    // ordered cadence outline
        { "step": 1, "channel": "string", "trigger": "string", "asset_needed": "string" }
      ],
      "agt_302_template_reference": "string" // existing template to extend, if applicable
    },
    "success_criteria": {
      "primary_metric": "string",          // single primary success metric
      "primary_metric_target": "string",
      "secondary_metrics": ["string"],
      "qualifying_signal": "string",       // what tells us early it's working
      "retire_signal": "string"            // what would cause us to retire it
    },
    "originating_proposal_id": "uuid"      // inherited from input
  },
  "lever_grounding": [                      // every play element must trace to existing levers
    { "play_element": "string", "agent_lever": "string", "spec_section": "string" }
  ],
  "duplication_check": {
    "checked_against_count": 0,            // active plays compared
    "overlap_flagged": false,
    "overlapping_play_id": "uuid",         // if overlap_flagged
    "differentiation_rationale": "string"  // if overlap exists, why this is still distinct
  },
  "ungrounded_assumptions": ["string"],     // explicit, parallels TOOL-001 and TOOL-002
  "self_assessed_promotion_likelihood": "high" | "medium" | "low",
  "tool_metadata": {
    "model": "claude-sonnet-4-6",
    "input_tokens": 0, "output_tokens": 0,
    "cost_usd_estimate": 0.0,
    "latency_ms": 0
  }
}

Hard rule: lever_grounding must cover every element of the play. No element of the suggested cadence, target definition, or success criteria can exist without a citation to an existing agent's spec section. If the tool can't ground an element, it omits the element rather than fabricating it.

Called by

Caller	Invocation context
AGT-901 Pipeline Brain	Most common caller. Brain identifies a coverage gap or play hypothesis, calls TOOL-003 to compose the structured play definition, draft lands in SalesPlayLibrary inheriting the brain's `proposal_id`.
AGT-902 Account Brain	For account-specific plays (scope=account_specific). Brain identifies an account-level expansion or retention play, calls TOOL-003 to compose the structured definition.
Chained from TOOL-001	When a brain runs the full chain: TOOL-001 reads API docs and proposes 1-3 candidate plays at outline level → brain selects the most promising → calls TOOL-003 with the candidate as `anchor_to_tool_001_output` to fully compose it.
RevOps direct (workspace UI)	RevOps writes a hypothesis manually, calls TOOL-003 with explicit context views. Used when leadership wants to pressure-test a hypothesis they generated, not the brain.

Design principles

Compose, don't invent. Every cadence step, every target signal, every success metric must exist in the existing service vocabulary. If the play requires a capability AGT-302 doesn't support, the tool either omits that element or returns the play with explicit "requires_new_capability" flag (which causes the calling brain to route to recommend_human_query instead of drafting).
Honor the volume cap upstream. SalesPlayLibrary enforces the volume cap on activation, not draft. TOOL-003 may produce drafts that would exceed the cap if all activated; that's by design — the workspace lets humans pick the best subset.
Refuse over-specification. The output is a draft for co-definition, not a finished play. The tool intentionally leaves step_outline at outline-level rather than full-asset content; humans add the polish in the workspace.
Self-assessed promotion likelihood is honest, not optimistic. The tool flags drafts as "low" likelihood when the hypothesis is exploratory or the lever grounding is thin. Calibrating the brain to draft fewer-but-better plays is preferable to draft inflation.
De-duplicate at draft time. The current_active_plays input is checked; overlapping drafts are flagged with a differentiation rationale so RevOps can choose to retire the original or take the new one as a replacement.

Cost ceiling

Constraint	Value
Per-call input budget	30K tokens (hypothesis + multiple brain-ready view summaries + active plays context)
Per-call output budget	4K tokens (structured play + lever grounding + metadata)
Default model	Sonnet — composition task with structural constraints; Haiku tested but lever-grounding accuracy below threshold
Per-call cost estimate	~$0.15 per call at Sonnet pricing
Monthly cap (default)	$300/mo — bounds usage to ~2,000 calls/month
Frequency expectation	Moderate — brain-driven invocations during planning cycles + ad-hoc draft compositions during operational queries.

Eval criteria

Criterion	Measurement	Pass threshold
Schema compliance	Output validates against SalesPlayLibrary draft schema	100% (hard)
Lever grounding completeness	% of play elements with valid `lever_grounding` citation	100% (hard) — ungrounded elements should be omitted, not fabricated
Citation validity	% of `lever_grounding` citations that point to a real agent spec section (manual reviewer check against current spec set)	≥ 95%
Promotion rate (operational)	% of TOOL-003-generated drafts that survive co-definition to `active` in SalesPlayLibrary	≥ 30% (matches the AGT-901 eval threshold; TOOL-003 is the mechanism by which AGT-901's threshold gets met)
Promotion likelihood calibration	Drafts the tool self-rated "high" vs "medium" vs "low" should differ in actual promotion rate by > 20pp between tiers	Calibration check; if ratings don't predict outcomes, ratings are noise
Duplication detection	% of overlapping drafts correctly flagged in `duplication_check` when active plays cover similar ground	≥ 90%
P95 latency	End-to-end tool call	≤ 6s

Failure modes

Symptom	Cause	Action
Tool drafts plays that require capabilities AGT-302 doesn't support	Lever-grounding constraint not enforced strongly enough	Hard fail in eval. Tighten prompt with explicit list of supported AGT-302 capabilities. Tool should return play with "requires_new_capability" flag rather than fabricate.
Drafts uniformly self-rated "high" likelihood, but actual promotion rate is much lower	Optimism bias in self-assessment	Recalibrate via eval. Add few-shot examples of plays that the brain rated "high" but failed co-definition, so the tool learns to be more discriminating.
High volume of drafts, low promotion rate	Brain (caller) producing too many hypotheses; tool doing its job by drafting them all	Issue is upstream, not in TOOL-003. AGT-901 calibration should produce fewer hypotheses; tool will draft fewer plays as a result.
Drafts duplicate active plays despite duplication check	Active play context not being passed in input	Caller responsibility. Check `current_active_plays` is populated at invocation; if the field is empty, tool can't deduplicate.
Cost spike	Brain invoking TOOL-003 many times during a single planning session	Enable prompt caching at workspace level. Per-call cap blocks runaway. Audit BrainAnalysisLog for invocation patterns.

Interaction with SalesPlayLibrary

TOOL-003 output is mapped into the SalesPlayLibrary schema by the calling brain. The brain writes the draft row with state=draft; the brain does not write directly through TOOL-003. This preserves the architectural invariant that the tool is stateless. The brain owns the write because the brain owns the BrainAnalysisLog row that lineage-anchors the draft.

Promotion gate is unchanged: draft → under_review requires human pickup, under_review → active requires SLM + RevOps joint approval, hard volume cap enforced at activation. TOOL-003 has zero role in promotion — it only produces the initial draft.