TOOL-001 — API-doc → Sales-play Translator
Tier 3 Specialist Tool · Stateless · Reads API documentation, produces sales play candidates for technical buyers · Closes Domain 3 gap from v26 architecture eval
Tier 3 · Tool
Specced · v29
Domain 3 · API/Dev GTM
Sonnet
Purpose
Reads a product's API documentation and produces 1–3 candidate sales play definitions for technical buyer personas. The output lands in SalesPlayLibrary with state draft; humans pick up from there to co-define and approve. The tool's job is to translate API surface area into buyer-relevant language — not to invent positioning, just to surface the use cases the API enables and frame them in a way a sales team can actually run with.
Closes the Domain 3 gap (20% coverage in v26 eval). Today no service or agent can read API docs and produce plays for technical buyers; this is an LLM-shaped task that no spec-driven function can encode.
Input schema
{
"api_doc_input": {
"type": "openapi_spec" | "markdown_url" | "raw_markdown",
"content": "...", // OpenAPI JSON, URL, or raw markdown
"doc_version": "string", // e.g., "v2.4.0"
"doc_publication_date": "ISO 8601"
},
"context": {
"product_family": "string", // e.g., "background-checks", "kyc", "verifications"
"current_icp_summary": "string", // 2-3 sentence summary of current ICP from AGT-201
"current_active_plays": [ // current play context to avoid duplication
{ "play_id": "uuid", "name": "string", "segment": "string" }
],
"target_buyer_persona_hint": "string" // optional — "developer", "platform_team", "compliance_engineer", etc.
},
"constraints": {
"max_plays_to_propose": 3, // hard cap; tool never returns more
"include_segment": "string" // optional — restrict proposals to one segment
}
}
Input is validated by the calling agent before invocation. Malformed input results in tool rejection with a structured error response — the tool never silently coerces.
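As a hedged illustration, a rejection and a minimal caller-side guard could look like the sketch below; the error shape and helper name are assumptions for this sketch, not part of the spec.

// Hypothetical rejection shape; field names are illustrative, not spec.
type ToolRejection = {
  tool_call_id: string;
  error: {
    code: "invalid_input";
    field_path?: string;  // e.g., "api_doc_input.type"
    message: string;
  };
};

// Minimal guard the calling agent might run before invocation.
function validateApiDocType(input: unknown): ToolRejection["error"] | null {
  const allowed = ["openapi_spec", "markdown_url", "raw_markdown"];
  const t = (input as { api_doc_input?: { type?: string } })?.api_doc_input?.type;
  if (!t || !allowed.includes(t)) {
    return {
      code: "invalid_input",
      field_path: "api_doc_input.type",
      message: `type must be one of: ${allowed.join(", ")}`,
    };
  }
  return null;  // passes this check; full validation covers the rest of the schema
}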
Output schema
{
"tool_call_id": "uuid",
"candidate_plays": [
{
"name": "string", // human-readable play name
"hypothesis": "string", // 2-3 sentence thesis
"target_buyer_persona": "string", // dev / platform / compliance / etc.
"api_capabilities_referenced": [ // which endpoints/capabilities the play depends on
{ "capability": "string", "doc_section": "string" }
],
"target_definition": {
"icp_signals": ["string"], // observable signals: tech stack, job postings, etc.
"lifecycle_stage_fit": "string" // pre-trial, post-trial-stalled, mid-implementation, etc.
},
"suggested_cadence_outline": { // not finished cadence — outline only
"channel_mix": ["email", "linkedin", "developer_event"],
"touch_count_estimate": 6,
"notable_assets_needed": ["string"]
},
"success_criteria_outline": {
"primary_metric": "string", // e.g., "API key activation within 30 days"
"qualifying_signal": "string" // what would tell us the play is working
},
"confidence_self_rating": "high" | "medium" | "exploratory",
"ungrounded_assumptions": ["string"] // explicit list of assumptions NOT backed by API docs
}
],
"input_doc_summary": "string", // 2-sentence summary of what the docs cover
"capabilities_not_translated": ["string"], // capabilities the tool noticed but didn't propose plays for, with reasons
"tool_metadata": {
"model": "claude-sonnet-4-6",
"input_tokens": 0,
"output_tokens": 0,
"cost_usd_estimate": 0.0,
"latency_ms": 0
}
}
Hard rule: ungrounded_assumptions must be populated for every candidate play. The tool cannot claim a play is grounded in API docs when its real basis is general market knowledge. This separation is what makes the output usable downstream — humans can trust the API-grounded parts and treat the assumption parts as starting hypotheses.
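A downstream caller can enforce the hard rule mechanically; the sketch below is one way to do it (the helper name is an assumption, only the field itself comes from the output schema).

type CandidatePlay = {
  name: string;
  ungrounded_assumptions: string[];
  // ...remaining output-schema fields elided
};

// Returns names of plays violating the hard rule so the caller can reject
// or log them. A missing or non-array field is a violation; an empty array
// is only legal for a play grounded entirely in the docs.
function findUndisclosedPlays(plays: CandidatePlay[]): string[] {
  return plays
    .filter((p) => !Array.isArray(p.ungrounded_assumptions))
    .map((p) => p.name);
}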
Called by
| Caller | Invocation context |
| AGT-901 Pipeline Brain | "What plays does the new product API enable?" — usually after a product launch or major API surface change. Brain calls TOOL-001, then optionally chains to TOOL-003 (Sales Play Composer) to refine the most promising candidate into a fully structured SalesPlayLibrary draft. |
| AGT-902 Account Brain | "For Account X (uses our API heavily, just upgraded their tech stack), what API-anchored plays could land?" — account-specific variant. Less common than AGT-901 invocation but supported. |
| RevOps direct (workspace UI) | RevOps drops a new product API doc into the workspace, calls TOOL-001 with constraints, reviews the candidates. Direct invocation is supported and logged the same way as agent-mediated calls. |
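For concreteness, an AGT-901 invocation after a product launch could carry a payload like the one below; every value is hypothetical.

// Hypothetical AGT-901 request after a KYC launch; all values illustrative.
const request = {
  api_doc_input: {
    type: "openapi_spec",
    content: "{ /* OpenAPI JSON */ }",
    doc_version: "v2.4.0",
    doc_publication_date: "2025-01-15T00:00:00Z",
  },
  context: {
    product_family: "kyc",
    current_icp_summary:
      "Mid-market fintech platforms onboarding end users at volume; buying is led by platform engineering with compliance sign-off.",
    current_active_plays: [
      {
        play_id: "00000000-0000-4000-8000-000000000001",
        name: "KYC trial revival",
        segment: "mid-market",
      },
    ],
    target_buyer_persona_hint: "platform_team",
  },
  constraints: { max_plays_to_propose: 2 },
};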
Prompt design principles
- Ground in the docs, separate the assumptions. The tool's prompt explicitly instructs separation between API-grounded claims and market/positioning assumptions. The output's ungrounded_assumptions field is non-negotiable.
- Buyer language, not feature language. Output describes what the buyer can do, not what the API endpoint does. "Verify identities for high-volume gig platform onboarding" not "POST /verifications endpoint accepts batch input."
- Refuse if docs are too thin. If the input API documentation doesn't contain enough material to ground at least one play, the tool returns 0 candidates with a structured "insufficient_input_signal" reason rather than fabricating plays.
- Don't propose what already exists. The tool checks its proposals against the current_active_plays input, flags overlapping proposals, and de-duplicates against the existing active set; a caller-side sketch follows this list.
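The de-dup itself happens in the prompt, but a caller can add a cheap belt-and-braces filter. The token-overlap heuristic below is an assumption about how such a filter could work, not the tool's actual mechanism.

type ActivePlay = { play_id: string; name: string; segment: string };

// Crude name-token overlap against the active set. Real overlap detection
// is semantic and lives in the prompt; this only catches near-duplicates.
function flagNameOverlaps(
  candidateNames: string[],
  active: ActivePlay[],
): { candidate: string; overlaps_with: string }[] {
  const tokens = (s: string) =>
    new Set(s.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
  const flags: { candidate: string; overlaps_with: string }[] = [];
  for (const name of candidateNames) {
    const ct = tokens(name);
    for (const a of active) {
      const shared = [...tokens(a.name)].filter((t) => ct.has(t));
      if (shared.length >= 2) {
        flags.push({ candidate: name, overlaps_with: a.play_id });
      }
    }
  }
  return flags;
}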
Cost ceiling
| Constraint | Value |
| Per-call input budget | 50K tokens (API doc may be substantial; OpenAPI specs can be 30K+ tokens) |
| Per-call output budget | 5K tokens (candidate plays + metadata) |
| Default model | Sonnet — synthesis-heavy task; Haiku was tested but fell below the quality bar |
| Per-call cost estimate | ~$0.20–$0.30 per call at Sonnet pricing |
| Monthly cap (default) | $300/mo — bounds usage to ~1,000 calls/month |
| Frequency expectation | Low — product launches and major API changes are infrequent. Most months will see < 50 calls. |
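The per-call figure is easy to sanity-check. The rates below are assumed Sonnet-class list prices (roughly $3 per million input tokens and $15 per million output tokens); verify against current pricing before relying on them.

// Assumed per-token rates; not authoritative.
const INPUT_USD_PER_TOKEN = 3 / 1_000_000;
const OUTPUT_USD_PER_TOKEN = 15 / 1_000_000;

function estimateCallCostUsd(inputTokens: number, outputTokens: number): number {
  return inputTokens * INPUT_USD_PER_TOKEN + outputTokens * OUTPUT_USD_PER_TOKEN;
}

// At the budget ceiling, 50K in + 5K out = 0.15 + 0.075 = ~$0.225,
// which lands inside the ~$0.20-$0.30 range in the table above.
estimateCallCostUsd(50_000, 5_000);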
Eval criteria
| Criterion | Measurement | Pass threshold |
| Schema compliance | Output validates against output schema | 100% (hard) |
| API grounding | For each candidate play, % of api_capabilities_referenced items that map to a real endpoint/capability in the input docs (manual reviewer check) | ≥ 95% |
| Assumption disclosure | % of plays where ungrounded_assumptions is non-empty when the play extends beyond API docs (manual reviewer check) | 100% (hard) when extension exists |
| Hallucinated capability rate | % of plays referencing API capabilities the docs don't actually have | 0% (hard) — any hallucinated capability is automatic eval fail |
| Promotion rate (operational) | % of TOOL-001-generated drafts that survive co-definition to active | ≥ 25% |
| P95 latency | End-to-end tool call | ≤ 8s |
Eval suite: 8 retrospective scenarios, consisting of 4 historical product launches where we know which plays worked, 2 API documentation samples with known capability gaps, and 2 edge cases (very thin docs, very deep docs). Scored alongside the brain harness on the same cadence.
Failure modes
| Symptom | Cause | Action |
| Tool fabricates a capability the API doesn't have | Model hallucination on thin docs, or general market knowledge bleed | Hard fail in eval. Tighten prompt to explicitly require capability citations against doc sections. If chronic, put Sonnet on hold; Opus has been tested as a fallback. |
| Tool returns 0 plays consistently | Refusing too aggressively, or input docs systematically too thin | Audit input docs vs. refusal reasons. If refusal is correct, the gap is in product docs, not the tool. Otherwise tune refusal threshold. |
| Output plays look plausible but never get promoted | Plays grounded in capabilities but disconnected from real buyer pain | Tune context.current_icp_summary input to give the tool stronger ICP grounding. Refresh prompt with examples of plays that did get promoted. |
| P95 latency creep | Input docs growing in size; tool processing larger contexts | Implement input chunking strategy — summarize large OpenAPI specs to relevant subsections before invocation. Budget cap blocks runaway calls. |
| Cost spiking | Operator running it repeatedly during exploratory sessions, no caching | Enable prompt caching at workspace level for the system prompt + ICP context. Operator iteration on the same product family hits cache. |
Source-trace integration
When TOOL-001 is called by AGT-901 or AGT-902, the calling agent's BrainAnalysisLog row captures the tool invocation: tool_call_id, input doc reference, output candidate count, cost. The candidate plays drafted into SalesPlayLibrary inherit the brain's proposal_id for cohort retrospective lineage. Operator-direct calls (no brain) write a workspace audit record but no SalesPlayLibrary draft until the operator explicitly accepts.
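As a sketch, the two audit paths might persist record shapes like these. Only tool_call_id, proposal_id, the input doc reference, candidate count, and cost are named above; the remaining fields are assumptions.

// Agent-mediated path: one BrainAnalysisLog row per tool invocation.
interface BrainToolInvocationRecord {
  tool_call_id: string;     // from TOOL-001 output metadata
  proposal_id: string;      // inherited by the SalesPlayLibrary drafts
  input_doc_ref: string;    // pointer to the doc version analyzed
  candidate_count: number;
  cost_usd_estimate: number;
}

// Operator-direct path: workspace audit only; no SalesPlayLibrary draft
// exists until the operator explicitly accepts a candidate.
interface WorkspaceAuditRecord {
  tool_call_id: string;
  operator_id: string;      // assumed field
  accepted: boolean;
}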