TOOL-001 — API-doc → Sales-play Translator
Tier 3 Specialist Tool · Stateless · Reads API documentation, produces sales play candidates for technical buyers · Closes Domain 3 gap from v26 architecture eval
Tier 3 · Tool
Specced · v29
Domain 3 · API/Dev GTM
Sonnet
Purpose
Reads a product's API documentation and produces 1–3 candidate sales play definitions for technical buyer personas. The output lands in SalesPlayLibrary with state draft; humans pick up from there to co-define and approve. The tool's job is to translate API surface area into buyer-relevant language — not to invent positioning, just to surface the use cases the API enables and frame them in a way a sales team can actually run with.
Closes the Domain 3 gap (20% coverage in v26 eval). Today no service or agent can read API docs and produce plays for technical buyers; this is an LLM-shaped task that no spec-driven function can encode.
Input schema
{
"api_doc_input": {
"type": "openapi_spec" | "markdown_url" | "raw_markdown",
"content": "...", // OpenAPI JSON, URL, or raw markdown
"doc_version": "string", // e.g., "v2.4.0"
"doc_publication_date": "ISO 8601"
},
"context": {
"product_family": "string", // e.g., "background-checks", "kyc", "verifications"
"current_icp_summary": "string", // 2-3 sentence summary of current ICP from AGT-201
"current_active_plays": [ // current play context to avoid duplication
{ "play_id": "uuid", "name": "string", "segment": "string" }
],
"target_buyer_persona_hint": "string" // optional — "developer", "platform_team", "compliance_engineer", etc.
},
"constraints": {
"max_plays_to_propose": 3, // hard cap; tool never returns more
"include_segment": "string" // optional — restrict proposals to one segment
}
}
Input is validated by the calling agent before invocation. Malformed input results in tool rejection with a structured error response — the tool never silently coerces.
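As a hedged illustration, a rejection and a minimal caller-side guard could look like the sketch below; the error shape and helper name are assumptions for this sketch, not part of the spec.

// Hypothetical rejection shape; field names are illustrative, not spec.
type ToolRejection = {
  tool_call_id: string;
  error: {
    code: "invalid_input";
    field_path?: string;  // e.g., "api_doc_input.type"
    message: string;
  };
};

// Minimal guard the calling agent might run before invocation.
function validateApiDocType(input: unknown): ToolRejection["error"] | null {
  const allowed = ["openapi_spec", "markdown_url", "raw_markdown"];
  const t = (input as { api_doc_input?: { type?: string } })?.api_doc_input?.type;
  if (!t || !allowed.includes(t)) {
    return {
      code: "invalid_input",
      field_path: "api_doc_input.type",
      message: `type must be one of: ${allowed.join(", ")}`,
    };
  }
  return null;  // passes this check; full validation covers the rest of the schema
}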
Output schema
{
"tool_call_id": "uuid",
"candidate_plays": [
{
"name": "string", // human-readable play name
"hypothesis": "string", // 2-3 sentence thesis
"target_buyer_persona": "string", // dev / platform / compliance / etc.
"api_capabilities_referenced": [ // which endpoints/capabilities the play depends on
{ "capability": "string", "doc_section": "string" }
],
"target_definition": {
"icp_signals": ["string"], // observable signals: tech stack, job postings, etc.
"lifecycle_stage_fit": "string" // pre-trial, post-trial-stalled, mid-implementation, etc.
},
"suggested_cadence_outline": { // not finished cadence — outline only
"channel_mix": ["email", "linkedin", "developer_event"],
"touch_count_estimate": 6,
"notable_assets_needed": ["string"]
},
"success_criteria_outline": {
"primary_metric": "string", // e.g., "API key activation within 30 days"
"qualifying_signal": "string" // what would tell us the play is working
},
"confidence_self_rating": "high" | "medium" | "exploratory",
"ungrounded_assumptions": ["string"] // explicit list of assumptions NOT backed by API docs
}
],
"input_doc_summary": "string", // 2-sentence summary of what the docs cover
"capabilities_not_translated": ["string"], // capabilities the tool noticed but didn't propose plays for, with reasons
"tool_metadata": {
"model": "claude-sonnet-4-6",
"input_tokens": 0,
"output_tokens": 0,
"cost_usd_estimate": 0.0,
"latency_ms": 0
}
}
Hard rule: ungrounded_assumptions must be populated for every candidate play. The tool cannot claim a play is grounded in API docs when its real basis is general market knowledge. This separation is what makes the output usable downstream — humans can trust the API-grounded parts and treat the assumption parts as starting hypotheses.
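A downstream caller can enforce the hard rule mechanically; the sketch below is one way to do it (the helper name is an assumption, only the field itself comes from the output schema).

type CandidatePlay = {
  name: string;
  ungrounded_assumptions: string[];
  // ...remaining output-schema fields elided
};

// Returns names of plays violating the hard rule so the caller can reject
// or log them. A missing or non-array field is a violation; an empty array
// is only legal for a play grounded entirely in the docs.
function findUndisclosedPlays(plays: CandidatePlay[]): string[] {
  return plays
    .filter((p) => !Array.isArray(p.ungrounded_assumptions))
    .map((p) => p.name);
}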
Called by
| Caller | Invocation context |
| AGT-901 Pipeline Brain | "What plays does the new product API enable?" — usually after a product launch or major API surface change. Brain calls TOOL-001, then optionally chains to TOOL-003 (Sales Play Composer) to refine the most promising candidate into a fully structured SalesPlayLibrary draft. |
| AGT-902 Account Brain | "For Account X (uses our API heavily, just upgraded their tech stack), what API-anchored plays could land?" — account-specific variant. Less common than AGT-901 invocation but supported. |
| RevOps direct (workspace UI) | RevOps drops a new product API doc into the workspace, calls TOOL-001 with constraints, reviews the candidates. Direct invocation is supported and logged the same way as agent-mediated calls. |
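For concreteness, an AGT-901 invocation after a product launch could carry a payload like the one below; every value is hypothetical.

// Hypothetical AGT-901 request after a KYC launch; all values illustrative.
const request = {
  api_doc_input: {
    type: "openapi_spec",
    content: "{ /* OpenAPI JSON */ }",
    doc_version: "v2.4.0",
    doc_publication_date: "2025-01-15T00:00:00Z",
  },
  context: {
    product_family: "kyc",
    current_icp_summary:
      "Mid-market fintech platforms onboarding end users at volume; buying is led by platform engineering with compliance sign-off.",
    current_active_plays: [
      {
        play_id: "00000000-0000-4000-8000-000000000001",
        name: "KYC trial revival",
        segment: "mid-market",
      },
    ],
    target_buyer_persona_hint: "platform_team",
  },
  constraints: { max_plays_to_propose: 2 },
};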
Prompt design principles
- Ground in the docs, separate the assumptions. The tool's prompt explicitly instructs separation between API-grounded claims and market/positioning assumptions. The output's ungrounded_assumptions field is non-negotiable.
- Buyer language, not feature language. Output describes what the buyer can do, not what the API endpoint does. "Verify identities for high-volume gig platform onboarding" not "POST /verifications endpoint accepts batch input."
- Refuse if docs are too thin. If the input API documentation doesn't contain enough material to ground at least one play, the tool returns 0 candidates with a structured "insufficient_input_signal" reason rather than fabricating plays.
- Don't propose what already exists. The tool checks its proposals against the current_active_plays input, flags overlapping proposals, and de-duplicates against the existing active set; a caller-side sketch follows this list.
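The de-dup itself happens in the prompt, but a caller can add a cheap belt-and-braces filter. The token-overlap heuristic below is an assumption about how such a filter could work, not the tool's actual mechanism.

type ActivePlay = { play_id: string; name: string; segment: string };

// Crude name-token overlap against the active set. Real overlap detection
// is semantic and lives in the prompt; this only catches near-duplicates.
function flagNameOverlaps(
  candidateNames: string[],
  active: ActivePlay[],
): { candidate: string; overlaps_with: string }[] {
  const tokens = (s: string) =>
    new Set(s.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
  const flags: { candidate: string; overlaps_with: string }[] = [];
  for (const name of candidateNames) {
    const ct = tokens(name);
    for (const a of active) {
      const shared = [...tokens(a.name)].filter((t) => ct.has(t));
      if (shared.length >= 2) {
        flags.push({ candidate: name, overlaps_with: a.play_id });
      }
    }
  }
  return flags;
}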
Cost ceiling
| Constraint | Value |
| Per-call input budget | 50K tokens (API doc may be substantial; OpenAPI specs can be 30K+ tokens) |
| Per-call output budget | 5K tokens (candidate plays + metadata) |
| Default model | Sonnet — synthesis-heavy task; Haiku was tested but fell below the quality bar |
| Per-call cost estimate | ~$0.20–$0.30 per call at Sonnet pricing |
| Monthly cap (default) | $300/mo — bounds usage to ~1,000 calls/month |
| Frequency expectation | Low — product launches and major API changes are infrequent. Most months will see < 50 calls. |
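The per-call figure is easy to sanity-check. The rates below are assumed Sonnet-class list prices (roughly $3 per million input tokens and $15 per million output tokens); verify against current pricing before relying on them.

// Assumed per-token rates; not authoritative.
const INPUT_USD_PER_TOKEN = 3 / 1_000_000;
const OUTPUT_USD_PER_TOKEN = 15 / 1_000_000;

function estimateCallCostUsd(inputTokens: number, outputTokens: number): number {
  return inputTokens * INPUT_USD_PER_TOKEN + outputTokens * OUTPUT_USD_PER_TOKEN;
}

// At the budget ceiling, 50K in + 5K out = 0.15 + 0.075 = ~$0.225,
// which lands inside the ~$0.20-$0.30 range in the table above.
estimateCallCostUsd(50_000, 5_000);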
Eval criteria
| Criterion | Measurement | Pass threshold |
| Schema compliance | Output validates against output schema | 100% (hard) |
| API grounding | For each candidate play, % of api_capabilities_referenced items that map to a real endpoint/capability in the input docs (manual reviewer check) | ≥ 95% |
| Assumption disclosure | % of plays where ungrounded_assumptions is non-empty when the play extends beyond API docs (manual reviewer check) | 100% (hard) when extension exists |
| Hallucinated capability rate | % of plays referencing API capabilities the docs don't actually have | 0% (hard) — any hallucinated capability is automatic eval fail |
| Promotion rate (operational) | % of TOOL-001-generated drafts that survive co-definition to active | ≥ 25% |
| P95 latency | End-to-end tool call | ≤ 8s |
Eval suite: 8 retrospective scenarios, consisting of 4 historical product launches where we know which plays worked, 2 API documentation samples with known capability gaps, and 2 edge cases (very thin docs, very deep docs). Scored alongside the brain harness on the same cadence.
Failure modes
| Symptom | Cause | Action |
| Tool fabricates a capability the API doesn't have | Model hallucination on thin docs, or general market knowledge bleed | Hard fail in eval. Tighten prompt to explicitly require capability citations against doc sections. If chronic, put Sonnet on hold; Opus has been tested as a fallback. |
| Tool returns 0 plays consistently | Refusing too aggressively, or input docs systematically too thin | Audit input docs vs. refusal reasons. If refusal is correct, the gap is in product docs, not the tool. Otherwise tune refusal threshold. |
| Output plays look plausible but never get promoted | Plays grounded in capabilities but disconnected from real buyer pain | Tune context.current_icp_summary input to give the tool stronger ICP grounding. Refresh prompt with examples of plays that did get promoted. |
| P95 latency creep | Input docs growing in size; tool processing larger contexts | Implement input chunking strategy — summarize large OpenAPI specs to relevant subsections before invocation. Budget cap blocks runaway calls. |
| Cost spiking | Operator running it repeatedly during exploratory sessions, no caching | Enable prompt caching at workspace level for the system prompt + ICP context. Operator iteration on the same product family hits cache. |
Source-trace integration
When TOOL-001 is called by AGT-901 or AGT-902, the calling agent's BrainAnalysisLog row captures the tool invocation: tool_call_id, input doc reference, output candidate count, cost. The candidate plays drafted into SalesPlayLibrary inherit the brain's proposal_id for cohort retrospective lineage. Operator-direct calls (no brain) write a workspace audit record but no SalesPlayLibrary draft until the operator explicitly accepts.
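As a sketch, the two audit paths might persist record shapes like these. Only tool_call_id, proposal_id, the input doc reference, candidate count, and cost are named above; the remaining fields are assumptions.

// Agent-mediated path: one BrainAnalysisLog row per tool invocation.
interface BrainToolInvocationRecord {
  tool_call_id: string;     // from TOOL-001 output metadata
  proposal_id: string;      // inherited by the SalesPlayLibrary drafts
  input_doc_ref: string;    // pointer to the doc version analyzed
  candidate_count: number;
  cost_usd_estimate: number;
}

// Operator-direct path: workspace audit only; no SalesPlayLibrary draft
// exists until the operator explicitly accepts a candidate.
interface WorkspaceAuditRecord {
  tool_call_id: string;
  operator_id: string;      // assumed field
  accepted: boolean;
}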