Tier 3 Specialist Tools — Index

Stateless, narrow, callable LLM functions · Called by L9 Brain Agents (or by L1–L8 services for one-shot augmentation) · Not "agents" in the cadence/state sense — functions an agent can call · Index of the Tier 3 category
Tier 3 · Tools v37 · 4 prototyped, 10 specced 15 tools across 4 waves RevOps · technical owners per tool
What Tier 3 is. Tier 3 is a category, not a layer. Tools live alongside the layered services and agents but follow different rules: stateless, no schema ownership, no cadence, no invocation paths beyond "another component called me." A Brain Agent (Tier 2) or occasionally a service (Tier 1) calls a Tool, the Tool runs once, returns a structured result, and is done. Logged for audit but not authoritative. Tools fill the cognition gaps that don't justify a full agent — the surgical instruments of the OS.
Why Tier 3 is a separate category

In the v26 architecture eval the recommendation was three tiers: Tier 1 deterministic services, Tier 2 LLM-native brain agents, Tier 3 specialist tools. The first two get layers (L1–L8 and L9 respectively). Tier 3 doesn't — tools are not part of the layered DAG because:

Tools are functions an agent can call. Treating them as layered components creates the wrong governance overhead. Treating them as opaque LLM calls creates the wrong audit posture. The Tier 3 framing splits the difference: callable contracts with eval discipline, but no layer-level integration burden.
The Tier 3 tool contract

Every Tier 3 tool spec must define:

ElementWhat it captures
PurposeWhat the tool does in one sentence. If the tool needs more than one sentence to describe, it should probably be split.
Input schemaStrict JSON-shaped input. Validated by the calling agent before invocation; tool rejects malformed input.
Output schemaStrict JSON-shaped output. Calling agent depends on the contract; schema changes are breaking changes that require coordinated deployment with callers.
Model tierHaiku (narrow scope, fast) / Sonnet (synthesis-heavy) / Opus (rare, only when measurably necessary). Default is Haiku unless the tool's nature demands otherwise.
Called byExplicit list of which Tier 1 services and Tier 2 brains may call the tool. Out-of-list callers should be a code-review concern.
Cost ceilingPer-call token budget + monthly invocation budget. Hard limits.
Eval criteriaTool-specific eval; runs alongside the brain harness or independently. Tools have lower bar than brains because their scope is narrower.
Failure modeWhat happens when the tool returns a bad result. Calling agent's responsibility to handle, but the tool spec must declare its known failure modes.
Index of specced tools (v38 — 15 tools)

First wave (v29, 4 tools): closed the biggest cognition gaps — Domain 3 (API/dev-led GTM) twice, Domain 1 ("create plays, not just execute") once, UBB consumption forecasting once. Second wave (v31, 4 tools): closed Domain 4 (outbound deliverability), Domain 2 (in-call guidance + competitive narrative), and the post-sales feature-level adoption gap. Third wave (v32, 4 tools): TTV/timing analysis, champion movement detection, pricing sensitivity, onboarding health prediction. Fourth wave (v37, 2 tools): cohort retention forecaster + segment-LTV decomposer, specced alongside AGT-903 Strategy Brain to support multi-quarter portfolio reasoning. Fourteen tools total — further additions remain case-by-case based on observed gaps.

TOOL-001 Domain 3 · API/Dev GTM Sonnet
API-doc → Sales-play Translator
Reads a product's API documentation and produces 1–3 candidate sales play definitions for technical buyer personas. Output goes to SalesPlayLibrary as draft; humans co-define from there.
Called by: AGT-901 Pipeline Brain, AGT-902 Account Brain · Use case: new product launch, API surface change, identifying buyer-relevant capabilities · Closes: Domain 3 gap from v26 eval (the largest single gap in the system).
View spec →
TOOL-002 Domain 3 · API/Dev GTM Haiku
Dev-Persona ICP Enricher
Takes an account record + observed technical signals (job postings, GitHub activity, technographic data) and produces a developer-persona enrichment to augment AGT-201's ICP scoring. Captures dev-buyer signal that AGT-201's 6-dimension model under-weights.
Called by: AGT-201 (as enrichment), AGT-901 Pipeline Brain · Use case: developer-led GTM motion, API-product accounts where the buying signal is technical not commercial · Closes: Domain 3 gap (the dev-persona side).
View spec →
TOOL-003 Domain 1 · Pipeline Generation Haiku prototyped
Sales Play Composer
Given a hypothesis (e.g., "MM healthcare under-covered, vertical-specific play needed") + relevant Tier 1 brain-ready views, composes a structured play definition (target, hypothesis, suggested cadence, success criteria) that lands in SalesPlayLibrary as draft.
Called by: AGT-901 Pipeline Brain primarily, AGT-902 for account-specific plays · Use case: the "create plays, not just execute" gap from Domain 1 of the v26 eval · Closes: the lever that lets a brain proposal become a candidate play.
View spec →
TOOL-004 UBB · Post-sales Haiku prototyped
Consumption Forecasting / Runway Predictor
Reads UsageMeteringLog trailing 90–180 day pattern for one account+SKU and produces a forecast: predicted overage date (or "no overage in next 60 days"), confidence interval, trend characterization (linear / exponential / seasonal / cliff).
Called by: AGT-902 Account Brain (for "is this real expansion" questions), AGT-503 Expansion Trigger (as augment), AGT-402 Forecast Adjuster (for expansion ACV probability weighting) · Use case: distinguishing trend from spike, predicting when overage will hit, sizing the next expansion conversation timing · Closes: UBB-specific gap from v26 eval.
View spec →
TOOL-005 Domain 4 · AI-native outbound Haiku
Outbound Deliverability Monitor
Reads outbound sequence performance + reputation signals (bounce rates, spam complaints, ESP reputation scores, blacklist status) and produces a deliverability health characterization with specific risk flags + recommended actions (throttle, pause, warm-up, manual review).
Called by: AGT-302 (daily batch, drives auto-throttle decisions), AGT-303 (weekly batch, feeds risk-classified recommendations), AGT-901 (for "is outbound quality the bottleneck?" diagnostic), RevOps direct · Closes: Domain 4 gap from v26 eval.
View spec →
TOOL-006 Domain 2 · Win-rate Sonnet
Real-time Call Guidance
Live-call assist. Streaming transcript chunks + full account context produce real-time guidance during the call: objection handling, MEDDPICC gap callouts, next-step suggestions, competitive intel mentions. Sidecar UI; rep decides whether to use any specific suggestion. Quality >> volume — tool returns silence when guidance would be mediocre.
Called by: Live-call sidecar UI (recording-platform integration: Gong / Zoom / Chorus) · Closes: Domain 2 (in-call) gap from v26 eval. Complements AGT-407 retrospective conversation intelligence. Higher infra complexity than other tools — requires live audio integration and sidecar UI surface.
View spec →
TOOL-007 Domain 2 · Win-rate augment Sonnet
Competitive Narrative Writer
Composes deal-specific competitive narratives from CompetitiveKnowledgeBase + deal context. Output is talking points / brief sections / email copy / battlecard excerpts tailored to the specific deal stage, ICP fit, and prior conversation patterns. Hard rule: every talking point cites a real KB section — tool composes; AGT-403 maintains the KB.
Called by: AGT-403 (augments reactive briefs), AGT-405 (deal-active meeting briefs), AGT-902 Account Brain, RevOps direct (battlecard refresh) · Augments: Domain 2 (deal-specific competitive framing). Quality lift, not gap-closer.
View spec →
TOOL-008 Post-sales · Activation Haiku prototyped
Product Adoption Pattern Recognizer
Reads per-account feature engagement telemetry and classifies the adoption pattern: deeply_integrated / surface_only / siloed_by_team / declining / activating. Different from TOOL-004 (consumption volume) — this is which features get used and how broadly. Cohort-relative when baseline available; degrades gracefully when not.
Called by: AGT-501 Customer Health (daily batch — augments feature engagement dimension), AGT-902 Account Brain ("is this account actually getting value?"), AGT-503 Expansion Trigger (qualification augment), AGT-601 Onboarding Orchestrator (activation tracking) · Closes: post-sales feature-level adoption gap.
View spec →
TOOL-009 Post-sales · Activation Haiku
Activation / Time-to-Value Analyzer
Measures longitudinal time-to-value milestones for an account against expected timing benchmarks. Distinct from TOOL-008 (current snapshot) — this is timing analysis: did the account hit first-value within X days, integration within Y, multi-team adoption within Z? Output is per-milestone delta vs. benchmark + projected achievement of remaining milestones based on trajectory.
Called by: AGT-601 Onboarding Orchestrator (weekly batch on onboarding cohort), AGT-501 Customer Health (months 3–9 ramping accounts), AGT-902 Account Brain ("is this account activating on time?"), AGT-704 (cohort-level TTV trends for QBR) · Closes: TTV/timing side of post-sales activation gap.
View spec →
TOOL-010 Domain 2 · Post-sales Haiku prototyped
Champion Movement Detector
Detects champion role changes, departures, or disengagement using multi-signal fusion: LinkedIn job-change webhooks, email-bounce patterns, AGT-407 ConvIntelligence "person stopped attending calls" detection. Outputs movement classification with confidence + recommended interventions. External enrichment dependency — degrades gracefully without LinkedIn signals.
Called by: AGT-502 Churn Risk (weekly batch on T-180 renewal cohort), AGT-401 Deal Health (when champion is economic buyer/strong influencer), AGT-902 Account Brain, AGT-501 Customer Health · Closes: early-warning gap. Champion-loss predicts retrospective churn with substantial lead time when external signals are available.
View spec →
TOOL-011 Domain 1 / Domain 2 Sonnet
Pricing Sensitivity Analyzer
For an active deal at quote stage, classifies pricing sensitivity (low / moderate / high / highly_constrained_budget) from cohort QuoteLog patterns + ConvIntelligence price-objection signals + procurement engagement. Output is sensitivity classification + cohort comparison + do_not_concede flags. Hard rule: never recommends specific discount magnitude — that remains a human decision per AGT-406 spec.
Called by: AGT-406 CPQ & Deal Desk (every quote requiring approval), AGT-401 Deal Health (proposal/negotiation stage), AGT-902 Account Brain, RevOps direct · Augments: AGT-406 deal desk decision support — informational input, not approval mechanism.
View spec →
TOOL-012 Post-sales · Onboarding Haiku
Onboarding Health Predictor
From the first 30–60 days of telemetry on a newly-onboarded account, predicts 6-month outcome: sustained_adoption / surface_only / churn. Trajectory classification + early warning indicators + intervention recommendations sized to onboarding stage. Distinct from TOOL-008 (descriptive) and TOOL-009 (timing) — TOOL-012 is predictive.
Called by: AGT-601 Onboarding Orchestrator (weekly batch, primary), AGT-501 (early forward-looking signal), AGT-902 Account Brain, AGT-704 (cohort-level onboarding trajectory for retention health section) · Closes: predictive gap in post-sales activation toolkit. Onboarding intervention window is the highest-leverage moment in the customer lifecycle.
View spec →
TOOL-013 Strategy · Multi-quarter Sonnet
Cohort Retention Forecaster
Reads observed retention curves for one or more signup-quarter cohorts (from AGT-501 cohort_brain_view) and produces forward projections with confidence bands plus a cross-cohort characterization (stable / degrading / improving / mixed / insufficient_signal). Cohort-level analogue of TOOL-004. Numerical model in code (survival/decay fits + bootstrap intervals + rule-based pattern assignment); LLM portion produces structured interpretation with mandatory credible-alternative reading. Refusal is first-class — cohorts < 25 accounts or < 4 observed periods refuse rather than fabricate.
Called by: AGT-903 Strategy Brain (primary), AGT-704 (only via AGT-903 narrative jobs), RevOps direct · Use case: "did the bet work?", "is retention flattening on newer cohorts?", "what's the projected NRR floor on the 2024 cohort?" · Closes: cohort-projection gap that AGT-903 strategic reasoning needs (per v37 changelog ripple).
View spec →
TOOL-014 Strategy · Multi-quarter Sonnet
Segment-LTV Decomposer
Decomposes observed LTV gaps between segment / vertical / ICP-tier buckets into driver components — initial ACV, tenure, expansion realization, CAC, segment mix — with confidence-banded contribution per driver. Counterfactual decomposition + bootstrap intervals run in code; LLM characterizes operational meaning. Hard rule: a driver is load_bearing only when its bootstrap band excludes zero with same-sign p10/p90 — the LLM cannot promote a driver on narrative grounds. Mandatory credible-alternative reading mirrors TOOL-013.
Called by: AGT-903 Strategy Brain (primary), AGT-704 (only via AGT-903 narrative jobs), RevOps direct · Use case: "which ICP dimensions correlate with realized LTV?", "where's the highest LTV-per-rep payoff?", "is opportunistic-vertical LTV materially different from core LTV?" · Closes: LTV-attribution gap AGT-903 needs for capacity-reallocation, ICP-revision, vertical-entry use cases (per v37 changelog ripple).
View spec →
TOOL-015 Consumption · GP attribution Haiku (Sonnet on deep mode)
Consumption-Margin Decomposer
Decomposes per-customer realized GP into pricing / utilization / backend-cost / tier-mix axes. Projects GP uplift for caller-supplied tier-migration scenarios with switching-cost class + a required credible-alternative articulation. Generic across consumption-pricing businesses (per-token / per-message / per-minute / per-transaction / per-GB) — tokens are config, not contract. Same shape as TOOL-004: deterministic core (decomposition arithmetic, region arbitrage, scenario projection) + Haiku LLM characterization (workload-shaping play classes, credible-alternative). Refusal-first when backend cost coverage drops below a threshold — backend cost attribution is load-bearing; estimation propagates straight into bad expansion decisions.
Called by: AGT-503 Expansion Trigger (primary, daily on tier-migration candidates), AGT-901 Pipeline Brain (cohort GP diagnostics), AGT-902 Account Brain (per-account margin synthesis), AGT-903 Strategy Brain (cohort fan-out on strategic margin retrospectives), RevOps direct · Use case: "which customers qualify for tier migration with material GP uplift?", "where is segment-X margin compressing?", "what's the GP picture on Acme HR + the highest-leverage margin lever?" · Closes: per-customer GP-attribution gap that consumption-pricing tier-migration motions need (per v38 changelog).
View spec →
Future candidates (not specced; case-by-case as gaps emerge)

With three waves shipped (12 tools), the Tier 3 catalogue is broadly complete relative to the v26 architecture eval gaps. Further additions are case-by-case — the bar is observed operational gap, not theoretical coverage. Tracking ideas as candidates for when signal warrants:

CandidateDomainTrigger to add
Procurement Negotiation Pattern RecognizerDomain 1 / Domain 2If post-launch eval of TOOL-011 shows procurement-stage analysis is a distinct cognition need.
Multi-thread Quality ScorerDomain 2Score deal-level multi-threading quality (stakeholder coverage, persona breadth) beyond what AGT-401 deal health scoring captures. Add if AGT-901 surfaces thin multi-threading as a recurring win-rate driver.
QBR Action-Item Outcome TrackerPost-salesTrack QBR commitments through to outcomes, surfacing accountability patterns. Add after AGT-603/AGT-704 adoption shows signal.
Renewal Negotiation Risk ProfilerPost-salesRenewal-specific equivalent of TOOL-011 Pricing Sensitivity. Sharper renewal-stage focus. Add if renewal motion produces distinct sensitivity patterns from new-logo motion.
Industry-Specific Play RefinerDomain 1Vertical-specific narrative refinement for plays exiting SalesPlayLibrary. Add when active plays grow past 30 across verticals and quality lift is observed as a need.
Tier 3 is intentionally lightweight to add — same template, ~1500 words per spec. The discipline is restraint: add tools when gaps emerge from operational signal, not when they emerge from theory. Twelve specced tools are already a lot to maintain alongside the eval harness; further additions trade off against catalog maintenance load.
Invocation patterns
PatternCallerToolExample
Brain calls tool during queryAGT-901 / AGT-902AnyAGT-902 reading per-account view, calls TOOL-004 to forecast overage timing for the account
Service calls tool as enrichmentAGT-201, AGT-503, AGT-402TOOL-002, TOOL-004AGT-201 calls TOOL-002 on account update events to enrich dev-persona signal before ICP rescore
Operator calls tool directlyRevOps via workspace UITOOL-001, TOOL-003RevOps drops a new product API doc into a workspace input; TOOL-001 generates 3 candidate plays for review
Tool chainsBrain calling multiple tools in sequenceAGT-901 → TOOL-001 → TOOL-003Read API docs (TOOL-001 produces candidate plays) then refine the most promising into a structured play definition (TOOL-003)
Cost aggregation across the Tier 3 layer

Each tool has its own per-call and monthly budget. Aggregate Tier 3 spend is monitored separately from Tier 2 brain spend.

ToolDefault modelPer-call budgetMonthly cap (default)
TOOL-001 API-doc translatorSonnet50K input + 5K output$300/mo (low frequency, high context)
TOOL-002 Dev-persona enricherHaiku10K input + 1K output$200/mo (high frequency, narrow scope)
TOOL-003 Sales play composerSonnet30K input + 4K output$300/mo (moderate frequency)
TOOL-004 Consumption forecastingHaiku15K input + 2K output$400/mo
TOOL-005 Outbound deliverabilityHaiku8K input + 2K output$150/mo
TOOL-006 Real-time call guidanceSonnet30K input + 1.5K output (per invocation)$2,000/mo — ~600 customer calls; per-call ~$3
TOOL-007 Competitive narrative writerSonnet15K input + 2K output$200/mo
TOOL-008 Adoption pattern recognizerHaiku10K input + 1.5K output$300/mo (high frequency — daily batch from AGT-501)
TOOL-009 Activation/TTV analyzerHaiku8K input + 1.5K output$200/mo
TOOL-010 Champion movement detectorHaiku10K input + 2K output$250/mo
TOOL-011 Pricing sensitivity analyzerSonnet20K input + 2.5K output$300/mo
TOOL-012 Onboarding health predictorHaiku10K input + 2K output$200/mo
TOOL-013 Cohort retention forecasterSonnet40K input + 4K output$200/mo (low frequency, called by AGT-903 only)
TOOL-014 Segment-LTV decomposerSonnet30K input + 4K output$150/mo (low frequency, called by AGT-903 only)
Total Tier 3 default budget: ~$5,150/mo across all 14 tools. TOOL-006 still dominates (~$2,000/mo for live-call guidance). Excluding TOOL-006, the rest of Tier 3 is ~$3,150/mo across 13 narrowly-scoped tools. The fourth-wave additions (TOOL-013/014) are low-frequency Sonnet tools sized to AGT-903's rare-but-heavy invocation cadence. Combined with Tier 2 brain budgets (~$1,500/mo for AGT-901/902 + ~$300/mo for AGT-903), reasoning + augmentation layer total ~$6,950/mo — still well under half of one RevOps analyst FTE. Track in a single dashboard; alert at 75% of any tier's budget. RevOps configures opt-in per call-type for TOOL-006 to bound cost.
Eval discipline for tools

Tools have eval criteria parallel to brains, but lighter:

Tool eval is its own harness, smaller than the brain harness (typically 10–15 questions per tool). Tool eval results land in BrainEvalLog with a flag distinguishing tool runs from brain runs.