GTM OS — Services changelog
All service changes, tracked by schema version. Each entry records what changed, why, what triggered it, and the downstream impact.
Naming: the 40 components in L1–L8 are GTM Services. Agents is reserved for the forthcoming Tier 2 reasoning layer. Historical entries below predate this convention; new entries from v25→v26 forward use the updated terminology.
How to read this: Entries are grouped by schema version. Change types: New service — net new service added. New artifact — production schema or supporting artifact added. Update — behavioral or logic change to existing service. Status promotion — service moved from Specced to Built. Schema — table or field addition/extension. Prompt — system prompt delta. Ripple — change triggered by another service or layer being added.
v38 → v38.5 Control-axis segmentation as parallel dimension to firmographic tier · Two-axis tier-migration · TAM/SAM control-tier breakdown
CRO walkthrough surfaced a structural insight: developer-led / consumption-pricing platforms aren't segmented firmographically — they're segmented by level of control / technical sophistication. Customers cluster into three sub-businesses sharing infrastructure (fully_managed competing against hyperscaler bundle, off_the_shelf for differentiated workflows, low_level_control for frontier AI-native research). Each cluster has different competitor sets, different price points, different sales-motion shapes. v38 modeled tier-migration on the margin axis only (shared/dedicated/BYOC); v38.5 recognizes a parallel and independent control axis (managed/off-shelf/low-level) that drives a distinct expansion motion. Three spec updates close the gap: AGT-201 gtm_motion_class classification (parallel to T1/T2/T3 fit-tier), AGT-503 7th signal + tier_migration_axis field, AGT-205 MarketAssumptions 2D matrix (product_family × control_tier).
AGT-201 · L2 · gtm_motion_class — parallel classification (fully_managed / off_the_shelf / low_level_control)
Update
Adds parallel classification output (not a 7th score dimension) to AGT-201. Independent of T1/T2/T3 fit-tier. Drives sales-motion routing — e.g., T1 fit + low_level_control routes to AE pool with technical-depth-first cadence, paired with AGT-602 from week 1; T1 fit + fully_managed routes to demo-led pool with shorter cycle. Rule-based classification reads AGT-208 production_signal + EnrichmentLog tech-stack + ConvIntelligence call-pattern; not LLM-derived. Accounts schema extension: gtm_motion_class ENUM + gtm_motion_class_last_evaluated TIMESTAMPTZ. AGT-202 v38.5 reads this field for motion-aware routing alongside the v38 motion=plg-warm flag.
Why classification, not 7th dimension
The control axis isn't fit prediction — it's motion classification. The 6-dimension ICP scoring (firmographic / vertical / revenue / tech / growth / intent) predicts T1/T2/T3 fit tier; gtm_motion_class predicts which sales motion to use. Different concept; deserves its own output.
Reclassification
A customer can move along the control axis as its team grows or shrinks its ML capacity. Reclassification is triggered by an AGT-208 production_signal threshold breach, a ConvIntelligence pattern shift, or a RevOps manual override. A reclassification in turn triggers the AGT-503 v38.5 candidate signal for a control-axis tier-migration play.
Optionality
Opt-in per company config. Traditional B2B SaaS without consumption-pricing tiered product mix uses the existing 6-dimension scoring without classification. Turning on requires AGT-208 deployed + AGT-106 per-class AE-pool definitions.
Impact
Sales motion now matches the actual buying motion at the customer. A T1 fully_managed account doesn't get a low_level_control AE who'd scare them off with technical depth; a T1 low_level_control account doesn't get a generic AE who can't hold a research conversation.
AGT-503 · L5 · Tier-migration two-axis extension — margin axis (existing v38) + control axis (new v38.5)
Update
v38 modeled tier-migration on a single axis (low_margin → mid_margin → high_margin = shared/dedicated/BYOC). v38.5 adds a parallel control axis (fully_managed → off_the_shelf → low_level_control). 7th signal: control-axis migration eligibility (+20 to +45 pts based on AGT-201 reclassification event scaled by account revenue). New ExpansionLog field tier_migration_axis ENUM(margin / control / combined). Play structure differs by axis: margin-axis hypothesis pre-fills from TOOL-015 GP projection; control-axis hypothesis pre-fills from reclassification signal + EnrichmentLog ML-team evidence + ConvIntelligence research-pattern. Switching cost is often lower on the control axis because no customer-side infrastructure change is required.
Two motions, two comp incentives
Margin-axis play retires GP-overlay quota on existing-revenue GP uplift (v38 model). Control-axis play retires GP-overlay quota on uplift + new revenue from expanded scope (low_level_control plays often add scope beyond initial product).
Four motion intersections
margin-axis-only (infrastructure-led play, classic v38) / control-axis-only (research-led play, new v38.5) / combined (rare, multi-quarter, high-value, requires SLM review at play creation) / no-migration (no signal).
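A minimal sketch of how the four intersections could map onto the tier_migration_axis field, assuming the two eligibility signals arrive as booleans. The function name and the use of None for the no-migration case are illustrative assumptions, not part of the spec.

```python
from typing import Optional


def tier_migration_axis(margin_signal: bool, control_signal: bool) -> Optional[str]:
    """Map the two eligibility signals to a tier_migration_axis value.

    Returns None when neither signal is present (the no-migration case).
    """
    if margin_signal and control_signal:
        return "combined"  # rare; the spec requires SLM review at play creation
    if margin_signal:
        return "margin"    # infrastructure-led play (v38)
    if control_signal:
        return "control"   # research-led play (v38.5)
    return None
```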
Why the v38 design wasn't sufficient
A customer on shared infrastructure (low_margin) AND fully_managed control can grow in two distinct directions: infra maturation (margin axis) or ML team maturation (control axis). The sales motion shape, AE pool, comp incentive, technical-prerequisite gate — all differ. Modeling as a single axis would mis-route the play.
Impact
First-class motion for control-axis tier migration. Reps are now incentivized via the AGT-101 v38 GP-overlay to drive both margin-axis and control-axis migrations, with a cadence appropriate to each axis.
AGT-205 · L2 · MarketAssumptions 2D matrix — product_family × control_tier sizing
Update
v38.5 extends MarketAssumptions with a parallel control_tier dimension. TAM/SAM/SOM are now modeled as a 2D matrix: product-family × control-tier (fully_managed / off_the_shelf / low_level_control / blended). Each cell carries its own analyst-input assumptions (TAM account count, ARPU, GP%), competitive landscape, and migration-flow rate to the next tier. AGT-205 produces a strategy_brain_view (per AGT-903 v37 read contract) keyed by (product_family × control_tier × segment × vertical) — load-bearing input for AGT-903 vertical-entry / capacity-reallocation / ICP-retrospective use cases.
Why three sub-businesses
fully_managed competes against hyperscaler bundle (largest TAM, lowest margin, growth bet). off_the_shelf competes against vertical AI infrastructure (mid TAM, mid margin, path to high via control-axis migration). low_level_control competes against self-hosted vLLM + frontier-model-provider-direct (smallest TAM, highest margin per customer, deepest moat). Sizing them as one number obscures where to invest sales capacity.
Migration-flow assumptions
fully_managed → off_the_shelf: 5–15% per year. off_the_shelf → low_level_control: 3–8% per year (lower because requires customer-side ML team maturation). Reverse migrations tracked as net-of-forward-flow; flagged for AGT-903 strategic-bet retrospective.
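As a rough illustration of what these ranges imply, the sketch below computes one year of forward flow from a starting account count. The starting counts are hypothetical, and applying the rate to the start-of-year count is an assumption; the spec does not say how the rate is applied.

```python
def forward_flow(accounts: int, low_pct: int, high_pct: int) -> tuple[float, float]:
    """Return the (low, high) count of accounts expected to migrate forward in one year."""
    return accounts * low_pct / 100, accounts * high_pct / 100


# Hypothetical starting counts; the percentage ranges are the ones quoted above.
managed_to_shelf = forward_flow(1000, 5, 15)
shelf_to_low_level = forward_flow(200, 3, 8)
```

Reverse migrations are not modeled here, since the text only states that they are tracked net of forward flow.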
Schema extension
MarketAssumptions gains control_tier ENUM + per-cell tam_account_count_estimate, sam_account_count_observed, som_account_count_target, arpu_estimate_usd, gross_margin_estimate_pct, competitive_set JSONB, last_analyst_refresh_date.
Impact
AGT-903 capacity-reallocation queries can now return per-control-tier rep allocation recommendations. Without this matrix, AGT-903 would be reasoning over a blended TAM that obscures the actual strategic levers.
AGT-202 + AGT-106 · L2 · v38.5 ripples — motion-class-aware routing + per-class AE territory pools
Ripple
AGT-202 v38.5 reads gtm_motion_class from Accounts when routing. Routes to AE pools designated for that class — e.g., fully_managed accounts route to demo-led AE pool; low_level_control accounts route to technical-depth AE pool with paired AGT-602. AGT-106 v38.5 extends TerritoryDefinitions with per-class AE pool designations (alongside existing motion_track field from v38). Reps may serve multiple control_tiers but coverage is explicit per-territory.
Cumulative routing dimensions
v38 added motion (traditional / plg-warm / vertical-led). v38.5 adds gtm_motion_class (fully_managed / off_the_shelf / low_level_control). Combined routing key per AGT-202: (segment × vertical × motion × gtm_motion_class). Most cells are sparse — most accounts won't trigger every dimension.
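One way to picture the combined routing key is as a tuple used for a sparse lookup. This is a sketch only: the segment, vertical, and pool names are hypothetical, and the fallback pool is an assumption about how empty cells might be handled.

```python
from typing import NamedTuple, Optional


class RoutingKey(NamedTuple):
    segment: str
    vertical: str
    motion: str                      # traditional / plg-warm / vertical-led
    gtm_motion_class: Optional[str]  # None when classification is not enabled


# Hypothetical pool table; only populated cells need an entry because most are sparse.
AE_POOLS = {
    RoutingKey("enterprise", "fintech", "plg-warm", "low_level_control"): "technical-depth-pool",
    RoutingKey("enterprise", "fintech", "traditional", "fully_managed"): "demo-led-pool",
}


def route(key: RoutingKey, default_pool: str = "general-pool") -> str:
    """Look up the AE pool for a routing key, falling back when the cell is empty."""
    return AE_POOLS.get(key, default_pool)
```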
Impact
Routing complexity grows; pays back via better motion-match between rep skills and customer buying behavior.
v37 → v38 Consumption-pricing tiered-platform profile · TOOL-015 Consumption-Margin Decomposer · AGT-208 Developer Signal Scorer · AGT-503 tier-migration extension · AGT-101 + AGT-103 GP-overlay comp variant
CRO walkthrough surfaced a target-profile gap: the OS is well-shaped for traditional B2B SaaS but doesn't address the unique dynamics of consumption-pricing developer-led platforms (per-token / per-message / per-API-call businesses with multi-tier product mix). v38 closes that gap with one new Tier 3 tool, one new L2 service, and three updates to existing services. The shape: token-economics IS the business model, the margin-expansion strategy runs through tier migration not seat upsell, the funnel intake is PLG-developer-led not MQL-driven, and rep comp must align to GP not just ARR. v38 is spec-only; build deferred pending synth corpus extension to consumption-event + developer-signal data.
TOOL-015 · Tier 3 · Consumption-Margin Decomposer — per-customer GP attribution + tier-migration projection
New tool
15th Tier 3 tool. Decomposes realized GP per customer-quarter into pricing / utilization / backend-cost / tier-mix axes. Projects GP uplift for caller-supplied tier-migration scenarios with switching-cost class + a required credible-alternative articulation (brain-discipline mirror at tool level). Generic across consumption businesses — tokens, messages, minutes, transactions, GB. Refusal-first when backend cost data is incomplete. Same shape as TOOL-004 (deterministic core + Haiku LLM characterization). Called by AGT-503 (daily on tier-migration candidates), AGT-901 (cohort GP diagnostics), AGT-902 (per-account margin synthesis), AGT-903 (cohort fan-out on strategic margin retrospectives).
Why this tool
Consumption-pricing businesses have GP that varies per customer / SKU / backend deployment in ways subscription SaaS does not. Tier-migration motions need a deterministic decomposition of where each customer's GP actually comes from — otherwise migration plays are bet-by-vibes. AGT-503 v38 explicitly depends on TOOL-015 for tier-migration play scoring.
Decomposition axes
(1) Pricing: list-vs-realized + commit-tier discount. (2) Utilization: effective unit cost vs benchmark, classified tight/moderate/loose. (3) Backend: spend by region + provider, region-arbitrage GP projection. (4) Tier: spend + GP by product tier, current-mix label.
Hard rule — no inventing scenarios
Tool projects tier-migration scenarios that the caller supplies. Tool never invents "what if we migrated 30% of Serverless to Dedicated" on its own. This bounds the reasoning surface and prevents drift.
credible_alternative
Every tier-migration scenario carries a required articulation of the case for NOT migrating. Mirrors AGT-903's option-set discipline at the tool level. Forces honest framing for downstream callers.
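The two rules above (caller-supplied scenarios only, and a required credible_alternative) could be enforced at the tool boundary roughly as follows. The class and function names are illustrative assumptions; only the field names switching_cost_class and credible_alternative come from the spec text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierMigrationScenario:
    from_tier: str
    to_tier: str
    switching_cost_class: str
    credible_alternative: str  # the case for NOT migrating


def validate_scenarios(scenarios: list[TierMigrationScenario]) -> list[TierMigrationScenario]:
    """Reject an empty request or any scenario missing its credible alternative."""
    if not scenarios:
        raise ValueError("no scenarios supplied; the tool does not invent its own")
    for s in scenarios:
        if not s.credible_alternative.strip():
            raise ValueError(f"{s.from_tier} -> {s.to_tier}: credible_alternative is required")
    return scenarios
```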
Cost
Haiku-default, ~$0.005/call. Sonnet only on AGT-903 deep-mode cohort fan-outs (~$0.05–0.10/call).
Status
Specced v38, build deferred. Build prerequisites: synth corpus extension to consumption-event data with backend cost feeds; AGT-503 v38 ExpansionLog schema landing; benchmark_unit_cost_usd reference data.
Impact
Closes the per-customer GP-attribution gap that consumption-pricing tier-migration motions need. Foundational for AGT-503 v38 tier-migration plays + AGT-101/103 v38 GP-overlay comp.
AGT-208 · L2 · Developer Signal Scorer — pre-MQL PLG-funnel scoring with self-serve to sales-handoff routing
New service
8th L2 service. Scores individual developers on enterprise-progression signal using 5 dimensions (consumption velocity, production signal, enterprise context, commercial intent, stakeholder breadth — 100 pts base). Domain-aggregates multiple devs from same org. Outputs handoff-priority / handoff-warm / monitor / stay-self-serve routing tier — feeds AGT-202 Lead Router with new motion=plg-warm flag. Companion to AGT-201 (firmographic ICP); the two services intentionally use different data and disagreement is itself signal (contradicts_agt201 field surfaces it for RevOps review).
Funnel-shape gap closed
Developer-led / API-platform businesses have a funnel where customers enter as individual devs who sign up + start consuming, often hitting material adoption before any sales conversation. AGT-201 + AGT-202 are designed for post-MQL leads; they don't see this pre-sales adoption signal. AGT-208 is the upstream complement.
Five signal dimensions
Consumption velocity (30 pts), production signal (25), enterprise context (20), commercial intent (15), stakeholder breadth (10). Threshold-based tier mapping: 80–100 priority / 60–79 warm / 40–59 monitor / <40 stay-self-serve.
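The point caps and tier thresholds above can be written down directly. This is a sketch under the assumption that the score is an integer from 0 to 100; the dictionary keys and function name are illustrative.

```python
# Maximum points per dimension, as listed above.
DIMENSION_CAPS = {
    "consumption_velocity": 30,
    "production_signal": 25,
    "enterprise_context": 20,
    "commercial_intent": 15,
    "stakeholder_breadth": 10,
}


def routing_tier(score: int) -> str:
    """Map a 0-100 developer signal score to its routing tier."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 80:
        return "handoff-priority"
    if score >= 60:
        return "handoff-warm"
    if score >= 40:
        return "monitor"
    return "stay-self-serve"
```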
Domain aggregation
Per-corporate-domain rollup nightly. Sum-capped at 100. Multi-developer override: 3+ devs at same domain each ≥ 60 forces handoff-priority regardless of summed score (organizational-adoption-breadth signal).
AGT-202 v38 ripple
Lead Router gains motion dimension. PLG-warm motion routes to AEs with PLG-track territory designation; cadence is technical-depth-first, longer dwell, no inbound-form etiquette.
contradicts_agt201 flag
When AGT-201 has rated an account T3 (poor fit) but AGT-208 score ≥ 60, surfaces for RevOps review. Doesn't auto-promote — disagreement is signal.
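Expressed as a predicate, the flag condition is small. The function signature is an assumption for illustration; the T3 rating and the threshold of 60 are taken from the text.

```python
def contradicts_agt201(fit_tier: str, developer_signal_score: int) -> bool:
    """True when AGT-201 rates the account T3 but the AGT-208 score is 60 or higher.

    The flag only surfaces the account for RevOps review; it never promotes it.
    """
    return fit_tier == "T3" and developer_signal_score >= 60
```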
Status
Specced v38, build deferred. Build prerequisites: synth corpus extension to product-event stream + developer roster; AGT-204 enrichment-domain feeds; AGT-202 motion-flag wiring.
Impact
First L2 service designed for PLG-led developer funnels. The spec target is a ~50% reduction in time-to-handoff versus cold outbound. It also surfaces enterprise-progression signal before customers request a call themselves, by which point it is typically too late.
AGT-503 · L5 · Tier-migration as new expansion play type — GP-uplift play distinct from ARR-uplift play
Update
For consumption-pricing tiered platforms with structural per-tier margin variance (e.g., shared / dedicated / bring-your-own-cloud, or self-service / managed / control-plane). New 6th signal: tier-migration eligibility (+25 to +50 pts based on TOOL-015 GP-uplift projection). New play type: tier_migration, distinct from new_play — the motion targets GP uplift, not ARR uplift, and ARR may stay flat or dip slightly during a successful migration. 4 new gates: switching-cost class threshold, technical-prerequisite check (AGT-602), onboarding-completion gate, revenue-floor gate.
Why in AGT-503
Tier migration is a post-sale motion against an existing customer with an existing relationship; AM is the executor; success is measured in GP delta on the same revenue. That's structurally an expansion play. The deviation from classic expansion is that ARR may not move — which is why the new signal scores GP uplift and AGT-101 v38 introduces a parallel GP-overlay comp variant.
ExpansionLog schema extension
play_type ENUM gains tier_migration. New fields: tier_migration_from, tier_migration_to, projected_gp_uplift_pct, switching_cost_class, realized_gp_uplift_pct (post-migration retrospective).
credible_alternative pre-fill
TOOL-015 returns the case for NOT migrating; AGT-503 pre-fills it on the play card. AM has the contrary view in hand before the customer conversation.
Calibration loop
2 quarters post-migration: AGT-103 writes realized_gp_uplift_pct back to ExpansionLog. If realized falls below 60% of projected, AGT-703 picks up as calibration signal → TOOL-015 system prompt reviewed.
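The calibration trigger reduces to a single comparison. A minimal sketch, assuming both uplift values are expressed in the same units and the projection is positive; the function name and the floor_ratio parameter are illustrative.

```python
def needs_calibration_review(
    projected_gp_uplift_pct: float,
    realized_gp_uplift_pct: float,
    floor_ratio: float = 0.60,
) -> bool:
    """True when realized GP uplift falls below 60% of the projected uplift."""
    if projected_gp_uplift_pct <= 0:
        raise ValueError("projected uplift must be positive")
    return realized_gp_uplift_pct < floor_ratio * projected_gp_uplift_pct
```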
Impact
First-class motion for the consumption-pricing margin-expansion strategy. Reps can now be incentivized (via AGT-101 v38 GP-overlay) to drive tier migrations even when the ARR delta is flat — aligning behavior to the company's actual margin-expansion thesis.
AGT-101 + AGT-103 · L1 · GP-overlay comp variant — dual-measure (ARR + GP) quotas and payouts
Update
Companion comp-plan update to AGT-503 v38 tier-migration. AGT-101: per-rep dual-measure quota (arr_quota + gp_quota); GP quota computed as fraction of ARR using segment-level GP target; new 10th guardrail (GP quota internally consistent with FP&APlan margin range); QuotaStore schema extension. AGT-103: dual-measure attainment (arr_attainment_pct + gp_attainment_pct); combined payout via configurable split (e.g., 60/40 ARR/GP for new-business AEs, 40/60 for existing-business AMs); anti-gaming guard (high combined but low ARR triggers AGT-104 review). Opt-in per rep; CRO + Finance approval gate.
Highest-leverage v38 change
Comp incentivizes behavior. Classic ARR-only comp pushes reps toward fast-close logo volume, which is the wrong incentive when the company's margin-expansion strategy runs through tier migration on existing customers. GP-overlay realigns rep behavior to the actual strategy.
GP attribution mechanism
AGT-103 queries TOOL-015 Consumption-Margin Decomposer for the customer's realized GP over the period. Per-opp GP attribution = (opp_arr / customer_total_arr) × customer_realized_gp. For tier-migration plays specifically: GP attribution = realized_gp_uplift_pct × baseline revenue, attributed to the closing rep.
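The two attribution formulas above translate into the following sketch. It assumes realized_gp_uplift_pct is stored in percentage points (12 means 12%), which the spec does not state; function names and the sample figures are hypothetical.

```python
def per_opp_gp(opp_arr: float, customer_total_arr: float, customer_realized_gp: float) -> float:
    """Attribute a customer's realized GP to one opportunity by its share of ARR."""
    if customer_total_arr <= 0:
        raise ValueError("customer_total_arr must be positive")
    return opp_arr / customer_total_arr * customer_realized_gp


def tier_migration_gp(realized_gp_uplift_pct: float, baseline_revenue: float) -> float:
    """GP credited to the closing rep for a tier-migration play."""
    return realized_gp_uplift_pct / 100 * baseline_revenue


# Hypothetical figures: a $50K opp on a $200K customer with $80K realized GP.
opp_share = per_opp_gp(50_000, 200_000, 80_000)
migration_credit = tier_migration_gp(12, 500_000)
```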
Anti-gaming guard
A rep on GP-overlay could suppress ARR growth in service of GP improvement (refusing to upsell low-margin volume). Guard fires when combined attainment > 100% but ARR attainment < 70% — flags to AGT-104 for manual review.
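A sketch of the guard, assuming combined attainment is a weighted average of ARR and GP attainment using the configurable split. That weighting is an assumption; the text gives the thresholds (combined above 100%, ARR below 70%) but not the combination formula.

```python
def combined_attainment(arr_pct: float, gp_pct: float, arr_weight: float = 0.6) -> float:
    """Weighted attainment in percent; arr_weight=0.6 corresponds to a 60/40 ARR/GP split."""
    return arr_weight * arr_pct + (1 - arr_weight) * gp_pct


def anti_gaming_flag(arr_pct: float, gp_pct: float, arr_weight: float = 0.6) -> bool:
    """True when combined attainment exceeds 100% while ARR attainment is under 70%."""
    return combined_attainment(arr_pct, gp_pct, arr_weight) > 100 and arr_pct < 70
```

For example, an AM on a 40/60 split with 65% ARR attainment and 130% GP attainment lands near 104% combined and would be flagged to AGT-104.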
Calibration loop closure
AGT-103 writes realized GP uplift back to ExpansionLog post-migration. AGT-703 picks up systematic mis-projections as calibration signal. Closes the brain-discipline loop AGT-903 calls for at the operational comp-plan level.
Optionality
Opt-in per rep. Defaults disabled. Turning on requires CRO + Finance approval (same gate as comp plan changes). Reps without overlay use classic ARR-only attainment unchanged.
Impact
The single highest-leverage spec change in v38. Comp architecture is the lever that turns strategy into rep behavior. Without GP-overlay, tier-migration plays are aspirational; with it, they're aligned to how reps get paid.
AGT-202 · L2 · Lead Router motion-dimension extension — PLG-warm routing track
Ripple
v38 ripple from AGT-208. Lead Router gains motion dimension on routing decisions. Three motion classes: traditional (existing path, MQL-driven, segment-band routing), plg-warm (from AGT-208 handoff-priority and handoff-warm — routes to AEs with PLG-track territory designation), and vertical-led (placeholder for future industry-specific motion). PLG-track AEs operate on different cadence: technical-depth-first qualification, longer dwell, no inbound-form etiquette, AE leads with the developer's existing usage context (assembled by AGT-208 brief).
Why a motion dimension
PLG vs traditional sales motion is structural — different sales process, different cadence, different qualification criteria. Adding motion to AGT-202 routing prevents PLG-warm leads from being routed to traditional inbound AEs who'd treat them as cold MQLs.
TerritoryDefinitions extension
v38: AGT-106 TerritoryDefinitions gains motion_track field. PLG-track territories carve from the same account universe but with explicit motion designation. AEs may serve one or both motions; assignment is per-territory not per-AE.
Impact
Routing-layer support for parallel sales motions. Required by AGT-208; will be used by future vertical-motion services.
CRO Demo Script + Tools_Index · Docs · v38 doc updates — target-profile section + tools count update
Ripple
CRO_DEMO_SCRIPT.md gains a new "target-profile" section explicitly framed for consumption-pricing developer-led platforms (without naming any specific company). Tools_Index.html updated: 14 → 15 tools, TOOL-015 added to wave 4. Architecture-tab principles revised to reflect the consumption-platform pattern. Brain-Ready Views Contract referenced for forthcoming TOOL-015 → AGT-503 / AGT-901 / AGT-902 / AGT-903 wiring.
No customer naming in repo
Per the standing rule, all v38 doc references describe the target as "consumption-pricing developer-led platform" or "tiered-product-mix multi-cloud SaaS" — never a specific company name. The OS spec is a generic capability surface, not a custom build.
Demo script flow
8-tab walk extends to mention v38 capabilities — tier-migration in Planning panel scenarios, GP-overlay comp talk-track, AGT-208 PLG funnel framing.
v36 → v37 AGT-903 Strategy Brain specced · StrategyRecommendationLog schema · strategy brain-ready view contract obligation on 10 Tier 1 services
Third L9 brain agent specced to fill the multi-quarter portfolio-reasoning gap that AGT-901 (current pipeline) and AGT-902 (per-account) don't cover. AGT-903 reasons across 4–12 quarters of cohort and trajectory data to answer ICP-revision, vertical-entry, capacity-reallocation, pricing-strategy, and strategic-retrospective questions. Output is option-shaped (2–4 alternatives + tradeoffs + risk surface + assumptions-must-hold), never single answers. New write target StrategyRecommendationLog with its own state machine (draft → under_review → endorsed/shelved/retired). Endorsement triggers a human-led planning workstream owned by the relevant Tier 1 service — never modifies a canonical table directly. New contract obligation: strategy_brain_view extensions on 10 Tier 1 services. Spec-first; build follows once views ship and Tier 3 cohort tools exist.
AGT-903 · L9 · Strategy Brain — multi-quarter portfolio reasoning · executive-invoked · Opus default
New agent
Third Tier 2 brain. Differentiated from AGT-901/902 by horizon (4–12 trailing quarters + 2–6 forward), scope (portfolio/segment/vertical/motion), stakeholder (CRO/CFO/CEO), output shape (option-set memos), and default model (Opus). Distinct action taxonomy: propose_icp_revision, propose_segment_redefinition, propose_vertical_entry, propose_capacity_reallocation, propose_pricing_packaging_review, flag_strategic_risk, recommend_market_research_query, recommend_human_query, none. Every action maps to a downstream human-led workstream — never a direct canonical edit. Never on cadence; concentrated around annual planning, board prep, mid-year inflection.
Why a third brain (vs extending AGT-901)
AGT-901's read contract is current-period brain-ready views; extending it with multi-quarter cohort projections would have bloated its system prompt and conflated current-pipeline-operational reasoning with strategic-portfolio reasoning — different stakeholders, different cost profile, different evaluation criteria. Cleaner separation: each brain has a distinct horizon, distinct action taxonomy, distinct write target.
Read contract
Strategy brain-ready view extensions on AGT-201 (icp_outcome_brain_view), AGT-205 (strategy_brain_view — TAM/SAM/SOM), AGT-501 (cohort_brain_view), AGT-503 (strategy_brain_view), AGT-702 (strategy_brain_view — multi-quarter Magic Number/NRR/GRR/R40/CAC Payback), AGT-703 (strategy_brain_view — multi-quarter win-loss + forecast bias evolution), AGT-105, AGT-101, AGT-404, AGT-604. New contract obligation on each owning Tier 1 service. Where a view does not yet exist, AGT-903 declines and surfaces the gap rather than estimating from raw rows.
Write contract
BrainAnalysisLog (own log, shared schema with AGT-901/902) + StrategyRecommendationLog (drafts only). Explicitly does NOT write SalesPlayLibrary — plays are AGT-901/902's job, downstream of an endorsed strategic recommendation. May write annual-planning narrative sections to BusinessReviewLog per AGT-704 charter extension.
Output discipline
Options-shaped: 2–4 viable options per memo with distinct hypotheses (not minor variants), tradeoffs matrix per option, four-class risk surface (market/execution/capacity/model_assumption), falsifiable assumptions-must-hold list. If brain finds one obvious answer, it is required to articulate at least one credible alternative + the conditions under which it would be correct. Eval criterion enforces option count ≥ 2.
Cost profile
Rare-but-heavy: 10–30 queries/month at 150K input + 10K output, Opus default. Per-query cap 200K input + 20K output. Monthly budget alert at $300 default (CRO-configurable). Refusal on missing/stale strategy view is the cheapest correct answer (~one tool-call's tokens).
Promotion gate
draft → under_review (CRO pickup) → endorsed (CRO + CFO joint approval; CEO required if scope_severity = material). Endorsement triggers a planning workstream owned by the relevant Tier 1 service — never modifies a canonical table. Healthy cadence: 2–5 endorsed recommendations per year. Strategic churn is a known anti-pattern; brain availability should not lower the cost of strategic second-guessing.
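The approval gate can be sketched as a small check on state and approver set. The role strings and function names are illustrative assumptions; the rule itself (CRO + CFO, plus CEO when scope_severity is material, and only from under_review) follows the text.

```python
def required_approvers(scope_severity: str) -> set[str]:
    """CRO and CFO are always required; CEO is added for material scope."""
    approvers = {"CRO", "CFO"}
    if scope_severity == "material":
        approvers.add("CEO")
    return approvers


def can_endorse(state: str, scope_severity: str, approvals: set[str]) -> bool:
    """A recommendation can be endorsed only from under_review with all required approvals."""
    return state == "under_review" and required_approvers(scope_severity) <= approvals
```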
Status
Specced only. Build deferred until (a) critical mass of strategy_brain_view extensions ships and (b) at least one Tier 3 cohort tool (likely TOOL-013 cohort retention forecaster or TOOL-014 segment-LTV decomposer) exists. Spec-first to anchor the contract obligations.
Impact
L9 brain layer feature-complete at the spec level — three brains cover current-pipeline, per-account, and multi-quarter portfolio horizons. The CRO-conversation range expands from operational coverage to strategic bets.
StrategyRecommendationLog · Schema · Production schema — AGT-903 strategic memos with options, tradeoffs, risk surface, assumptions-must-hold
New artifact
Companion workspace to SalesPlayLibrary, but for strategic recommendations rather than plays. State machine: draft → under_review → endorsed / shelved / retired. No volume cap (executive bandwidth is the natural limiter). CHECK constraints enforce architectural commitments: option count between 2 and 4 for propose_* actions, risk_surface must include all four risk classes, endorsed_option_label must match a real option, writer_agent_id must be AGT-903. Endorsement requires CRO + CFO (+ CEO if scope_severity = material) joint approval. Endorsement does NOT modify any canonical table — the strongest separation between Tier 2 reasoning and Tier 1 governance in the OS.
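The CHECK constraints described above can be mirrored application-side for illustration. This sketch assumes a row is a plain dict, that risk_surface is keyed by risk class, and that each option carries a "label" key; those shapes are assumptions, while the four rules come from the text.

```python
RISK_CLASSES = {"market", "execution", "capacity", "model_assumption"}


def check_row(row: dict) -> list[str]:
    """Return the list of violated constraints for a StrategyRecommendationLog row."""
    errors = []
    options = row.get("options_enumerated", [])
    if row.get("writer_agent_id") != "AGT-903":
        errors.append("writer_agent_id must be AGT-903")
    if row.get("action_type", "").startswith("propose_") and not 2 <= len(options) <= 4:
        errors.append("propose_* actions need 2 to 4 options")
    if not RISK_CLASSES <= set(row.get("risk_surface", {})):
        errors.append("risk_surface must include all four risk classes")
    label = row.get("endorsed_option_label")
    if label is not None and label not in {o.get("label") for o in options}:
        errors.append("endorsed_option_label must match a real option")
    return errors
```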
Why a separate table from SalesPlayLibrary
Different consumer (no execution engine reads endorsed rows directly), different state machine (shelved as a non-terminal "set aside"), different approval roles (CRO/CFO/CEO vs SLM/RevOps), different volume discipline (no cap; cadence-as-limiter), longer retention (warm 4y / cold 10y vs SalesPlayLibrary's 18mo / 7y) because strategic decisions have longer relevance horizons.
Key fields
options_enumerated (JSONB array, 2–4 items, each with hypothesis + projected_impact_range + required_investment + capacity_implications + tier1_dependencies), tradeoffs_matrix, risk_surface (four-class structured), assumptions_must_hold (with brittleness flag), suggested_workstream_owners, endorsed_option_label, scope_severity (routine/significant/material), data_staleness_acknowledged.
Calibration signals
Endorsement rate per quarter (target band 10–30% — too low = options not credible; too high = brain reading leadership preference). Edits during review (heavy = drafts are starting points; light = brain is calibrated). Brittle-assumption surfacing rate (for failed endorsements: did brain pre-flag the failing assumption?). Retrospective outcomes vs projected impact range, 4–6 quarters out.
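A sketch of the endorsement-rate band check. The denominator is an assumption: the text gives the 10–30% target band but does not say whether the rate is measured against all drafts or only those that reached review.

```python
def endorsement_rate_signal(endorsed: int, reviewed: int) -> str:
    """Classify a quarter's endorsement rate against the 10-30% target band."""
    if reviewed == 0:
        return "no-data"
    rate = endorsed / reviewed
    if rate < 0.10:
        return "too-low"   # options may not be credible
    if rate > 0.30:
        return "too-high"  # brain may be reading leadership preference
    return "in-band"
```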
No execution wiring
SalesPlayLibrary's execution wiring (AGT-302 reads active plays) is irrelevant here. Endorsed rows trigger workstreams in AGT-201/AGT-105/AGT-101/AGT-205/AGT-203/AGT-302/AGT-403/AGT-406/AGT-802 depending on action_type. Each owning service runs its normal redesign cycle — strategic recommendation is an input, not a substitute for governance.
Impact
Schema-level reinforcement of the architectural rule that brains never directly modify canonical configuration.
Strategy brain-ready view contract · Contract · New view-extension obligation on 10 Tier 1 services to support AGT-903
Ripple
AGT-903 cannot ship usefully without strategy_brain_view extensions on AGT-201, AGT-205, AGT-501, AGT-503, AGT-702, AGT-703, AGT-105, AGT-101, AGT-404, AGT-604. These differ from AGT-901's brain-ready views by (a) longer time windows (4–12 quarters), (b) cohort axis where applicable (signup-quarter cohorts on AGT-501 + AGT-503), (c) outcome-correlation projections (e.g., ICP score × realized LTV on AGT-201's icp_outcome_brain_view). Refresh cadence is at most quarterly. View definition lives with the owning Tier 1 service; refusal-on-missing is the brain-side discipline.
Why the brain doesn't compute its own multi-quarter aggregates
Same architectural rule as AGT-901 / AGT-902: brains never recompute Tier 1 numbers. Cohort retention curves, multi-quarter NRR/GRR trajectories, win-rate evolution by segment — all are deterministic computations that belong on the owning Tier 1 service. Brain estimation would break source-trace integrity.
View ownership
Each Tier 1 service owns its strategy_brain_view extension. AGT-201 owns icp_outcome_brain_view (correlation of 6-dimension ICP score to realized LTV/NRR/churn/win-rate by cohort). AGT-501 + AGT-503 own cohort_brain_view (signup-quarter cohort retention curves). AGT-702 + AGT-703 own multi-quarter strategy_brain_view (extended time window of existing brain-ready views). AGT-205 owns strategy_brain_view of MarketAssumptions with current penetration. AGT-105/AGT-101/AGT-404/AGT-604 own their respective extensions.
Tier 3 tool consequences
Two Tier 3 tools likely needed to make AGT-903 useful: TOOL-013 cohort retention forecaster (analogous to TOOL-004 consumption forecaster — projects cohort retention curves forward with confidence bands) and TOOL-014 segment-LTV decomposer (decomposes LTV by segment/vertical/ICP-tier to support capacity-reallocation and ICP-revision use cases). Neither specced yet.
Status
Contract obligation specced; views not yet built. AGT-903 build-readiness depends on critical mass of these views shipping. Each service can ship its strategy_brain_view independently — incremental rollout supported (AGT-903 declines on any missing view rather than estimating).
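Refusal-on-missing can be sketched as a pre-flight check that runs before any reasoning. The view identifiers and return shape below are hypothetical; the behavior (decline and name the gap rather than estimate) is what the contract describes.

```python
def preflight(required_views: list[str], available_views: set[str]) -> dict:
    """Decline and name the gaps if any required view is missing; never estimate."""
    missing = [v for v in required_views if v not in available_views]
    if missing:
        return {"status": "declined", "missing_views": missing}
    return {"status": "ok", "missing_views": []}


# Hypothetical identifiers in "<service>.<view>" form.
result = preflight(
    ["AGT-201.icp_outcome_brain_view", "AGT-205.strategy_brain_view"],
    {"AGT-205.strategy_brain_view"},
)
```

Because each service ships its view independently, the available set grows over time and the same query moves from declined to ok without any change on the brain side.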
Impact
Backlog work for 10 Tier 1 services; sequenced after AGT-903 use-case prioritization.
v35 → v36 Follow-a-Lead doubles to two arcs (Growth + Risk) · TOOL-010 Champion Movement Detector prototyped · spec-citation discipline doc
CRO walkthrough surfaced three improvements. (1) The Follow-a-Lead walkthrough needed a defensive counterpart to the growth-arc story — added a champion-loss-decliner narrative (Massive Dynamic Holdings, $221K → $180K save at NRR 81%) with sub-tab toggle inside the same explainer tab. (2) The renewal-risk diagnosis would be richer with a champion-movement classifier — TOOL-010 prototyped, wired into AGT-902, used by the decliner-arc brain moments. (3) A CRO-caught SLA drift between narrative and AGT-202 spec produced a standing discipline doc to prevent recurrence.
Follow a Lead — second arc · UX · Risk arc: Massive Dynamic Holdings champion-loss save (NRR 81%)
New narrative
Same starting point as the growth arc ($221K MM/T1) but champion departs at Day +210; brain catches the inflection at T-90 renewal, recommends SLM intervention, account renews at $180K (contained loss, not churn). Three brain reasoning moments embedded as real EVAL-Q01 / EVAL-Q06 / EVAL-Q10 brain outputs — captured verbatim from the eval harness.
Why a second arc
CRO walkthrough of v35 surfaced the demand: "show me the bad case." The growth-arc-only demo answered "does the system work for expansion" but not "does it catch risk humans missed." Both arcs together give the demo CRO conversational range without doubling cost or complexity.
Sub-tab toggle
Inside the Follow-a-Lead tab, a two-button selector lets the viewer switch between arcs without leaving the page. Both narratives load on tab open; switching is instant. Stark Logistics (growth) is the default; clicking the Risk button swaps to Massive Dynamic.
Narrative scaffold
13 timeline steps: lead → discovery → close → go-live → healthy quarter → champion departure (Day +210) → decline → brain diagnosis → SLM intervention → account-team rotation → brain handoff briefing → renewal negotiation → renewal closed. Three amber brain moments at Day +275 (renewal-risk diagnosis), Day +296 (handoff briefing for new AE), Day +280 (stale-data variant query).
File layout
eval/samples/follow_a_lead_growth.json + eval/samples/follow_a_lead_decliner.json (the original follow_a_lead.json was renamed to follow_a_lead_growth.json for parallelism). Explainer JS loads both at tab-open; the sub-tab toggles between them.
Impact
CRO demo range doubled at minimal complexity cost.
TOOL-010 · Tier 3 · Champion Movement Detector — status promotion: specced → prototyped
Status promotion
Runtime at prototype/tools/tool_010.py. Multi-signal fusion to classify champion movement: left_company / role_changed_internal / stopped_engaging / engagement_declining / no_movement_detected. Hard rule preserved: any classification stronger than no_movement_detected requires at least one moderate-or-stronger contributing signal. Wired into AGT-902 via Anthropic tool-use; called on renewal-risk diagnoses for accounts with engagement-shift signals.
Implementation pattern
Same shape as TOOL-008 — deterministic core (signal aggregation per contact_id, strength classification per signal type) + LLM characterization (movement_type label + recommended interventions, hard-rule-enforced). Per-call cost ≈ $0.004 (Haiku, ~1K input + 700 output). Spec model was Haiku; matched.
Graceful degradation
Synth corpus has no LinkedIn or email-bounce signals → external_signal_coverage="none". Tool reports overall_quality="low" and degrades confidence per spec — left_company classification cannot be high-confidence without external signals. Smoke test on champion_loss_decliner: tool correctly classifies "stopped_engaging" with confidence="medium" (not "left_company" / "high"), surfaces 4 explicit ungrounded_assumptions.
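A minimal sketch of the hard rule and the confidence degradation described in this entry. The signal shape (a dict with a `strength` key), the helper names, and the "medium" ceiling are assumptions for illustration; the actual tool_010.py may be structured differently.

```python
# Hypothetical sketch: names, data shapes and the "medium" ceiling are assumptions.
STRENGTH_RANK = {"weak": 0, "moderate": 1, "strong": 2}
CONFIDENCE_ORDER = ["low", "medium", "high"]


def enforce_hard_rule(movement_type, signals):
    """Fall back to no_movement_detected unless a moderate-or-stronger signal exists."""
    if movement_type == "no_movement_detected":
        return movement_type
    supported = any(
        STRENGTH_RANK[s["strength"]] >= STRENGTH_RANK["moderate"] for s in signals
    )
    return movement_type if supported else "no_movement_detected"


def cap_confidence(confidence, external_signal_coverage):
    """Without external signals (LinkedIn, email bounce), never report high confidence."""
    if external_signal_coverage == "none":
        ceiling = "medium"
        if CONFIDENCE_ORDER.index(confidence) > CONFIDENCE_ORDER.index(ceiling):
            return ceiling
    return confidence
```

With only weak signals, `enforce_hard_rule("stopped_engaging", ...)` collapses to `no_movement_detected`; with `external_signal_coverage="none"`, a `high` confidence is reduced to `medium`.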
Wiring
Registered in prototype/tools/registry.py as tool_010_champion_movement_detector. Per-tool input augmentation in dispatch_tool derives tracked_champions + attendance signals from conversation_intelligence_log at call time (no corpus extension needed; champion roster is inferred from conv intel rather than separately tracked).
Wired into
AGT-902 system prompt updated: TOOL-010 listed alongside TOOL-004 + TOOL-008. Brain calls when diagnosing renewal risk on champion_loss-shaped accounts. Will surface in re-curated brain_outputs.json samples after the next eval sweep.
Impact
Champion-loss diagnosis is now a structured signal, not narrative speculation. The decliner-arc renewal-risk diagnosis becomes more concrete: brain calls TOOL-010, gets back stopped_engaging classification with cited evidence, factors that into the recommended interventions.
Spec-citation discipline doc · Process · docs/SPEC_CITATION_DISCIPLINE.md — read-spec-before-quoting rule
New artifact
CRO walkthrough caught a drift between narrative ("3-business-day SLA on hot T1 inbound") and the actual AGT-202 spec ("T1 = 2 hours"). Audit found the drift was isolated to one hand-authored narrative file; the spec, the simulator, and all 14 brain-output samples were correct. Standing discipline doc captures the rule: read the spec before writing any narrative that quotes a specific spec value (SLAs, dimension counts, scoring weights, schema fields, action-taxonomy enums).
The rule
Anything that sounds like "X has Y at Z threshold" must be grounded in specs/AGT-NNN_*.html, not working memory. Architectural framing (tier structure, single-writer-per-table) is safe to write from memory; specific spec values are not.
Audit results (2026-05-04)
14 brain output samples — 0 spec drift. All proposed_actions[].lever references match spec ownership (AGT-302, AGT-203, AGT-503, AGT-603, AGT-504, AGT-501, AGT-902). The brain narratives are constrained by sources_read citations, so they don't make spec-attribution claims. The vulnerability is hand-authored narrative content (eval/samples/*.json, README.md, follow_a_lead JSONs).
Files governed
gtm_os_explainer.html (simulator JS), eval/samples/follow_a_lead*.json, README.md, prototype/README.md, schema/GTM_OS_Changelog.html, prototype/PORT_TO_CORPORATE.md. Exempt: brain runtimes (their system prompts ARE the source of truth for action taxonomies), fixtures.py (test cases not narrative), synth/*.py (generates data not quotes spec).
Impact
Process discipline. Prevents recurrence; gives any future contributor (human or AI) the rule + checklist + audit table.
v34 → v35 Explainer pivot: real brain outputs replace simulator-only tabs · Follow-a-Lead end-to-end narrative CRO feedback drove a pivot: deterministic JS simulator tabs that mirrored spec logic now sit alongside (and are partially replaced by) two new tabs that show the system actually working. Brain Outputs = a gallery of real prototype runs from the eval harness, citation-resolved and clickable. Follow a Lead = an end-to-end walkthrough of one synthetic account from inbound lead through expansion + renewal, with two real brain reasoning moments embedded inline. The "Lead Ingestion" simulator was renamed "ICP Scorer Demo" to set narrower expectations; "Planning Period" and "Forecast & WBR" simulators were retired.
Brain Outputs gallery · UX · 14 real brain runs, citation-resolved, clickable to full BrainAnalysisLog
New tab
Gallery of curated samples from the eval harness — 11 AGT-902 (per-account) + 3 AGT-901 (cohort). Each card: question, account, narrative excerpt, cost, token counts, tool-call count, action count. Click to expand: full narrative with [src:N] chips inline, sources_read table, tool_calls_made list with input/output/status, proposed_actions with brain confidence + lever, eval criterion pass/fail rows. Filter by writer (902/901), difficulty (easy/medium/hard/stale), free-text search.
Source data
Curation script prototype/eval/curate_brain_samples.py reads brain_analysis_log.jsonl + brain_eval_log.jsonl, picks the latest passing run per fixture, strips run-state-only fields, attaches fixture metadata, writes static eval/samples/brain_outputs.json. Re-run after fresh evals to refresh.
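The "latest passing run per fixture" selection could look roughly like the sketch below. The field names (`analysis_id`, `fixture_id`, `passed`, `created_at`) are assumptions, not the actual log schema, and timestamps are assumed to be ISO-8601 strings so that string comparison orders them correctly.

```python
import json


def latest_passing_per_fixture(analysis_lines, eval_lines):
    """Return {fixture_id: analysis record}, keeping the most recent passing run.

    Hypothetical sketch: field names are assumed, not taken from the real logs.
    """
    # Map each passing analysis to the fixture it was evaluated against.
    passing = {}
    for line in eval_lines:
        rec = json.loads(line)
        if rec["passed"]:
            passing[rec["analysis_id"]] = rec["fixture_id"]

    latest = {}
    for line in analysis_lines:
        rec = json.loads(line)
        fixture_id = passing.get(rec["analysis_id"])
        if fixture_id is None:
            continue  # failed or unevaluated run
        current = latest.get(fixture_id)
        if current is None or rec["created_at"] > current["created_at"]:
            latest[fixture_id] = rec
    return latest
```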
Why this matters
The CRO's reaction to the prior explainer was that the 3 deterministic JS simulators are graphical recreations of spec logic, not proof of working reasoning. The gallery is direct counter-evidence: every card is a real prototype output, citation-resolved, with the audit trail visible.
Cost
$0 to render (static JSON). Total spend across the curated samples (the actual eval runs) was $1.49.
Impact
First on-explainer demonstration of the system actually working.
Follow a Lead walkthrough · UX · End-to-end synthetic account: lead → close → onboarding → overage → expansion brain → renewal
New tab
Single synthetic MM/FinTech account (Stark Logistics, expansion_ready archetype) walks through 15 timeline steps from Day -120 (inbound lead) through Day +330 (renewal closed at $355K, NRR=161%). Each step shows: primary service, secondary services, what changed (canonical table delta), evidence rows (the actual log entries that would write). Two steps are highlighted as brain reasoning moments with embedded real BrainAnalysisLog outputs — clicking opens the same modal as the Brain Outputs gallery.
Brain moments
Day +120 (consumption overage qualification — EVAL-Q04 real output) and Day +130 (cross-team siloed adoption diagnosis — EVAL-Q11 real output, with TOOL-008 cited inline). Both pulled from the curated brain_outputs.json and rendered with their full BrainAnalysisLog: narrative, sources, proposed actions, tool calls.
Account choice rationale
expansion_ready archetype was picked over champion_loss_decliner for the CRO-friendly story arc — growth posture (expansion + renewal at 161% NRR) rather than defensive (decline + escalation). Every step references real spec artifacts (CadenceEventLog, ConvIntelligence, MutualActionPlan, OnboardingLog, ExpansionLog, ChurnRiskLog, etc.).
Source data
eval/samples/follow_a_lead.json — narrative scaffold with brain_moment refs to fixture_ids in brain_outputs.json. Edit the JSON to refine the narrative; no code changes needed.
Why this matters
CRO question was: "How could I run a lead end-to-end through this, like does the system actually work?" This tab is the answer. A click-through walks one account through every relevant layer of the DAG, with two amber-highlighted moments where the brain actually reasons.
Impact
Pivots the explainer from "spec architecture" to "spec architecture + working system narrative."
Tab restructure · UX · Lead Ingestion → ICP Scorer Demo · Planning Period + Forecast & WBR retired
Tab restructure
Three changes to the explainer's nav. (1) "Lead Ingestion" renamed to "ICP Scorer Demo" with a sub-line clarifying it's a deterministic JS reproduction useful for explaining the dimensional model — narrow scope, links to Brain Outputs / Follow a Lead for end-to-end reasoning. (2) "Planning Period" and "Forecast & WBR" simulator tabs retired — orphan deterministic JS scoring functions removed. (3) New tab order: Architecture · The OS · Connection Map · Brain Outputs · Follow a Lead · ICP Scorer Demo.
Why retire 2 simulator tabs
In the spec era they were the strongest demo of "the system has been thought through." Now that AGT-902 + AGT-901 actually run, those JS scoring functions are the WEAKER part because they're hardcoded math, not reasoning. Brain Outputs + Follow a Lead replace them with substance.
Why keep ICP Scorer Demo
The 6-dimension ICP scoring + routing demo is genuinely useful for explaining how AGT-201's dimensional model works. Narrow, deterministic, calibratable — keeps its niche. Renamed + sub-line set the right expectation that it's narrow.
Code removed
scorePlanLocal, runPlanSim, scoreForecastLocal, runForecastSim — ~15K chars of JS scoring logic deleted from the explainer, replaced with the gallery + walkthrough renderers (~13K chars). Net code reduction.
Impact
CRO-readability of the explainer materially upgraded.
v33 → v34 SalesPlayLibrary draft loop end-to-end · TOOL-003 prototyped · BrainViewSource seam · calibration probes standing The "brain proposes → human co-defines → service executes" loop is now demonstrable in the prototype. SalesPlayLibrary writer converts brain play-shaped actions into structured drafts; TOOL-003 enriches them with starting cadence + success criteria; a static HTML viewer renders drafts as reviewable cards. View-source abstraction makes synth → corporate a single subclass away. Calibration probes (5/5 passing) document validator teeth.
SalesPlayLibrary writer · L9 · Prototype · prototype/sales_play_library.py — brain proposals → draft records
New artifact
Closes the human-codefinition loop in the prototype. Both brain runtimes (AGT-901, AGT-902) now produce structured SalesPlayLibrary draft records on every run, keyed to BrainAnalysisLog.proposal_id for cohort-level retrospective lineage.
Action-type filter
Only multi-touch sequence-shaped actions become drafts: open_expansion_play (AGT-902, scope=account_specific) and draft_play (AGT-901, scope=segment). Single-shot interventions (pull_qbr_forward, customer_communication, escalate_to_slm, brief_new_ae_or_csm) and information actions (recommend_human_query, flag_coverage_gap, none) stay in BrainAnalysisLog and route to their target service or human via existing mechanisms — they are not plays.
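As a sketch, the filter reduces to a lookup keyed on (writer, action_type). The action types and scopes are the ones named above; the table name and function are hypothetical.

```python
# Hypothetical sketch: only these two (writer, action_type) pairs become drafts.
PLAY_SHAPED = {
    ("AGT-902", "open_expansion_play"): "account_specific",
    ("AGT-901", "draft_play"): "segment",
}


def draft_scope(writer, action_type):
    """Return the draft scope for a play-shaped action, or None if it is not a play."""
    return PLAY_SHAPED.get((writer, action_type))
```

Anything that returns `None` (for example `pull_qbr_forward` or `flag_coverage_gap`) stays in BrainAnalysisLog and produces no draft.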
Brains write only `draft` state
Per the SalesPlayLibrary schema state machine: brains insert draft rows; humans transition draft → under_review → active. SLM + RevOps joint approval required for activation. Volume cap enforced at activation, not at draft write.
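A minimal sketch of the transition guard this implies: humans move a play from draft to under_review to active, and activation needs both SLM and RevOps approval. The function, the `actor_kind` argument and the exception type are assumptions; the volume cap and the retired state are left out.

```python
class TransitionError(ValueError):
    """Raised when a state change violates the (sketched) state machine."""


# Hypothetical encoding of the human-only transitions described above.
HUMAN_TRANSITIONS = {("draft", "under_review"), ("under_review", "active")}
ACTIVATION_APPROVERS = {"SLM", "RevOps"}


def transition(state, target, actor_kind, approvals=frozenset()):
    """Return the new state, or raise TransitionError if the move is not allowed."""
    if actor_kind != "human":
        raise TransitionError("brains only insert draft rows; humans transition state")
    if (state, target) not in HUMAN_TRANSITIONS:
        raise TransitionError(f"{state} -> {target} is not allowed")
    if target == "active" and not ACTIVATION_APPROVERS <= set(approvals):
        raise TransitionError("activation requires SLM and RevOps approval")
    return target
```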
Lineage preserved
originating_proposal_id, originating_analysis_id, originating_action_type, originating_lever, brain_confidence, supporting_source_indices. Enables cohort-level retrospective: brain-co-designed plays vs. plays designed without brain involvement.
Hypothesis = brain justification verbatim
The most important signal for the human reviewer is the brain's reasoning — captured unmodified. Reviewer edits cadence + criteria + name in under_review; the originating hypothesis stays as written for retrospective traceability.
Per-invocation cap
First 5 play-shaped actions per brain run get TOOL-003 enrichment; remainder fall back to placeholder cadence + criteria. Defensive guard against pathological brain runs producing many drafts and burning Haiku budget unexpectedly.
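The cap can be sketched as below. `enrich` stands in for the TOOL-003 call, and the `justification` key and placeholder shape are assumptions; only the limit of 5 and the pending_human_codefinition placeholder come from this changelog.

```python
MAX_ENRICHED_PER_RUN = 5  # from the entry above

# Hypothetical placeholder shape for drafts past the cap.
PLACEHOLDER = {
    "cadence": "pending_human_codefinition",
    "success_criteria": "pending_human_codefinition",
}


def build_drafts(play_actions, enrich):
    """Enrich the first five play-shaped actions; the rest get placeholders."""
    drafts = []
    for index, action in enumerate(play_actions):
        if index < MAX_ENRICHED_PER_RUN:
            details = enrich(action)  # stand-in for the TOOL-003 call
        else:
            details = dict(PLACEHOLDER)
        # The hypothesis is the brain justification, copied verbatim.
        drafts.append({"hypothesis": action["justification"], **details})
    return drafts
```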
Impact
Three-tier human-codefinition loop is end-to-end demonstrable. Prototype now shows: brain proposes → writer drafts → tool enriches → viewer renders → (human picks up, mocked).
TOOL-003 · Tier 3 · Sales Play Composer — status promotion: specced → prototyped
Status promotion
Runtime at prototype/tools/tool_003.py. Converts brain play hypothesis into structured cadence (4-12 touches over 14-45 days) + success criteria (target meeting/opp-create rates, evaluation window). Invoked by SalesPlayLibrary writer per play-shaped action, NOT exposed to the brain via TOOL_DEFINITIONS — brains shouldn't draft cadences directly.
Change
Status: specced → prototyped. Implementation matches the spec contract.
Model deviation from spec
Spec called for Sonnet; prototype uses Haiku. Rationale: the task is narrow (cadence composition from a 1-2 sentence hypothesis), output budget bounded (1.5K tokens), and Haiku's per-call latency keeps the SalesPlayLibrary writer fast enough that brains can produce drafts synchronously without timing out. Per-call cost ≈ $0.004 (vs. ~$0.05 at Sonnet). Promote to Sonnet if production output quality slips below an acceptable level.
Hard rule preserved
Tool NEVER invents account-specific facts. If hypothesis mentions "champion engagement", cadence may say "champion-engaged touchpoint" but cannot fabricate names. When numbers are guesses, confidence drops to "low" and ungrounded_assumptions is annotated. Smoke test: 5 explicit caveats listed for a single-call output.
Wiring
Registered in prototype/tools/registry.py as tool_003_sales_play_composer. Handler in TOOL_HANDLERS but NOT in TOOL_DEFINITIONS — invoked by writer, not brain. Anthropic TOOL_003_DEFINITION included for future-use consistency.
Validated by
EVAL-Q11 (AGT-902 expansion play) + EVAL-P01 (AGT-901 cohort plays) — drafts now have populated cadence + criteria instead of pending_human_codefinition placeholders. Total eval cost unchanged at <$0.20 per fixture.
Impact
SalesPlayLibrary drafts are now review-ready out of the box. Reviewers see: brain hypothesis (verbatim) + suggested 6-touch cadence + target meeting rate ≈ 0.7 + explicit ungrounded_assumptions list — instead of empty stubs.
SalesPlayLibrary viewer · L9 · Prototype · prototype/sales_play_library_viewer.html — local-only draft browser
New artifact
Static HTML page that reads the sales_play_library.jsonl log and renders each draft as a reviewable card: brain hypothesis, scope, writer, confidence, cadence, success criteria, supporting source lineage. Filter by state / writer / scope; free-text search the hypothesis.
Why local-only
JSONL is gitignored (run state). The viewer renders empty state when accessed via GitHub Pages — that's intentional. Local users serve the prototype dir with python3 -m http.server after running an eval.
Action buttons mocked
"Pick up for review" / "Reject" buttons are visual mocks — state transitions need server-side enforcement that's out of prototype scope. The buttons demonstrate the workflow without implementing it.
Visual language
Slate palette matching the explainer (v33). L9 amber for writer badges (AGT-901 / AGT-902) consistent with brain layer theme. State pills: amber=draft, blue=under_review, green=active, gray=retired.
Impact
First visible demo of the human-codefinition workflow. Brain runs produce JSONL → viewer renders cards → demonstrates the promotion gate without needing a full workspace UI.
BrainViewSource · L9 · Prototype · prototype/view_source.py — synth ↔ warehouse seam
New artifact
Abstract BrainViewSource interface between Tier 1 data and Tier 2 brain code. Brains, tools, and view extractors no longer read corpus files directly — they go through a source. Swapping synth corpus → real warehouse is a single subclass + one factory branch.
Interface
load_account_corpus(account_id) · account_exists(account_id) · account_data_freshness(account_id) · iterate_account_ids() · metadata(). Five methods are everything brains and tools depend on.
Implementations shipped
SynthCorpusSource (production-grade for the prototype, file-mtime staleness, ground_truth.json-driven account iteration) and WarehouseViewSource (stub class — every method raises NotImplementedError, documents what a real impl returns).
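A sketch of what the seam might look like as an abstract base class. The five method names come from the Interface note above; the return types, the `InMemorySource` toy implementation and the `make_source` factory are assumptions for illustration, not the contents of view_source.py.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterator


class BrainViewSource(ABC):
    """The five methods brains and tools depend on; return types are assumed."""

    @abstractmethod
    def load_account_corpus(self, account_id: str) -> Dict[str, Any]: ...

    @abstractmethod
    def account_exists(self, account_id: str) -> bool: ...

    @abstractmethod
    def account_data_freshness(self, account_id: str) -> float: ...

    @abstractmethod
    def iterate_account_ids(self) -> Iterator[str]: ...

    @abstractmethod
    def metadata(self) -> Dict[str, Any]: ...


class InMemorySource(BrainViewSource):
    """Toy stand-in for a real source, for illustration only."""

    def __init__(self, corpus: Dict[str, Dict[str, Any]]):
        self._corpus = corpus

    def load_account_corpus(self, account_id):
        return self._corpus[account_id]

    def account_exists(self, account_id):
        return account_id in self._corpus

    def account_data_freshness(self, account_id):
        # Hypothetical: data age in seconds, stored alongside the corpus.
        return self._corpus[account_id].get("age_seconds", 0.0)

    def iterate_account_ids(self):
        return iter(sorted(self._corpus))

    def metadata(self):
        return {"kind": "in_memory", "accounts": len(self._corpus)}


def make_source(kind: str, **kwargs) -> BrainViewSource:
    """Adding a backend means one subclass plus one branch here."""
    if kind == "in_memory":
        return InMemorySource(kwargs["corpus"])
    raise ValueError(f"unknown source kind: {kind}")
```

Callers hold a `BrainViewSource` and never touch files directly, which is what makes the backend swappable.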
Threading
Source threaded through run_for_account, run_for_pipeline, call_brain, dispatch_tool, extract_brain_ready_view, extract_pipeline_view. Legacy file-IO branches preserved for backwards compat but unused under default execution. Eval verified: 14/14 fixtures pass through the seam.
Migration template
prototype/PORT_TO_CORPORATE.md updated with concrete WarehouseViewSource subclass example. Brain-ready view extractors do NOT need to change when porting — only compression budgets per component.
Impact
Synth → corporate is now an abstraction, not a refactor.
Calibration probes · Eval discipline · prototype/eval/calibration_probes.py — standing validator-teeth check
Eval discipline
Cost-free synthetic-output probes that verify each eval-harness rule actually catches violations. Re-runnable forever; serves as both regression guard and executable documentation of what each rule's violation looks like. 5/5 passing as of v34.
Probes shipped
C1: invented action_type (AGT-902 enum); validator must emit a hard taxonomy issue. C1b: AGT-902 action_type leaked into the AGT-901 enum; verifies the taxonomy= parameter is enforced. C2: unresolved [src:99] citation; validator must emit a hard citations issue. C2b: negative control; all citations resolve cleanly and the validator stays silent (no false positives). Sanity: default taxonomy is AGT-902, so legacy callers don't accidentally validate against AGT-901's enum.
Hardening tightened on
_check_must_cite_tool in scorer now dual-checks dispatch-side tool_calls_made AND narrative-side sources_read. Catches fabricated citations (calibration probe finding from previous wave). Detail messages explain which side failed.
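The dual check might be sketched as follows. The record shapes (a `tool` key on each tool call and source, a `status` of "ok") are assumptions; the real scorer's field names may differ.

```python
def check_must_cite_tool(output, tool_name):
    """Return one issue per side (dispatch, narrative) that fails the rule.

    Hypothetical sketch: field names are assumed, not taken from the scorer.
    """
    issues = []
    dispatched = any(
        call.get("tool") == tool_name and call.get("status") == "ok"
        for call in output.get("tool_calls_made", [])
    )
    cited = any(
        source.get("tool") == tool_name for source in output.get("sources_read", [])
    )
    if not dispatched:
        issues.append(f"dispatch side: no successful {tool_name} call in tool_calls_made")
    if not cited:
        issues.append(f"narrative side: {tool_name} not cited in sources_read")
    return issues
```

A fabricated citation shows up as a narrative-side pass with a dispatch-side failure, which a single-sided check would miss.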
Pattern for adding probes
Add a new probe whenever a new validator rule is added to validate_all. (1) Construct synthetic output violating the rule. (2) Call the validator function directly. (3) Assert the rule's hard issue is in the returned issues list. (4) Add a negative-control probe for false-positive coverage.
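The four steps can be sketched as a probe pair. The enum values are the ones listed in the AGT-902 and AGT-901 entries of this changelog; `validate_taxonomy` and the issue shape are hypothetical stand-ins for the real validator.

```python
# Enum values from the AGT-902 / AGT-901 entries; everything else is assumed.
TAXONOMIES = {
    "AGT-902": {
        "pull_qbr_forward", "open_expansion_play", "brief_new_ae_or_csm",
        "customer_communication", "escalate_to_slm", "recommend_human_query", "none",
    },
    "AGT-901": {"draft_play", "flag_coverage_gap", "recommend_query_for_human", "none"},
}


def validate_taxonomy(output, taxonomy="AGT-902"):
    """Hypothetical validator: one hard issue per action_type outside the enum."""
    allowed = TAXONOMIES[taxonomy]
    return [
        {"rule": "taxonomy", "severity": "hard", "detail": action["action_type"]}
        for action in output["proposed_actions"]
        if action["action_type"] not in allowed
    ]


def probe_invented_action_type():
    # (1) Synthetic output violating the rule.
    output = {"proposed_actions": [{"action_type": "send_gift_basket"}]}
    # (2) Call the validator directly.
    issues = validate_taxonomy(output)
    # (3) The rule's hard issue must be in the returned list.
    assert any(i["rule"] == "taxonomy" and i["severity"] == "hard" for i in issues)


def probe_negative_control():
    # (4) Clean output must leave the validator silent.
    output = {"proposed_actions": [{"action_type": "pull_qbr_forward"}]}
    assert validate_taxonomy(output) == []
```

Because the probes build their own synthetic outputs, they cost nothing to run and need no model call.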
Impact
Eval harness teeth are now provable, not assumed.
Explainer + brand chrome · UX · v33 brand palette swap (gold → slate) + scrub agent-vs-service drift + real top-level README
Polish
Brand chrome (header wordmark, active nav, italic emphasis) swapped from gold to slate-900. L9 / Tier-3 amber identity preserved as intentional brain/tool visual signal. User-visible "40 agents across 8 layers" copy fixed to "40 services across 8 layers" — agent vocabulary now reserved exclusively for L9 brain agents. Top-level README rewritten as the real public-facing front door (was 3 lines).
Editorial change
Active nav state distinguished by bottom-border underline rather than full colored border + bg tint. Italic "complete" in headline rendered as bold-italic with slate underline (3px, 8px offset). Net effect: layer rainbow palette and L9 amber are the only color in the document; brand chrome goes monochrome.
Vocabulary scrub
Last L1-L8 spec name carrying legacy "Agent" suffix renamed (AGT-101 "Quota Setting Agent" → "Quota Setting"). All other L1-L8 service names already clean. L9 "Pipeline Brain" / "Account Brain" / "Reasoning (Tier 2 Agents)" preserved — those ARE agents.
Impact
Documentation precision + visual coherence.
v32 → v33 Brain Prototype Wave: AGT-901 + AGT-902 + TOOL-004 + TOOL-008 prototyped against synthetic corpus First end-to-end prototype of the L9 Reasoning layer running against synthetic GTM data. Both Brain Agents and two Tier 3 tools have working runtimes with passing eval harnesses. Status enum extended with prototyped to capture this state. Bridge document added for migration to corporate data.
AGT-902 · L9 · Tier 2 · Account Brain — status promotion: specced → prototyped
Status promotion
Runtime at prototype/agt902.py. 11-fixture eval (EVAL-Q01–Q11) at 11/11 pass. Multi-turn tool-use loop calls TOOL-004 and TOOL-008. Brain-ready view extractor compresses per-account composite to ~5–10K tokens for the Sonnet brain.
Change
Status: specced → prototyped. Runtime exists, evals pass, not yet wired to canonical Tier 1 tables in production.
Eval coverage
11 fixtures across question types: churn_diagnosis (Q01, Q02, Q09), expansion_qualification (Q04, Q05, Q11), handoff_briefing (Q06), qbr_narrative (Q08), diagnosis_on_stale_data (Q07, Q10), stalled_onboarding (Q03). All 11 pass at $1.16/run.
Action taxonomy
pull_qbr_forward / open_expansion_play / brief_new_ae_or_csm / customer_communication / escalate_to_slm / recommend_human_query / none. Enforced by validator at eval time.
max_tokens
4096 — sufficient for per-account narratives. (Cohort brains require 8192; see AGT-901 entry.)
Impact
Three-tier architecture validated end-to-end on synthetic data. Brain-ready view contract works as the seam between Tier 1 (deterministic) and Tier 2 (LLM-native).
AGT-901 · L9 · Tier 2 · Pipeline Brain — status promotion: specced → prototyped
Status promotion
Runtime at prototype/agt901.py. 3-fixture eval (EVAL-P01 SMB diagnosis, P02 expansion ranking, P03 vertical coverage) at 3/3 pass. Cross-account aggregate view extractor + cohort-shaped action taxonomy. Drills into specific accounts via TOOL-004/TOOL-008 — proves cross-tier orchestration works.
Change
Status: specced → prototyped. New aggregate view extractor at prototype/aggregates.py rolls up segment / vertical / ICP-tier dimensions plus top-K drill-down anchors.
Eval coverage
3 fixtures across question types: cohort_diagnosis (P01), expansion_prioritization (P02), coverage_gap (P03). All 3 pass at $0.32/run. New criterion must_cite_source verifies the brain cites the specific rollup table its claims came from.
Action taxonomy
draft_play / flag_coverage_gap / recommend_query_for_human / none. Distinct from AGT-902 — pipeline-shaped (cohort actions), not per-account.
max_tokens
8192 — cohort narratives are longer than per-account. First smoke test failed three retries at 4096 with consistent JSON truncation around char 12K. Captured as a calibration learning: per-account brains run on 4K, cohort/multi-source brains require 8K+.
Tool drill-down
Brain calls TOOL-004 / TOOL-008 with account_id from top_expansion_candidates / top_churn_risks / stalled_onboardings drill-down anchors. Validates cohort hypotheses against per-account evidence — proves the cross-tier pattern.
Impact
Two-brain architecture validated. AGT-901 (cohort) and AGT-902 (per-account) operate as complements, not duplicates.
TOOL-004 · Tier 3 · Consumption Forecasting — status promotion: specced → prototyped
Status promotion
Runtime at prototype/tools/tool_004.py. Deterministic time-series core (linreg, log-linear, cliff/seasonality detection) + Haiku characterization. Wired into AGT-902 via Anthropic tool-use; called on expansion-qualification questions.
Change
Status: specced → prototyped. Implementation matches the spec: deterministic numerical core in code, narrow LLM characterization for pattern label + interpretation.
Wiring
Registered in prototype/tools/registry.py as tool_004_consumption_forecast. Schema in TOOL_DEFINITIONS; handler in TOOL_HANDLERS; corpus augmentation in dispatch_tool reads UsageMeteringLog from the synth corpus.
Validated by
EVAL-Q04 (real expansion confirmed), EVAL-Q05 (spike-then-crash trap), EVAL-Q11 (siloed expansion qualification). Brain consistently cites tool output as canonical source via [src:N].
Impact
UBB-specific cognition gap closed in prototype. Pattern: numerical work in code, characterization in LLM — applies to TOOL-008 and future tools.
TOOL-008 · Tier 3 · Product Adoption Pattern Recognizer — status promotion: specced → prototyped
Status promotion
Runtime at prototype/tools/tool_008.py. Gini concentration index, breadth metrics, abandonment counts + Haiku classification across 5 patterns (deeply_integrated / surface_only / siloed_by_team / declining / activating). Onboarding-aware: <60d into contract → activating regardless of breadth. Wired into both AGT-901 and AGT-902.
Change
Status: specced → prototyped. Required new corpus extension: 23-feature engagement telemetry per account (5 categories: core / advanced / integration / admin / experimental — modeled on a B2B API product with UBB pricing).
Smoke test
8/8 archetypes correctly classified, including the spec's hard onboarding-aware rule. deeply_integrated (ideal_power_user) and siloed_by_team (expansion_ready) are discriminated by a Gini concentration threshold; activating (stalled_onboarding) is preserved for <60d accounts.
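For readers unfamiliar with the Gini concentration index, a minimal sketch follows. It assumes the index is computed over per-team usage counts and uses an illustrative threshold of 0.6; neither detail is stated in this changelog. It covers only three of the five patterns, and in the prototype the label comes from the Haiku classification step rather than a hard-coded branch.

```python
def gini(values):
    """Gini index of non-negative values: 0 = evenly spread, near 1 = concentrated."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard closed form over ascending-sorted values with 1-based ranks.
    weighted = sum((rank + 1) * x for rank, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n


def classify_adoption(days_into_contract, usage_by_team, gini_threshold=0.6):
    """Hypothetical deterministic stand-in for part of the TOOL-008 classification."""
    if days_into_contract < 60:
        return "activating"  # onboarding-aware rule: applies regardless of breadth
    if gini(usage_by_team) >= gini_threshold:
        return "siloed_by_team"  # usage concentrated in few teams
    return "deeply_integrated"
```

Usage spread evenly across four teams gives a Gini of 0, while usage confined to one of four teams gives 0.75.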
Validated by
EVAL-Q11 with new must_cite_tool_008 criterion. Brain consistently uses tool output for adoption-depth diagnosis when consumption-volume signals are ambiguous.
Impact
Post-sales feature-level adoption gap closed in prototype. Three-tool post-sales activation stack (TOOL-008 + TOOL-009 + TOOL-012) now has its first member operational.
Status enum · All layers · Status enum extended: built / prototyped / specced
Schema
Third value prototyped added to capture the runtime-exists-evals-pass state between specced (design only) and built (production-deployed). Gold/amber palette in explainer matches L9 theme. Tools_Index uses the same convention with per-tool prototyped badges on TOOL-004 and TOOL-008.
Change
CSS rule .status-prototyped added to explainer (gold/amber: #FAEEDA bg, #7A4A0E fg, #D4A84C border). Tools_Index gets .tool-status-prototyped with same palette.
Why a new state
"Built" implies production deployment with real data, real users, real audit trail. The prototype runs against synthetic data and isn't wired to canonical Tier 1 tables. Calling it "built" would overstate readiness; calling it "specced" understates the work done. The third state captures the truth.
Impact
Documentation precision. Forward changelog entries can promote incrementally: specced → prototyped → built.
Synthetic data layer · Prototype · synth/ — 8 archetypes, 50-account corpus, full Tier 1 telemetry
New artifact
Generates a 50-account corpus with daily-granularity UsageMeteringLog, CustomerHealthLog, PaymentEventLog, conversation summaries (Haiku-generated, cached), and 23-feature engagement telemetry. 8 archetypes (ideal_power_user, activating, surface_only_adopter, champion_loss_decliner, expansion_ready, spike_then_crash, seasonal, stalled_onboarding) span the patterns the brain agents and tools must recognize.
Files
synth/archetypes.py (8 archetypes with usage/health/conversation/feature profiles), synth/main.py (corpus orchestrator), synth/usage.py / health.py / payments.py / feature_engagement.py (per-component generators), synth/conversations.py (LLM-generated call summaries with cache).
Determinism
Seed-driven for reproducibility. Feature-seed derived via hashlib.sha256(account_id) rather than consuming from the main RNG — invariant: corpus regen does not shift account UUIDs, conversation cache stays stable.
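A minimal sketch of the hash-derived seed, assuming account_id is a string (hashlib.sha256 takes bytes, so it is encoded first). The function names are hypothetical.

```python
import hashlib
import random


def feature_seed(account_id: str) -> int:
    """Derive a stable integer seed from the account id alone."""
    digest = hashlib.sha256(account_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def feature_rng(account_id: str) -> random.Random:
    """A private RNG for feature telemetry; never draws from the main RNG."""
    return random.Random(feature_seed(account_id))
```

Because the feature RNG is seeded from the account id, adding or reordering feature generation does not change how many values are consumed from the main RNG, so whatever the main RNG produces downstream stays the same.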
Throwaway in corporate
Per prototype/PORT_TO_CORPORATE.md: synth/ generators are throwaway when porting to a corporate environment. Real Tier 1 services replace them. The boundary is the brain-ready view extractor.
Impact
Substrate for solo prototyping without real corporate data.
Calibration findings · Eval discipline · Three calibration learnings captured in eval-design memory
Eval discipline
(1) Tool retirement requires coordinated 3-way removal across schema/handler/system-prompt — schema-only disablement is leaky. (2) Cohort/multi-source brains need max_tokens=8192; per-account brains run fine on 4096; symptom of under-budgeting is silent JSON truncation around char 12K. (3) tool_calls_made is dispatch-side ground truth; sources_read is brain-authored narrative — future hardening: cross-check both.
Discovery
Calibration probe on EVAL-Q11: temporarily removed TOOL-008 from TOOL_DEFINITIONS only. The brain still emitted tool_use calls for TOOL-008 because the system prompt described it; dispatch found the handler (still registered) and ran the tool successfully. False-pass on what should have been a failing rule. Real probe required removing handler too.
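A sketch of a three-way retirement check that would have caught this. It assumes TOOL_DEFINITIONS is a list of dicts with a `name` key and TOOL_HANDLERS is a dict keyed by tool name; the function itself is hypothetical.

```python
def retirement_leaks(tool_name, tool_definitions, tool_handlers, system_prompt):
    """List every place a supposedly retired tool is still reachable."""
    leaks = []
    if any(definition.get("name") == tool_name for definition in tool_definitions):
        leaks.append("schema")
    if tool_name in tool_handlers:
        leaks.append("handler")
    if tool_name in system_prompt:
        leaks.append("system_prompt")
    return leaks
```

In the scenario above (schema removed, handler and prompt untouched) this returns `["handler", "system_prompt"]`; a clean retirement returns an empty list.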
Output-budget rule
First smoke test of AGT-901 failed three retries in a row with Unterminated string at char 12383 (per-attempt). Bumping max_tokens from 4096 to 8192 fixed it on first try. The failure mode looks like a transient parser flake but it's actually a budget cap. First thing to check when a brain hits sudden JSON-decode errors.
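One way to make this failure mode loud instead of silent is to check the stop reason before blaming the parser. The sketch assumes the caller passes in the model response's stop reason and that a value of "max_tokens" means the output hit the budget; the function name is hypothetical.

```python
import json


def parse_brain_output(raw_text, stop_reason):
    """Return (parsed, error). Distinguish budget truncation from malformed JSON."""
    try:
        return json.loads(raw_text), None
    except json.JSONDecodeError as exc:
        if stop_reason == "max_tokens":
            return None, (
                f"output truncated by max_tokens (parse failed at char {exc.pos}); "
                "raise the output budget before retrying"
            )
        return None, f"malformed JSON at char {exc.pos}"
```

Retrying at the same budget reproduces the truncation, which is why three retries failed identically.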
Impact
Discipline for retiring tools and provisioning new brains. Both findings preserved in user's eval-design memory and in prototype/PORT_TO_CORPORATE.md's "risks worth flagging early" section.
Bridge doc · Migration · prototype/PORT_TO_CORPORATE.md — synth → real corporate data checklist
New artifact
Concrete migration plan for moving the prototype to a corporate environment. Phase 0–5 sequence (eval baseline → UsageMeteringLog → ConvIntelligence → CustomerHealthLog/ChurnRisk/Expansion → feature_engagement + TOOL-008 → Brain Agents go live). Risks flagged: real conversation transcript length, feature taxonomy mapping, PII handling, API cost monitoring, tool-retirement discipline.
Why now
Easier to write the bridge doc while the prototype is fresh than to reverse-engineer it from code later. Captures rationale, not just code.
Build sequence
Per the original architecture doc's recommendation: L8 first (UsageMeteringLog + AGT-804 revenue recognition), brain second. Brain reading from stub UsageMeteringLog produces fake-confident output — the worst possible failure mode for an audit-sensitive system.
Impact
Reduces friction when entering corporate environment.
v31 → v32 Tier 3 third wave: TTV analyzer · Champion movement · Pricing sensitivity · Onboarding health predictor 12 tools across 3 waves. Tier 3 catalogue is broadly complete relative to v26 architecture eval gaps. Further additions are case-by-case based on operational signal.
TOOL-009 · Tier 3 · Activation / Time-to-Value Analyzer
New tool
Measures longitudinal TTV milestones for an account against expected timing benchmarks. Distinct from TOOL-008 (current snapshot): this is timing analysis. Distinct from TOOL-012 (predictive): this is delta-vs-benchmark for milestones that are already defined.
Change
New tool: TOOL-009. Haiku, 8K input + 1.5K output, $200/mo cap.
Distinct from TOOL-008
TOOL-008 is descriptive (current state of feature adoption). TOOL-009 is longitudinal (timing of milestone achievement vs benchmark). Both can be called for the same account; they answer different questions.
Distinct from TOOL-012
TOOL-009 measures observed delta vs benchmark on configured milestones. TOOL-012 (also v32) projects 6-month outcome from early signals. Three post-sales activation tools work as a stack: TOOL-008 (current state) + TOOL-009 (timing analysis) + TOOL-012 (outcome prediction).
Hard rules
Milestones configured in input (RevOps + Product define), tool measures against them — never invents milestones. Cohort-relative when baseline available, graceful degradation when not. Honest projection confidence (low/medium when activity signal is weak).
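The "never invents milestones" rule can be sketched as iterating only over the configured set. The input shapes (milestone name mapped to an expected or achieved day) are assumptions, not the TOOL-009 contract.

```python
def milestone_deltas(configured, observed):
    """Compare observed milestone timing against configured benchmarks.

    configured: {milestone: expected_day}; observed: {milestone: achieved_day}.
    Hypothetical sketch: only configured milestones are ever reported.
    """
    report = {}
    for name, expected_day in configured.items():
        achieved_day = observed.get(name)
        report[name] = {
            "expected_day": expected_day,
            "achieved_day": achieved_day,
            # Positive delta = late; None = milestone not yet reached.
            "delta_days": None if achieved_day is None else achieved_day - expected_day,
        }
    return report
```

Observed events that are not in the configured list are ignored rather than promoted to milestones.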
Called by
AGT-601 Onboarding Orchestrator (weekly batch on onboarding cohort), AGT-501 Customer Health (months 3–9 ramping accounts), AGT-902 Account Brain ("is this account activating on time?"), AGT-704 (cohort-level TTV trends for QBR retention section).
Impact
Closes the timing-side of post-sales activation gap.
TOOL-010 · Tier 3 · Champion Movement Detector
New tool
Detects champion role changes/departures/disengagement via multi-signal fusion (LinkedIn, email-bounce, AGT-407 attendance). High-leverage early-warning signal often missed by behavioral telemetry. External enrichment dependency for full effectiveness.
Change
New tool: TOOL-010. Haiku, 10K input + 2K output, $250/mo cap.
Multi-signal fusion
Single weak signals do not produce high-confidence movement classifications. A title change without engagement decline may just be a promotion. An email bounce alone could be an out-of-office misconfiguration. The tool fuses LinkedIn signals, email-bounce signals, AGT-407 ConvIntelligence call attendance, and email engagement, then produces a classification with its contributing signals listed explicitly.
Movement types
Five classifications: left_company, role_changed_internal, stopped_engaging, engagement_declining, no_movement_detected. Each maps to different intervention recommendations — champion who left vs. shifted internal role vs. disengaged are different problems.
Hard rules
Every classification stronger than no_movement_detected requires at least one contributing signal with strength ≥ moderate. Privacy-disciplined (only signals already in OS or from approved enrichment under signed data agreements). Single-signal restraint: weak single signals never produce high confidence.
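A minimal sketch of the first hard rule, under stated assumptions: the strength scale (`weak` / `moderate` / `strong`), the signal dict shape, and the fallback to `no_movement_detected` when support is missing are all hypothetical. Only the "at least one contributing signal with strength ≥ moderate" gate comes from the rule itself.

```python
# Assumed ordinal scale; only "weak" and "moderate" are named in the changelog.
STRENGTH_RANK = {"weak": 0, "moderate": 1, "strong": 2}

def gate_classification(proposed, contributing_signals):
    """Downgrade any movement classification that lacks a moderate-or-stronger signal."""
    if proposed == "no_movement_detected":
        return proposed
    supported = any(
        STRENGTH_RANK[s["strength"]] >= STRENGTH_RANK["moderate"]
        for s in contributing_signals
    )
    # Assumption: an unsupported classification falls back to no_movement_detected.
    return proposed if supported else "no_movement_detected"
```

For example, a lone weak email-bounce signal would not be enough to return `left_company`, while the same signal paired with a moderate attendance drop would pass the gate.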
Called by
AGT-502 Churn Risk (weekly batch on T-180 renewal cohort), AGT-401 Deal Health (when champion is economic buyer/strong influencer), AGT-902 Account Brain, AGT-501 Customer Health.
Impact
High-leverage early-warning signal. External enrichment required for full effectiveness — without LinkedIn/email-bounce signals, tool degrades to internal-only mode with lower confidence ceilings.
TOOL-011 · Tier 3 · Pricing Sensitivity Analyzer
New tool
Classifies deal-level pricing sensitivity from QuoteLog cohort patterns + ConvIntelligence price-objection signals + procurement engagement. Decision support for AGT-406 deal desk reviewer. Hard rule: never recommends specific discount magnitude — that remains a human decision.
Change
New tool: TOOL-011. Sonnet, 20K input + 2.5K output, $300/mo cap.
Sensitivity taxonomy
Four classifications: low_sensitivity (concession unnecessary — discount erodes margin without improving close), moderate_sensitivity (cohort-typical envelope), high_sensitivity (multiple revisions likely — consider non-price levers first), highly_constrained_budget (qualify out or reduce scope, don't discount to meet budget).
Hard rules
Output never contains specific discount percentages or unit prices (regex-enforced in eval). Tool produces classification + cohort context + observation patterns; the human deal desk reviewer determines actual discount magnitude per AGT-406 spec. Cohort-baseline degradation is reported honestly rather than masked. Aggregate discount drift is monitored quarterly: if the pre-tool vs post-tool median discount drifts > 2pp, a calibration sprint is triggered.
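The two mechanical checks in these rules might look like the sketch below. The regex is an assumed pattern, not the one used in the eval harness, and the function names are hypothetical. The drift check treats movement in either direction as drift, which is also an assumption; the changelog only gives the 2pp threshold.

```python
import re

# Assumed pattern: a number followed by % or "percent", or a currency symbol followed by a digit.
FORBIDDEN = re.compile(r"\d+(?:\.\d+)?\s*(?:%|percent\b)|[$€£]\s*\d", re.IGNORECASE)

def violates_pricing_rule(output_text):
    """True if the tool output appears to contain a discount percentage or a unit price."""
    return FORBIDDEN.search(output_text) is not None

def needs_calibration_sprint(pre_tool_median_pct, post_tool_median_pct, limit_pp=2.0):
    """True if the median discount moved by more than limit_pp percentage points."""
    return abs(post_tool_median_pct - pre_tool_median_pct) > limit_pp
```

A classification-only output such as "high_sensitivity: consider non-price levers first" passes, while any output quoting a percentage or a price fails the check.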
Anti-pattern flags
do_not_concede_flags at least as valuable as the sensitivity classification. Examples: weak_qualifier (champion not confirmed — don't discount until qualified), discount_above_authority, contract_term_concession_higher_leverage (term length / payment terms / multi-year may have higher leverage than equivalent price concession; preserves list-price baseline for renewals).
Called by
AGT-406 CPQ & Deal Desk (every quote requiring approval — primary), AGT-401 Deal Health (proposal/negotiation stage), AGT-902 Account Brain, RevOps direct (pricing strategy work).
Impact
Augments AGT-406 deal desk decision support. Approval flow and discount authority remain unchanged. Calibration risk — aggregate use of the tool must not produce monotonic discount drift over time.
TOOL-012 · Tier 3 · Onboarding Health Predictor
New tool
From the first 30–60 days of telemetry, predicts 6-month outcome (sustained_adoption / surface_only / churn). Trajectory classification + early warning indicators + intervention recommendations sized to onboarding stage. Predictive complement to TOOL-008 (descriptive) and TOOL-009 (timing).
Change
New tool: TOOL-012. Haiku, 10K input + 2K output, $200/mo cap.
Trajectory taxonomy
Five classifications: strong_activation (cohort-leading signals), on_track_activation (cohort-typical), concerning_trajectory (one or two weak signals), early_stall_risk (multiple weak indicators clustered — multi-pronged intervention warranted), active_stall (effectively stopped, escalate to SLM, may require contract-level intervention).
Three-tool post-sales activation stack
TOOL-008 (descriptive: current adoption pattern) + TOOL-009 (longitudinal: timing vs benchmark) + TOOL-012 (predictive: project 6-month outcome). Three different questions for the same account; AGT-501 / AGT-601 / AGT-902 may invoke any combination depending on the question being asked.
Hard rules
Cohort-anchored predictions (without cohort baseline, projection_confidence drops). Intervention-window-aware (same weak signal at day 14 vs. day 60 has different intervention implications). Honest about projection uncertainty — AGT-601 weights interventions by confidence so low-confidence predictions don't trigger heavy interventions.
Called by
AGT-601 Onboarding Orchestrator (weekly batch on first-90-day cohort, primary), AGT-501 Customer Health (early forward-looking signal), AGT-902 Account Brain, AGT-704 (cohort-level onboarding trajectory in QBR retention section).
Impact
Closes predictive gap in post-sales activation toolkit. Onboarding intervention window is the highest-leverage moment in the customer lifecycle — getting prediction right early matters more than getting any other post-sales signal right.
Tier 3 catalog · Tools · Tools_Index updated: 12 tools across 3 waves
Catalog refresh
Tools_Index v32: 12 tools indexed, cost aggregation refreshed (~$4,800/mo total), future-candidates section replaces deferred third-wave-candidates section. Catalog broadly complete relative to v26 eval gaps.
Change
Tools_Index header updated to "12 tools across 3 waves." Cost aggregation table extended with TOOL-009/010/011/012 budgets. Third-wave-candidates section replaced with future-candidates section listing 5 case-by-case ideas (Procurement Negotiation Pattern Recognizer, Multi-thread Quality Scorer, QBR Action-Item Outcome Tracker, Renewal Negotiation Risk Profiler, Industry-Specific Play Refiner), to be revisited if and when operational signal warrants.
Aggregate cost
Tier 3 default budget: ~$4,800/mo across 12 tools. Combined with Tier 2 brain budgets ~$1,500/mo, reasoning + augmentation layer total ~$6,300/mo — roughly 1/3 of one RevOps analyst FTE. Bounded; trackable in single dashboard with per-tier 75% alerts.
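The aggregate can be reproduced from the monthly caps quoted in the individual entries. The sketch below uses the stated first-wave total (~$1,200/mo, from the v28 → v29 entry) and the per-tool caps listed for the second and third waves. The alert helper is hypothetical, and firing at exactly 75% is an assumption.

```python
# Monthly caps in USD, as quoted in the changelog entries.
WAVE_CAPS = {
    "wave_1": 1200,                    # TOOL-001..004, stated first-wave total
    "wave_2": 150 + 2000 + 200 + 300,  # TOOL-005, 006, 007, 008
    "wave_3": 200 + 250 + 300 + 200,   # TOOL-009, 010, 011, 012
}
TIER2_BRAIN_BUDGET = 1500

tier3_total = sum(WAVE_CAPS.values())
combined_total = tier3_total + TIER2_BRAIN_BUDGET

def alert_fired(spend, budget, threshold=0.75):
    """Hypothetical per-tier alert: fires once spend reaches the threshold share of budget."""
    return spend >= threshold * budget
```

TOOL-006 accounts for $2,000 of the Tier 3 total, so the live-call tool is the single largest line in the budget.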
Catalog discipline going forward
"Add tools when gaps emerge from operational signal, not when they emerge from theory." 12 tools is already substantial maintenance load alongside the eval harness. Further additions trade off against catalog maintenance.
Impact
Tier 3 catalogue broadly complete relative to v26 architecture eval gaps. Restraint over volume going forward — the discipline is now to keep the catalog sharp, not to grow it.
v30 → v31 Tier 3 second wave (TOOL-005/006/007/008) · L7 fully Built (AGT-701/702/703/704) · AGT-107 Built · Connection Map polish All Tier 1 deterministic backbone now Built across L1–L8. Tier 3 catalogue doubled to 8 tools. Domain 4 (deliverability) and Domain 2 (in-call + competitive narrative) gaps closed. Connection Map distinguishes brain reads visually.
TOOL-005 · Tier 3 · Outbound Deliverability Monitor
New tool
Reads outbound performance + reputation signals (bounce rates, spam complaints, ESP scores, blacklist status). Risk flags + recommended actions. Closes Domain 4 gap.
Change
New tool: TOOL-005. Haiku, 8K input + 2K output, $150/mo cap.
Hard rules
Recommendation grounding 100% (every recommended_action traces to a specific risk_flag). False positive rate on green periods ≤ 10% (over-flagging undermines the tool). Statistical significance check before flag generation; minimum send volume threshold prevents variance noise.
Called by
AGT-302 (daily batch, drives auto-throttle decisions with RevOps notification), AGT-303 (weekly batch, feeds risk-classified recommendations), AGT-901 Pipeline Brain (diagnostic queries), RevOps direct (workspace UI investigation).
Impact
Closes Domain 4 gap — AGT-302/303 cover sequence generation/execution/optimization but no component owned deliverability health observability until now.
TOOL-006 · Tier 3 · Real-time Call Guidance
New tool
Live-call assist via recording-platform integration. Streaming transcript chunks + full account context produce real-time guidance: objections, MEDDPICC gaps, next-step suggestions. Higher infra complexity than other tools. Closes Domain 2 (in-call) gap.
Change
New tool: TOOL-006. Sonnet (with Haiku fallback for low-stakes), 30K input + 1.5K output per invocation, ~30 invocations per customer call (60s cadence over 30 min) ≈ $3/call. Monthly cap $2,000/mo (~600 calls).
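The invocation count and the cap coverage follow from the figures in this entry. A worked sketch, using only the quoted numbers (the per-call cost is the entry's own approximation, so the result is approximate too):

```python
CALL_MINUTES = 30
CADENCE_SECONDS = 60
COST_PER_CALL_USD = 3.0   # approximate figure quoted in the entry
MONTHLY_CAP_USD = 2000

# One invocation per cadence tick over the length of the call.
invocations_per_call = CALL_MINUTES * 60 // CADENCE_SECONDS

# Whole calls that fit under the monthly cap at the approximate per-call cost.
calls_within_cap = int(MONTHLY_CAP_USD // COST_PER_CALL_USD)
```

At exactly $3/call the cap covers 666 calls. The ~600 quoted in the entry is lower, which is consistent with the per-call cost being a rounded estimate.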
Hard rules
P95 latency ≤ 3s (hard): a missed latency target makes the tool unusable. Suggestion grounding 100% (every suggestion traces to account_context or recent transcript). Privacy compliance: 0 instances of PII surfacing. Quality over volume: tool returns silence (suppress=TRUE) when no high-quality guidance is available.
Called by
Live-call sidecar UI (recording-platform integration with Gong / Zoom / Chorus). Rep can pause/resume during off-the-record moments. No auto-action; rep decides whether to use any specific suggestion.
Complement to AGT-407
AGT-407 retrospective + TOOL-006 live = full conversation intelligence loop. CallGuidanceLog (new schema) has a foreign key to ConvIntelligence; AGT-701 reads both for the full coaching picture.
Impact
Closes Domain 2 (in-call) gap. Higher infrastructure complexity than other tools — live audio integration + sidecar UI surface required. Implementation gated on recording-platform partnerships.
TOOL-007 · Tier 3 · Competitive Narrative Writer
New tool
Composes deal-specific competitive narratives from CompetitiveKnowledgeBase + deal context. Hard rule: every talking point cites real KB section. Tool composes; AGT-403 maintains. Domain 2 quality lift.
Change
New tool: TOOL-007. Sonnet, 15K input + 2K output, $200/mo cap.
Hard rules
KB grounding 100% (every talking_point cites a real KB section). Hallucinated competitor claims 0% (cannot extend beyond what's in the KB). Stale KB recognition 100% (entries > 90d old must surface staleness caveat).
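The staleness rule is a simple age check. A minimal sketch, assuming each KB entry carries a last-updated date; the function name and caveat wording are hypothetical:

```python
from datetime import date

def staleness_caveat(last_updated, today, max_age_days=90):
    """Return a caveat string for KB entries older than max_age_days, else None."""
    age_days = (today - last_updated).days
    if age_days > max_age_days:
        return f"KB entry last updated {age_days} days ago; verify before use."
    return None
```

An entry that is exactly 90 days old passes without a caveat here, since the rule is stated as "> 90d".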
Distinct from AGT-403
AGT-403 owns the canonical CompetitiveKnowledgeBase. TOOL-007 composes deal-specific narratives from it — never extends or corrects. KB updates remain AGT-403's responsibility.
Format-aware output
Talking points read differently from email copy, which reads differently from battlecard excerpts. Tool adapts tone + structure to narrative_target. Output includes a do_not_say field flagging anti-patterns specific to the competitor, which is as important as the talking points.
Impact
Domain 2 quality lift — the difference between "here's our positioning vs. X" and "for this prospect at this stage with these specific concerns, here's how to frame the X comparison." Quality lift, not gap-closer.
TOOL-008 · Tier 3 · Product Adoption Pattern Recognizer
New tool
Recognizes feature-by-feature adoption patterns (deeply_integrated / surface_only / siloed_by_team / declining / activating). Different from TOOL-004 (volume) — this is which features get used and how broadly. Closes post-sales feature-level adoption gap.
Change
New tool: TOOL-008. Haiku, 10K input + 1.5K output, $300/mo cap (high frequency — daily batch from AGT-501 across active accounts).
5-pattern taxonomy
deeply_integrated (broad breadth + high users-per-feature + integrations active: strong renewal + expansion signal); surface_only (core features only, low per-feature adoption: competitor swap risk); siloed_by_team (decent breadth concentrated in one team: cross-team expansion signal but champion-loss risk); declining (abandoned features outnumber newly adopted ones: strong churn signal); activating (newly onboarded with rapid breadth growth: onboarding success).
Hard rules
Onboarding-aware classification (account < 60 days into contract not labeled "surface_only" — must be "activating" or pre-pattern, 100% eval-enforced). Cohort-baseline graceful degradation (returns medium/low data_completeness; doesn't fabricate cohort comparisons).
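The onboarding-aware rule can be expressed as a post-classification guard. This is a sketch only: the function name is hypothetical, and relabeling to `activating` (rather than a pre-pattern state) is one of the two options the rule allows, chosen here for illustration.

```python
ONBOARDING_WINDOW_DAYS = 60

def apply_onboarding_guard(pattern, days_into_contract):
    """Never label an account inside the onboarding window as surface_only."""
    if days_into_contract < ONBOARDING_WINDOW_DAYS and pattern == "surface_only":
        return "activating"  # assumption: could equally map to a pre-pattern state
    return pattern
```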
Called by
AGT-501 Customer Health (daily batch — augments feature engagement dimension), AGT-902 Account Brain ("is this account getting value?"), AGT-503 Expansion Trigger (qualification augment, deeply_integrated/siloed boosts expansion signal, declining suppresses), AGT-601 Onboarding Orchestrator.
Distinct from TOOL-004
TOOL-004 is consumption volume (will overage hit, when, what trend). TOOL-008 is feature engagement (which features are used and how broadly). Complementary — both used together for full post-sales picture.
Impact
Closes post-sales feature-level adoption gap — AGT-501 had seat utilization but not feature engagement until now. That distinction predicts churn/expansion much earlier than seat metrics alone.
L7 Layer · L7 · L7 fully Built: AGT-701/702/703/704 promoted
Status promotion ×4
All L7 measurement agents promoted from Specced to Built. Closed-loop learning is operational: AGT-703 calibration signals propagate back into AGT-101/201/402/403/404; AGT-704 assembles WBR/MBR/QBR with staleness gate.
AGT-701
Rep Performance & Coaching, Specced (v21) → Built (v31). Role-parameterized across AE/AM/CSM/SDR. Manager approval gate on coaching delivery preserved. Reads ConvIntelligence with v23 call_owner_role filter for role-specific signals.
AGT-702
GTM Health Monitor, Specced (v21) → Built (v31). Magic Number, NRR, GRR, R40, CAC Payback. Promotion follows L8 buildout because Magic Number's revenue term separates billed (AGT-802) from recognized (AGT-804) revenue; AGT-702 cannot be Built until those distinctions are deterministic.
AGT-703
Win-Loss & Forecast Accuracy, Specced (v21) → Built (v31). Produces calibration signals that propagate back into AGT-101 (quota), AGT-201 (ICP), AGT-402 (forecast), AGT-403 (competitive), AGT-404 (top-down). Without AGT-703 Built, OS had no closed-loop learning — this completes the loop.
AGT-704
Business Review Orchestrator, Specced (v21) → Built (v31). Closes L7. Reads from AGT-701/702/703 with hard staleness gates. v26 charter extension intact: brain narrative may write narrative sections of MBR/QBR but never metric sections. AGT-901 invokes for narrative jobs only.
Impact
Closed-loop learning operational. Calibration signals from AGT-703 → upstream services. Business reviews fully assembled with audit-grade source-trace. L9 brains can read AGT-702 MetricsCalc.brain_view + AGT-703 WinLossLog.brain_view + AGT-701 RepCoachingLog with confidence — the brain-ready view contracts on these sources are now backed by Built services.
AGT-107 · L1 · Quota Plan Document: promoted Specced → Built
Status promotion
Last L1 Specced agent promoted. L1 fully Built. The hiring → quota → comp → legally deployable plan document chain is now a closed loop.
Change
AGT-107 Quota Plan Document, Specced (v23) → Built (v31). E-signature gate, three output formats (HTML/PDF/JSON), amendment recommendation logic. AGT-104 reads QuotaPlanDocLog.signature_received as one of its 12 governance audit controls.
L1 closure
Every L1 agent now Built. AGT-101 quota → AGT-102 comp → AGT-103 attainment → AGT-104 governance → AGT-105 capacity → AGT-106 territory → AGT-107 plan document. Hiring-quota-comp cycle is a complete operational loop.
Impact
Combined with L7 closure: 40 of 40 services Built across L1-L8. Remaining specced services: none. L0 infrastructure remains config-only (no agents).
Map polish · UI · Connection Map: brain reads visually distinguished
UI polish
Connection Map renderer now colors L9 → Tier 1 read edges in gold (#D4A84C) instead of generic blue "reads" color. Legend updated to 4 edge types. Brain layer's reach across the OS is visible at a glance.
Change
SVG edge color logic in renderMap() updated: edges with source agent ID starting with AGT-9 render in #D4A84C (L9 layer color). All other edges render per existing type-color mapping (feeds=green, reads=blue, cross=orange).
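The selection rule reads as a prefix check followed by a type lookup. The sketch below restates it in Python for illustration only; it is not the renderMap() code, and the function and constant names are hypothetical. Color names for the existing edge types are taken from the entry as written, since their hex values are not given.

```python
BRAIN_GOLD = "#D4A84C"  # L9 layer color
TYPE_COLORS = {"feeds": "green", "reads": "blue", "cross": "orange"}

def edge_color(source_agent_id, edge_type):
    """Brain-sourced edges render gold; all others keep their type color."""
    if source_agent_id.startswith("AGT-9"):
        return BRAIN_GOLD
    return TYPE_COLORS[edge_type]
```

Because the brain check runs first, a `reads` edge from AGT-901 renders gold while a `reads` edge from AGT-303 stays blue.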
Legend
Connection Map legend now lists 4 edge types: Feeds (Tier 1 → Tier 1), Reads (Tier 1 → Tier 1), Cross-layer (Tier 1 → Tier 1), Brain reads (L9 → Tier 1). Map instructions text updated to mention brain agents.
Impact
Cosmetic. The reach of L9 brains across the Tier 1 layers is visible at a glance — previously brain edges rendered identically to service-to-service reads, obscuring the architectural distinction.
v29 → v30 L8 fully Built (AGT-801/802/803 promoted) · Brain-Ready Views Contract · AGT-302 v27 ripple captured Deterministic backbone is now fully Built across L1–L8. Brain-Ready Views Contract closes the load-bearing gap that made L9 brains theory until this version. AGT-302 v27 ripple formalized in its own spec.
AGT-801 · L8 · Order Management: promoted Specced → Built
Status promotion · Hardened spec
L8 entry-point promoted. Production controls (8) + edge cases (8) added. Two-track approval, amendment lineage, billing contact validation, SOW reference enforcement, 7-year retention.
Change
AGT-801 promoted from Specced (v25) to Built (v30). Spec hardened with Production Controls section (8 controls) + Edge Cases & Failure Modes section (8 scenarios). Existing schema unchanged; ripple of v27 (originating_proposal_id) noted.
Production controls
Two-track approval enforced (Pricing track + Legal track tracked independently); pricing correction Finance Director floor; amendment lineage preserved (originals never overwritten); auto-renewal clauses surfaced explicitly (non-NULL required); TCV computed deterministically (no manual override); billing contact validated (non-NULL on activation); SOW reference required when services on order; 7-year retention parallels RevenueRecognitionLog.
Edge cases handled
Customer-rescinds-before-signature; mid-term seat expansion mid-billing-cycle; pricing correction post-invoice (forces AGT-804 prior-period adjustment); AGT-802 offline at activation; handwritten markup discrepancy; two-track approval mid-flight imbalance; SKU-add with conflicting payment terms; auto-renewal trigger.
Why now
L8 buildout sequence per v26 plan: AGT-804 first (recognition is audit-critical), then AGT-801 (orders source-of-truth) before AGT-802 (billing depends on orders + recognition guarantees). Cascade complete with this v30 promotion.
Impact
L8 entry point Built. Order data is now contractually-locked source of truth for every downstream finance agent. External integration still required for go-live — spec is production target, not a deployed system.
AGT-802 · L8 · Billing & Invoicing: promoted Specced → Built
Status promotion · Hardened spec
Promoted. Invoice traceability to OrderLog, MAX(record_version) discipline on UsageMeteringLog reads, customer sign-off gate on milestone billing, daily external reconciliation. 8 production controls + 8 edge cases.
Change
AGT-802 promoted from Specced (v25) to Built (v30). Production Controls + Edge Cases sections added.
Production controls
Invoice amounts traceable to OrderLog (every InvoiceLineItems carries order_line_item_id FK); consumption invoices read MAX(record_version) from UsageMeteringLog (corrections produce credit memos, not silent invoice updates); customer sign-off required for milestone billing (IE-only completion does not unlock); credit memo two-step gate for rep-submitted (manager + Finance Director); billing error → Finance Director floor; idempotent invoice generation (no duplicate billing on retry/replay); payment terms inherited not overridden (changes require AGT-801 amendment); daily external billing system reconciliation at ±$0.01 tolerance.
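Two of these controls lend themselves to a short sketch: reading only the MAX(record_version) usage row, and idempotent invoice generation. Everything below is illustrative. The grouping key (`account_id`, `period`), the idempotency key, and the class and function names are assumptions, not the AGT-802 schema.

```python
def latest_usage_records(records):
    """Keep only the highest record_version row per (account_id, period)."""
    latest = {}
    for rec in records:
        key = (rec["account_id"], rec["period"])  # hypothetical grouping key
        if key not in latest or rec["record_version"] > latest[key]["record_version"]:
            latest[key] = rec
    return list(latest.values())

class InvoiceLedger:
    """Idempotent generation: a retry with the same key returns the stored invoice."""

    def __init__(self):
        self._invoices = {}

    def generate(self, order_line_item_id, period, amount):
        key = (order_line_item_id, period)  # hypothetical idempotency key
        if key not in self._invoices:
            self._invoices[key] = {
                "order_line_item_id": order_line_item_id,  # traceability to OrderLog
                "period": period,
                "amount": amount,
            }
        return self._invoices[key]

    def count(self):
        return len(self._invoices)
```

A replayed `generate` call leaves the ledger unchanged. In this sketch a changed amount on retry is ignored, which mirrors the stated rule that corrections produce credit memos rather than silent invoice updates.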
Edge cases handled
Customer disputes consumption invoice (UsageMeteringLog.audit_status = disputed flow); SOW partial completion (no partial billing); mid-cycle amendment pro-rated supplemental; annual upfront pre-payment (cash and recognition decoupled per ASC 606); external billing system offline (internal records held, retry on recovery); reconciliation discrepancy (AGT-803 holds payment tracking on affected invoices); externally-processed refund without AmendmentLog (forces back-fill); mid-period churn with prepaid balance.
Impact
Customer-facing invoice generation now production-spec'd. External billing system integration required for go-live. Architectural invariants preserved — AGT-802 doesn't decide what was sold or what's recognized; executes against AGT-801 + AGT-804.
AGT-803 · L8 · Payment Health: promoted Specced → Built
Status promotion · Hardened spec
Promoted. Closes L8 buildout. State transitions immutable, per-account aggregation, Suspended state Finance-only, retry-then-AM-notification discipline. 8 production controls + 9 edge cases.
Change
AGT-803 promoted from Specced (v25) to Built (v30). Closes L8 buildout — every L8 agent now Built.
Production controls
State transitions logged immutably (PaymentEventLog append-only with prior_state/new_state/transition_reason); per-account state aggregation (worst-applicable wins across multiple invoices); Suspended state requires Finance director-level authorization (no automatic transition; CSM/AM cannot trigger); retry exhaustion before AM notification (prevents storms on transient failures); AGT-501 modifier is read-only feed (writes only payment_health_status, not other CustomerHealthLog fields); external failure events idempotent (dedupe key: invoice_id + attempt_number + external_event_id); auto-recovery on payment to Current state; 7-year retention.
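Per-account aggregation and event de-duplication can be sketched as follows. The severity ordering of the states is an assumption (the changelog names the states but does not rank them), and the function names are hypothetical. The dedupe key fields are the three listed above.

```python
# Assumed ordering from least to most severe; not specified in the changelog.
SEVERITY = ["Current", "Failed", "Overdue", "Suspended"]

def account_state(invoice_states):
    """Worst-applicable wins across all of an account's invoices."""
    return max(invoice_states, key=SEVERITY.index, default="Current")

def dedupe_key(event):
    return (event["invoice_id"], event["attempt_number"], event["external_event_id"])

def ingest_failure_events(events):
    """Accept each external failure event once, keyed on dedupe_key."""
    seen, accepted = set(), []
    for event in events:
        key = dedupe_key(event)
        if key not in seen:
            seen.add(key)
            accepted.append(event)
    return accepted
```

An account with one Current and one Overdue invoice aggregates to Overdue, and a webhook delivered twice with the same three key fields is recorded once.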
Edge cases handled
Multi-invoice account with mixed states (worst-applicable wins); Failed invoice paid on the same day a new invoice goes Overdue (no re-notify); external billing system payment-confirmation gap (held by reconciliation); customer changes payment method during Overdue; alternate payment methods on file (Retry 2/3 may use them); mid-cycle Suspended cascade to AGT-501/502/503/504; reactivation from Suspended; payment reversal/chargeback recorded as distinct event; AGT-803 offline during failure event (recovery preserves accuracy).
Cascade impact
Post-sales signal chain now fully Built end-to-end: UsageMeteringLog (production schema v26) → AGT-501 health (built) → AGT-502 churn (built) → AGT-503 expansion (built), with AGT-803 payment health gating the cap (built). The post-sales backbone is the strongest part of the OS — per the v26 architecture eval (95% domain coverage).
Impact
L8 fully Built. Deterministic backbone complete L1–L8. L9 brains can now read against trustworthy source data — the architectural prerequisite for production launch.
Brain-Ready Views · Tier 1→2 · Contract & Catalog (new artifact): 20 views formally specced
New artifact · Schema
The load-bearing piece of the brain layer's read contract. AGT-901/AGT-902 specs referenced 20 brain-ready views; this artifact formally defines all of them: projection, filter, refresh, staleness threshold, token budget, owner.
Change
New schema artifact: Brain_Ready_Views_Contract.html. Defines the contract pattern + catalogues all 20 views (9 cross-population for AGT-901, 11 per-account components for AGT-902 composite).
Why this is load-bearing
Until v30, brain-ready views existed only as references in AGT-901 and AGT-902 specs. AGT-901 references 9 (MetricsCalc, Opportunities, ForecastLog, Accounts, AccountPriorityScore, CapacityPlan, WinLossLog, CadenceEventLog, VoCSynthesisLog). AGT-902 references 11 source views joined into a per-account composite. Without formal definitions, the brains were theory. Without compression discipline (10× typical), brain costs would explode.
Contract elements
Source table, owning service, projection rule, filter rule, refresh cadence, staleness threshold, token budget, used-by list, backwards compatibility rules. Every view in the catalogue has all of these elements specified.
Compression discipline
Three patterns: top-N + aggregate (most cross-population views); time-windowed slice (time-series views like UsageMeteringLog, MetricsCalc); per-account composite (precomputed denormalized snapshot for AGT-902). The composite is the operationally critical pattern — constructing it at brain query time would blow latency and token budget.
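The first pattern, top-N + aggregate, can be illustrated with a small sketch. The function name, row shape, and output field names are hypothetical. The idea taken from the entry is that a brain sees the N largest rows in full and the remainder only as a summary.

```python
def top_n_with_aggregate(rows, metric, n):
    """Return the n largest rows by metric plus a summary of everything else."""
    ranked = sorted(rows, key=lambda r: r[metric], reverse=True)
    top, rest = ranked[:n], ranked[n:]
    return {
        "top": top,
        "rest_count": len(rest),
        "rest_total": sum(r[metric] for r in rest),
    }
```

With thousands of accounts and a small N, the projected view stays roughly constant in size regardless of table growth, which is what keeps the token budget bounded.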
Staleness model
Every view exposes view_metadata.last_refresh_timestamp + staleness_threshold_hours + is_stale. Brain reads is_stale first; if TRUE, sets BrainAnalysisLog.data_staleness_acknowledged = TRUE and surfaces in narrative_output. Per-component staleness for the composite (AGT-902 may proceed with fresh components, surface specific stale dimensions).
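A minimal sketch of the staleness flow, using the field names from the entry. How `is_stale` is derived from the timestamp and threshold is an assumption, as are the function names, the dict-based log, and the wording appended to the narrative.

```python
from datetime import datetime, timedelta

def is_stale(last_refresh_timestamp, staleness_threshold_hours, now):
    """Assumed derivation: stale once the view's age exceeds its threshold."""
    return now - last_refresh_timestamp > timedelta(hours=staleness_threshold_hours)

def acknowledge_staleness(view_metadata, analysis_log):
    """Brain-side handling: check is_stale first, then flag and surface it."""
    if view_metadata["is_stale"]:
        analysis_log["data_staleness_acknowledged"] = True
        analysis_log["narrative_output"] += " [source data is stale]"
    return analysis_log

def stale_components(composite_metadata):
    """Per-component check for the composite: name only the stale dimensions."""
    return sorted(name for name, meta in composite_metadata.items() if meta["is_stale"])
```

For the composite, the brain can proceed on fresh components and list the stale ones by name, rather than rejecting the whole snapshot.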
Backwards compatibility rules
Field additions: non-breaking. Field removals: breaking, requires coordinated brain prompt deploy. Projection rule changes: case-by-case. Refresh cadence faster: non-breaking (cost may rise). Cadence slower: possibly breaking (staleness threshold may need adjustment). Cadence + threshold changes logged in changelog.
Cost economics
Materialized view storage trivial. Daily refresh compute modest (piggybacks on existing AGT-XXX nightly jobs). Per-account composite refresh slightly higher (per active account, daily). View-driven brain cost reduction ~10× vs raw-table reads, saving more than ~$200/month at expected volumes. Net: views save more than they cost.
Impact
L9 brains have a real, formal read contract. Implementation work substantial — 10 materialized views + 1 composite refresh job. Eval linkage: BrainEvalLog.brain_view_contracts_hash detects view contract changes between runs; forces re-eval before brain promotion.
AGT-302 · L3 · Cadence Coordinator: v27 ripple formalized in spec
Ripple · Charter update
v27 SalesPlayLibrary execution + play_id/originating_proposal_id lineage now formalized in AGT-302's own spec (was only in changelog ripple entry). Wire 4 documented alongside Wire 3 (L2 ABM backstop).
Change
AGT-302 spec gains "L9 wire item (v27)" section parallel to existing L2 wire item. Documents two integrations: (1) reading active plays from SalesPlayLibrary alongside existing AGT-301/AGT-203 sequence sources, (2) writing play_id and originating_proposal_id to CadenceEventLog for cohort retrospective lineage.
Why now
v27 ripple was logged in changelog but never written into AGT-302's own spec. AGT-302 is the highest-traffic execution service in the OS — its spec needs to reflect every coordination wire, including the L9 one. Spec drift is itself a failure mode; fixing it before more services build against AGT-302 documentation.
Existing behavior preserved
All AGT-302 v22 behavior unchanged: 3-touch/week cap, ABM backstop, suppression compliance gates. Brain-co-designed plays do not bypass coordination guards. Both new CadenceEventLog columns are nullable; existing rows have NULL on both; existing AGT-301/AGT-304/AGT-203 paths continue without writing the new fields.
Read by
AGT-303 Cadence Intelligence reads CadenceEventLog including new fields — brain-co-designed plays appear in cadence performance analytics with play_id grouping (per AGT-303 v27 brain-ready view extension referenced in Brain_Ready_Views_Contract).
Impact
Documentation hygiene — spec now matches what changelog claims is true. Cohort retrospective is queryable — the CadenceEventLog originating_proposal_id column was theoretical until AGT-302's spec confirmed write behavior.
v28 → v29 Tier 3 Specialist Tools · 4 tools specced · Closes Domain 3 + Domain 1 gaps + UBB consumption forecasting Tier 3 = stateless callable LLM functions. Not a layer (no schema ownership, no cadence). Functions an agent can call. First wave covers the largest cognition gaps from the v26 architecture eval: API-doc translation, dev-persona enrichment, play composition, consumption forecasting.
Tier 3 · Tools · New artifact category: Specialist Tools
New category
Stateless callable LLM functions. Distinct from layered services and agents — no schema ownership, no cadence, no approval gates. Lighter governance than Tier 1/2; pure I/O contracts plus eval discipline.
Change
New artifact category established. Tools live alongside the layered services and agents but follow different rules. Stored in tools/ directory. Indexed by Tools_Index.html.
Why category, not layer
Tools are pure functions: stateless, no table ownership, no cadence, no approval gates. Treating them as layered components creates wrong governance overhead; treating them as opaque LLM calls creates wrong audit posture. Tier 3 splits the difference — callable contracts with eval discipline, but no layer-integration burden. Per the v26 architecture eval recommendation.
Tool contract
Every Tier 3 tool spec must define: purpose (1 sentence), input schema (strict JSON), output schema (strict JSON; schema changes are breaking), model tier (Haiku default unless task demands otherwise), called-by list, cost ceiling (per-call + monthly), eval criteria, declared failure modes.
Invocation patterns
Brain calls tool during query (most common); service calls tool as enrichment (e.g., AGT-201/AGT-204 calling TOOL-002); operator calls tool directly via workspace UI; tool chains (brain orchestrating multiple tools in sequence).
Aggregate cost target
First-wave default budgets total ~$1,200/mo across the 4 tools. Combined with Tier 2 brain budgets (~$1,500/mo), reasoning layer total ~$2,700/mo. Tracked in single dashboard; alerts at 75% per tier.
Eval discipline
Tools have their own eval criteria, lighter than the brain harness: typically 10–15 questions per tool. Schema compliance is a hard requirement (100%); hallucination/grounding metrics are tool-specific. Tool eval results land in BrainEvalLog with a flag distinguishing tool runs from brain runs.
Impact
Lightweight extension mechanism for narrow cognition tasks. New ongoing maintenance — tool eval suites + per-tool model selection discipline. No schema burden — tools don't write tables, so adding/removing a tool has zero schema migration cost.
TOOL-001 · Tier 3 · API-doc → Sales-play Translator
New tool
Reads API documentation, produces 1–3 candidate sales play definitions for technical buyer personas. Output goes to SalesPlayLibrary as draft. Closes Domain 3 (API/dev-led GTM) gap.
Change
New tool: TOOL-001_API_Doc_Play_Translator.html. Closes the largest single gap from the v26 architecture eval (Domain 3 was 20% covered).
Inputs
API doc (OpenAPI spec / markdown URL / raw markdown), product context, current ICP summary, current active plays (for de-duplication), optional buyer persona hint, max plays constraint.
Outputs
Up to 3 candidate plays with name, hypothesis, target_buyer_persona, api_capabilities_referenced (with doc citations), target_definition outline, suggested_cadence_outline, success_criteria_outline, confidence_self_rating, ungrounded_assumptions (hard requirement — assumptions must be separated from API-grounded claims).
Hard rules
No fabricated capabilities (eval enforces 0% hallucinated capability rate). Refuses if input docs too thin (returns 0 candidates with structured reason rather than fabricate). De-duplicates against current_active_plays.
Called by
AGT-901 Pipeline Brain (most common, after product launches), AGT-902 Account Brain (account-specific variant), RevOps direct via workspace UI.
Cost
Sonnet (synthesis-heavy; Haiku quality below threshold). 50K input + 5K output budget. ~$0.20-0.30 per call. Monthly cap $300/mo (~1,000 calls). Frequency expectation low.
Impact
First mechanism in the OS for translating product capabilities into sales plays. Often chains into TOOL-003 (Sales Play Composer) when a candidate is selected for refinement.
TOOL-002 · Tier 3 · Dev-Persona ICP Enricher
New tool
Augments AGT-201 ICP scoring with developer-persona signals (job postings, GitHub activity, technographics, product telemetry). Bounded ±15pt adjustment on tech_stack and intent dimensions only. Closes dev-buyer side of Domain 3 gap.
Change
New tool: TOOL-002_Dev_Persona_ICP_Enricher.html. AGT-201's 6-dimension scorer materially under-weights developer-buyer signal; this tool augments without replacing.
Inputs
Account basics, technical signals (job postings, GitHub org activity, technographic data from AGT-204, product telemetry from UsageMeteringLog), current ICP score breakdown.
Outputs
dev_persona_score (0–100 composite), dev_persona_tier, buyer_persona_classification (developer / platform_team / compliance_engineer / data_engineer / non_technical_buyer / unclear), signals_present breakdown, key_drivers with citations, agt_201_recommended_adjustment (capped ±15pt on tech_stack and intent dimensions only), data_completeness assessment.
Hard rules
Bounded magnitude (cannot single-handedly flip Tier 3 to Tier 1). Sparse-data graceful degradation (returns "low" or "medium" confidence on sparse inputs, not "high"). Honest "unclear" classification when signals conflict. Hallucinated signals = 0% (key_drivers must cite real input fields).
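The bounded-magnitude rule can be sketched in a few lines. This is a minimal illustration, not the tool's implementation: the function name, the dict shape of the ICP breakdown, the per-dimension reading of the ±15pt cap, and the 0–100 clamp on each dimension are all assumptions. Only the cap value and the two adjustable dimensions come from the entry above.

```python
ADJUSTABLE_DIMENSIONS = {"tech_stack", "intent"}  # per the TOOL-002 hard rule
MAX_ADJUSTMENT_PTS = 15                           # cap in either direction


def apply_dev_persona_adjustment(icp_breakdown, recommended):
    """Return a new ICP breakdown with a bounded adjustment applied.

    icp_breakdown: dict of dimension name -> score (assumed 0-100 scale).
    recommended:   dict of dimension name -> signed point adjustment.
    """
    adjusted = dict(icp_breakdown)  # never mutate the caller's scores
    for dim, delta in recommended.items():
        if dim not in ADJUSTABLE_DIMENSIONS:
            continue  # all other dimensions are left exactly as scored
        bounded = max(-MAX_ADJUSTMENT_PTS, min(MAX_ADJUSTMENT_PTS, delta))
        adjusted[dim] = max(0, min(100, adjusted[dim] + bounded))
    return adjusted
```

An oversized recommendation (for example +30 on tech_stack) is clipped to +15, and a recommendation on any other dimension is ignored, which is what keeps accounts without enrichment scored exactly as before.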
Called by
AGT-201 ICP Scorer (on account update events with technical signals); AGT-901 Pipeline Brain (cross-account dev-persona analysis); AGT-204 Lead Enrichment (as enrichment subroutine).
Cost
Haiku (classification task; Sonnet was tested and showed no measurable lift). 10K input + 1K output budget. ~$0.01-0.02 per call. Monthly cap $200/mo (~10,000 calls). Frequency expectation high: called on the AGT-204 enrichment cadence.
Impact
Developer-led GTM motion finally has a system home. AGT-201 keeps its existing 6-dimension model; this tool is a parallel signal stream. Backwards-compatible: accounts without enrichment are scored exactly as before.
TOOL-003 · Tier 3 · Sales Play Composer
New tool
Composes structured play definitions from a hypothesis + Tier 1 brain-ready views. Output is a draft for SalesPlayLibrary. Hard rule: must use existing levers only — never proposes capabilities AGT-302 doesn't support. Closes Domain 1 "create plays, not just execute" gap.
Change
New tool: TOOL-003_Sales_Play_Composer.html. The lever that turns a brain proposal from "we should do something here" into "here's the candidate play, refine and approve."
Inputs
Hypothesis from calling brain, scope (segment / account_specific), context_views (ICP profile, win/loss pattern, account priority distribution, current active plays, VoC themes), constraints (must_use_existing_lever_only=true, max_touch_count per AGT-302 limits, optional anchor_to_tool_001_output for chain), originating_proposal_id from BrainAnalysisLog.
Outputs
draft_play (full SalesPlayLibrary draft schema: name, hypothesis, target_definition, suggested_cadence with step_outline, success_criteria with retire_signal), lever_grounding (every play element traces to existing agent spec section), duplication_check (overlap flagging vs current active plays), self_assessed_promotion_likelihood (calibrated honestly, not optimistically), ungrounded_assumptions.
Hard rules
100% lever-grounding required (ungrounded elements omitted, not fabricated). Schema compliance 100% (validates against SalesPlayLibrary draft schema). Must-use-existing-lever-only constraint non-negotiable; cannot propose capabilities AGT-302 doesn't support. Honest self-rated promotion likelihood — "low" rated drafts must actually have lower promotion rate than "high" by >20pp.
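The calibration rule on self-rated promotion likelihood (drafts rated "low" must promote at a rate more than 20pp below drafts rated "high") could be checked along these lines. A minimal sketch: the `promoted` flag and both function names are hypothetical, and the source does not say how the eval aggregates drafts.

```python
def promotion_rate(drafts, label):
    """Share of drafts with the given self-rating that were promoted."""
    rated = [d for d in drafts if d["self_assessed_promotion_likelihood"] == label]
    if not rated:
        return None  # no drafts with this rating, so no rate to report
    return sum(1 for d in rated if d["promoted"]) / len(rated)


def is_calibrated(drafts, min_gap_pp=20.0):
    """True when 'high' drafts promote more than min_gap_pp above 'low' drafts."""
    high = promotion_rate(drafts, "high")
    low = promotion_rate(drafts, "low")
    if high is None or low is None:
        return None  # cannot judge calibration without both groups
    return (high - low) * 100 > min_gap_pp
```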
Called by
AGT-901 Pipeline Brain (most common). AGT-902 Account Brain (for scope=account_specific). Chains from TOOL-001 when API-doc candidate gets selected for refinement. RevOps direct via workspace UI.
Cost
Sonnet (composition with structural constraints; Haiku tested but lever-grounding accuracy below threshold). 30K input + 4K output budget. ~$0.15 per call. Monthly cap $300/mo (~2,000 calls).
Stateless write contract
Tool is stateless — calling brain writes the SalesPlayLibrary draft row with the tool's output. Brain owns the write because brain owns the BrainAnalysisLog row anchoring lineage. Promotion gate (draft → under_review → active) unchanged from SalesPlayLibrary v27 spec.
Impact
Closes Domain 1 gap from v26 eval. Brain proposals + TOOL-003 composition + human co-definition + AGT-302 execution = end-to-end "create plays, not just execute" pipeline.
TOOL-004 · Tier 3 · Consumption Forecasting / Runway Predictor
New tool
Reads UsageMeteringLog trailing 90–180 days for one (account, SKU). Predicts overage timing with confidence interval, characterizes trend (linear / exponential / seasonal / cliff / flat). Closes UBB-specific gap. LLM for characterization; numerical forecasting in code (not LLM gut feel).
Change
New tool: TOOL-004_Consumption_Forecasting.html. Today the system can detect overage has happened (AGT-503 fires on threshold breach) but cannot predict when overage will happen or distinguish trend from spike.
Inputs
account_id, sku_id, sku_type, metering_history (90–180 days from UsageMeteringLog brain-ready view, audit_status filtered to verified+pending_recon), contract context (start/end/renewal dates, current period commit, trailing overage count), forecast_horizon_days (default 60, max 180).
Outputs
forecast_summary (predicted_overage_date with [low, high] confidence interval), trend_characterization (primary_pattern enum, growth_rate, volatility_score, seasonality_detected, anomalies_detected), interpretation_for_caller (is_likely_real_expansion / is_likely_one_time_spike / is_likely_seasonal_recurrence + rationale), data_quality assessment.
Pattern taxonomy
5 patterns enumerated: linear (steady growth), exponential (accelerating), seasonal (repeating cycle), cliff (sudden discontinuous step), flat (no trend). Pattern characterization is the most operationally important output — spike-vs-trend differentiation is the eval-targeted dimension at ≥ 80% threshold.
Hard rules
Refuses if < 30 days of history. Seasonality requires ≥ 2 full cycles (eval enforces 0% false-positive seasonality). Confidence intervals scale with history length (no narrow intervals on short data). LLM does not invent forecast numbers — numerical forecasting runs in code (exponential smoothing + regression with seasonality decomposition); LLM produces structured summary + interpretation.
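The refusal and seasonality guards, plus the "numbers come from code" principle, can be illustrated with a deliberately simplified sketch. It fits a plain least-squares line to daily usage and projects forward; it is a stand-in for, not a reproduction of, the exponential smoothing and seasonality decomposition named above, and it omits the confidence interval. Function names and the `remaining_commit_units` parameter are hypothetical.

```python
MIN_HISTORY_DAYS = 30  # refusal threshold from the hard rules


def seasonality_eligible(history_days, cycle_length_days):
    """Seasonality may only be reported with at least two full cycles of data."""
    return history_days >= 2 * cycle_length_days


def days_until_overage(daily_usage, remaining_commit_units, horizon_days=60):
    """Project the day on which cumulative usage exceeds the remaining commit."""
    n = len(daily_usage)
    if n < MIN_HISTORY_DAYS:
        return {"status": "refused", "reason": "insufficient_history"}
    # Ordinary least-squares fit of usage against day index.
    mean_x = (n - 1) / 2
    mean_y = sum(daily_usage) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_usage))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    cumulative = 0.0
    for day in range(1, horizon_days + 1):
        # Projected usage is floored at zero so a falling trend cannot go negative.
        cumulative += max(0.0, intercept + slope * (n - 1 + day))
        if cumulative > remaining_commit_units:
            return {"status": "ok", "days_until_overage": day}
    return {"status": "ok", "days_until_overage": None}  # no overage in horizon
```

In this shape the LLM would only receive the returned structure and write the interpretation; it never produces the numbers itself.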
Called by
AGT-902 Account Brain (most common — "real expansion or spike?" use case). AGT-503 Expansion Trigger (optional augment to existing 5-signal scoring — downgrades signal weight on cliff/spike pattern). AGT-402 Forecast Adjuster (weights expansion ACV probability in bottoms-up forecast by trend strength).
Cost
Haiku (LLM portion is interpretation only; numerical work in code). 15K input + 2K output budget. ~$0.02 per call. Monthly cap $400/mo (~20,000 calls). Highest-frequency tool of the four.
Impact
Closes UBB-specific gap from v26 eval. For a usage-based business: the difference between proactive expansion conversation (timed before overage hits) and reactive damage control (overage already painful for customer).
v27 → v28 Brain Eval Harness · 30-question retrospective + scoring rubric + BrainEvalLog schema
Pre-launch gate for AGT-901 + AGT-902. Per the v26 architecture directive: eval before launch, eval forever. Quarterly drift detection + on-demand triggers. The CFO defense, written down.
EVAL Harness · L9 · Methodology + Rubric (new artifact)
New artifact
7-dimension scoring rubric with 3 hard criteria + 4 soft criteria. Pre-launch hard gate. Quarterly cadence + on-demand triggers. Independence-required reviewer. Decision logic computed via trigger.
Change
New artifact: Brain_Eval_Harness.html. Established methodology, scoring rubric, run cadence, human review process, drift detection signals, and pre-launch checklist.
Scoring rubric
7 dimensions. 3 hard criteria (any failure = harness fail): hallucination rate ≤ 2%, staleness recognition = 100%, lever-mapping correctness = 100% (AGT-902 only). 4 soft criteria: source citation rate ≥ 95%, diagnosis accuracy ≥ 0.70 mean, confidence calibration in healthy bands, narrative coherence ≥ 4.0/5.0 mean.
Pass/fail logic
Computed via trigger from aggregated_scores. pass (all hard + all soft) / pass_with_notes (all hard + most soft, no soft > 15pp below threshold) / conditional (all hard but soft significantly below; brain may run shadow mode) / fail (any hard criterion fails). Decision is policy, not negotiation — manual override blocked.
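The four-way decision can be sketched as a pure function. This is an illustration under stated assumptions, not the trigger itself: inputs are reduced to a pass/fail flag per hard criterion and a shortfall in percentage points per soft criterion (0 when met), and "most soft" is read as "fewer than half of the soft criteria missed". The source does not define either of those.

```python
def eval_decision(hard_passed, soft_shortfall_pp, max_shortfall_pp=15.0):
    """Map rubric results to pass / pass_with_notes / conditional / fail.

    hard_passed:       dict of hard criterion -> bool.
    soft_shortfall_pp: dict of soft criterion -> points below threshold (0 if met).
    """
    if not all(hard_passed.values()):
        return "fail"  # any hard criterion failing fails the harness
    misses = [gap for gap in soft_shortfall_pp.values() if gap > 0]
    if not misses:
        return "pass"
    # Assumption: "most soft" means fewer than half of the soft criteria missed.
    most_met = len(misses) < len(soft_shortfall_pp) / 2
    if most_met and max(misses) <= max_shortfall_pp:
        return "pass_with_notes"
    return "conditional"  # hard criteria hold, soft criteria well below bar
```

Keeping the function free of any override parameter mirrors the "policy, not negotiation" rule: the only way to change the outcome is to change the scores.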
Run cadence
Pre-launch (one time per brain, hard gate) · quarterly (drift detection) · material prompt change (hard gate) · new model promotion (comparison gate) · on-demand (operator suspicion, diagnostic only).
Reviewer independence
Hard requirement: human reviewer must not be the person who tuned the brain prompt. Trigger validates against an access-controlled prompt-tuner registry at write time. Independence preserves the eval as a measurement, not a target.
Cost of running
~$3/run in tokens (30 brain calls × Sonnet pricing). ~$25/yr in tokens at quarterly cadence + 2–3 ad-hoc runs. ~24 reviewer-hours/year. Total < $50/yr in tokens + ~40 hours analyst time (reviewer hours plus catalog maintenance). Cheap insurance.
Failure modes of the eval itself
Documented: catalog over-fitting (the catalog drifts toward questions that are easy for the brain); brain "memorization" of eval (rotate question wording, hold private spot-check set); reviewer drift (calibrate periodically with golden-set questions). The eval is an artifact that needs maintenance, not a static check.
Impact
Pre-launch gate established. AGT-901 + AGT-902 cannot go to production without passing the harness. New ongoing operational burden — quarterly run + catalog maintenance + reviewer independence enforcement. Eval is the CFO defense — when costs come up, point at measurable accuracy gains. When auditors ask how brain quality is maintained, point at the trend.
EVAL Catalog · L9 · 30-question retrospective catalog (new artifact)
New artifact
10 questions for AGT-901 + 20 for AGT-902 across 8 question types. 5 questions deliberately use stale fixture data to test staleness recognition. Anonymization discipline mandatory.
Change
New artifact: Brain_Eval_Question_Catalog.html. Scaffolded with 6 detailed question templates and 24 one-line summaries. RevOps owns filling in the remaining detailed templates with real anonymized historical fixtures before pre-launch.
Question split
AGT-901 (10): 4 plan diagnosis · 3 coverage gap · 2 quarterly play retrospective · 1 forecast bias attribution. AGT-902 (20): 8 churn diagnosis · 6 expansion qualification · 4 hand-off briefing · 2 QBR narrative. Volume favors AGT-902 because per-account ground truth is easier to construct retrospectively and the brain is higher-volume in production.
Stale-fixture coverage
5 of 30 questions use deliberately stale Tier 1 source fixtures: EVAL-Q02 (forecast accuracy stale), Q07 (priority score stale), Q12 (composite view stale), Q24 (usage log stale), Q28 (conv intelligence stale during hand-off). All 5 must score 100% on staleness recognition for the harness to pass.
Question construction principles
4 properties of a good retrospective question: ground truth exists (knowable in retrospect); source data is reconstructible (Tier 1 tables can be replayed to time-of-question state); multiple defensible answers possible but some clearly better; spans the brain's actual use cases. Hard rule: if source state can't be reconstructed, the question is not eligible — otherwise the eval measures hindsight bias not brain quality.
Anonymization
Mandatory before catalog commit. Account names fictionalized with consistent placeholders (mapping in access-controlled separate file). ACV/consumption rounded within 10%. Personnel names removed/fictionalized. Verified by second reviewer. BrainEvalLog 7-year retention makes anonymization a privacy + contract requirement, not a polish step.
Catalog maintenance cadence
Quarterly: rotate 3–5 questions (retire aged ground truth, add questions sourced from recent operational complaints). On material GTM context shift: refresh to span new ICP / product / segment. On model promotion: spot-check 5 questions on the new model, expand to full harness if divergence material.
Impact
Concrete pre-launch artifact — AGT-901 + AGT-902 launch is now blocked on a measurable, defined deliverable rather than vibes. Ongoing maintenance commitment — ~16 hours/year refreshing questions. Catalog is intentionally a scaffold, not finished — structure is set; RevOps fills in the historical detail before the first run.
BrainEvalLog · L9 · Production Schema (new), 2 tables
New artifact · Schema
BrainEvalLog (one row per run) + BrainEvalQuestionScore (one row per question per run). Append-only. Decision computed via trigger. Reviewer independence enforced. 7-year retention. Brain Agents have no read access by design.
Change
New schema artifact: BrainEvalLog_Schema.html. Two tables — one row per run (aggregated), many rows per question per run (drill-down).
BrainEvalLog fields
19 fields. Key: run_trigger, brain_under_test, model_id, system_prompt_hash (joinable to BrainAnalysisLog), brain_view_contracts_hash (detects view drift between runs), question_catalog_version, reviewer_user_id + reviewer_independence_verified (trigger checks against prompt-tuner registry), aggregated_scores (JSONB across 7 dimensions), hard_criteria_passed, decision (computed via trigger), decision_rationale, calibration_notes, token_cost_usd, reviewer_hours_spent.
BrainEvalQuestionScore fields
17 fields per (run, question). Key: brain_analysis_id (FK to actual BrainAnalysisLog row produced for this eval question — full drill-down), uses_stale_fixture flag, citation_rate (auto-scored), hallucinations_detected + total_claims (human-reviewed), staleness_correctly_recognized (NULL if not stale-fixture; T/F otherwise), diagnosis_accuracy_score, lever_mapping_violations (AGT-902 only), confidence_distribution, narrative_coherence_rating (1–5), reviewer_comments.
Triggers
Reviewer independence trigger: validates reviewer_user_id is not in the prompt-tuner registry at write. Decision logic trigger: computes BrainEvalLog.decision from aggregated_scores via the rubric — manual override blocked. Append-only enforcement: UPDATE blocked except for completed_at; DELETE blocked.
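The three trigger behaviors can be modeled in memory to make the contract concrete. A minimal sketch, assuming a dict-backed table; the class, exception, and method names are hypothetical, and the real enforcement is described above as database triggers, not application code.

```python
class EvalLogError(Exception):
    """Raised when a write violates the eval log's rules."""


class BrainEvalLogTable:
    def __init__(self, prompt_tuner_registry):
        self._tuners = set(prompt_tuner_registry)
        self._rows = {}

    def insert(self, run_id, row):
        # Reviewer independence: the reviewer must not have tuned the prompt.
        if row["reviewer_user_id"] in self._tuners:
            raise EvalLogError("reviewer is in the prompt-tuner registry")
        if run_id in self._rows:
            raise EvalLogError("run already logged")
        self._rows[run_id] = dict(row, reviewer_independence_verified=True)

    def update(self, run_id, changes):
        # Append-only: completed_at is the single updatable field.
        if set(changes) - {"completed_at"}:
            raise EvalLogError("append-only: only completed_at may be updated")
        self._rows[run_id].update(changes)

    def delete(self, run_id):
        raise EvalLogError("append-only: DELETE is blocked")

    def get(self, run_id):
        return dict(self._rows[run_id])
```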
Brain Agents have no read access
Deliberate. The brain should not learn what the eval expects — the eval is a measurement of general capability, not a target the brain optimizes for. RLS blocks Brain Agent read access to both eval tables. RevOps + auditors + the eval harness runner service are the only readers.
Dashboard queries enabled
Per-brain quarterly trend; most-failing questions across runs; stale-fixture pass rate audit; reviewer calibration distribution; cost per run; production-vs-eval prompt drift (join on system_prompt_hash). All readable from the two tables without expensive joins.
Retention
7 years. Hot 0–24mo, cold 24mo–7yr. Parallels BrainAnalysisLog and UsageMeteringLog — eval results that informed brain-influenced decisions need the same retention floor.
Impact
Eval results are durable, queryable, auditable. New triggers to maintain — reviewer-independence trigger requires keeping the prompt-tuner registry accurate. Production-vs-eval drift detection becomes a real query, not a hope — system_prompt_hash joins detect when production is using a prompt that the eval hasn't validated.
v26 → v27 L9 Reasoning Layer · AGT-901 Pipeline Brain + AGT-902 Account Brain · 2 new schemas + Tier 1 ripples
First true Agents in the OS. LLM-native, operator-invoked, read Tier 1 services, never write canonical data. Brain proposes, humans co-define and approve, services execute.
L9 Layer · L9 · Reasoning · Tier 2 Agents
New layer
New layer concept distinct from L1–L8 services. Hosts LLM-native Brain Agents. Operator-invoked, never on cadence. Strict no-write-to-canonical contract.
Change
New layer L9 — Reasoning. First time the system has agents in the LLM-native sense (distinct from the 40 GTM Services in L1–L8). L9 components are operator-invoked, not cadenced; they read Tier 1 service tables via brain-ready views and produce analysis with full source-trace metadata. No canonical writes from L9 — ever. The architectural commitment from the v26 evaluation made concrete.
Why now
Per the v26 architecture eval: brains require trustworthy Tier 1 data. L8 was promoted to Built first (UsageMeteringLog production schema + AGT-804 v26) so L9 reads from a real foundation rather than a stub. With L8 anchored, L9 can be specced without producing fake-confident output.
Layer color + position
Color #D4A84C (gold) — chosen to visually distinguish L9 from the eight service layers. Positioned in the layer stack as the topmost layer, reading from L1–L8 below.
Naming
Components in L9 are Brain Agents, not GTM Services. The renaming from v26 reserved "Agents" specifically for this layer. The renderer labels L9 cards with "agents" while L0–L8 cards remain "services."
Cadence policy
No L9 component runs on cadence by default. Invocation paths: operator query (chat-style), narrative jobs (called by AGT-704 / AGT-603 for narrative sections of MBR/QBR/QBR-prep), explicit batch triggers (quarterly play refresh, AE/CSM rotation hand-off, optional renewal-prep batch over T-90 cohort).
Cost containment
Default model: Sonnet tier. Opus reserved for explicit deep-analysis opt-in. Haiku for narrow Tier 3 tools (forthcoming). Prompt caching enabled. Brain-ready views (10× input compression typical) instead of full table dumps. Per-query budget caps + monthly tier alerts at 75%.
Impact
Open-ended operator queries become tractable. "Why is Q3 commercial soft?" no longer requires a 4-day RevOps analyst project. New audit surface. BrainAnalysisLog must demonstrate source-trace integrity to the auditor — a new capability but a known one. Tier 1 services unaffected at execution time. If L9 is offline, L1–L8 continues to run normally.
Trigger
Step 3 of the four-step build sequence in the v26 architecture plan: deterministic backbone first (L8), framing rename second, Tier 2 agents third, eval harness fourth.
AGT-901 · L9 · Pipeline Brain
New agent
Cross-functional pipeline reasoning. Diagnoses plan-vs-actual, drafts segment-targeted plays. Reads 9 Tier 1 brain-ready views. Operator-invoked.
Change
First Brain Agent in L9. Answers cross-functional pipeline questions: why a segment is off plan, what play would close the gap, which existing levers apply. Reads brain-ready views from AGT-702 (MetricsCalc), AGT-401 (Opportunities), AGT-402 (ForecastLog), AGT-201 (Accounts), AGT-206 (AccountPriorityScore), AGT-105 (CapacityPlan), AGT-703 (WinLossLog), AGT-303 (CadenceEventLog), AGT-604 (VoCSynthesisLog).
Use cases
Plan diagnosis (operator query: "Why is segment X off plan?"); coverage gap proposal; quarterly play refresh; WBR/MBR narrative job (AGT-704 invokes); anomaly explanation after AGT-702 breach alert.
Write contract
BrainAnalysisLog (own log) and SalesPlayLibrary (drafts only — never active). May write WBR/MBR/QBR narrative sections per AGT-704 v26 charter. Never writes Opportunities, ForecastLog, MetricsCalc, QuotaStore, CompPlans, RevenueRecognitionLog, ABMPlaybook directly.
Promotion gate
draft → under_review: human pickup. under_review → active: SLM + RevOps joint approval. Hard volume cap: 3–8 active plays per segment per quarter (configurable, default 5). Cap enforced at write time on transition to active.
Eval criteria
Source citation rate ≥ 95%; hallucination rate ≤ 2%; staleness recognition 100% (hard requirement); diagnosis accuracy ≥ 70% on 30-question retrospective harness; play survival rate ≥ 30%; cohort outcome lift tracked quarterly.
Impact
Closes Domain 1 gap from v26 eval — system can now create sales plays for human co-definition, not only execute them. No behavioral change to Tier 1 services until a brain-drafted play is promoted to active by humans, at which point AGT-302 executes per its existing v25 spec.
AGT-902 · L9 · Account Brain
New agent
Per-account synthesis across L4+L5+L6. Reads 11-source composite brain-ready view. Maps proposed actions to existing levers — never invents new ones.
Change
Per-account counterpart to AGT-901. Where AGT-901 reasons across pipeline-wide signals, AGT-902 reasons across the full L4 + L5 + L6 picture for a single account. Reads a composite per-account brain-ready view spanning 11 Tier 1 sources (CustomerHealthLog, ChurnRiskLog, ExpansionLog, UsageMeteringLog, Opportunities, ConvIntelligence, QBRLog, OnboardingLog, TechnicalMilestoneLog, PaymentEventLog, Accounts).
Use cases
"What's the move?" account queries; AE/CSM hand-off briefing on rotation; renewal risk diagnosis after AGT-502 surfaces a Critical/High account; expansion qualification after AGT-503 fires; QBR prep narrative section (AGT-603 invokes).
Proposed-action taxonomy
All actions map to existing Tier 1 levers — no invented actions. Enum: pull_qbr_forward, open_expansion_play, brief_new_ae_or_csm, customer_communication, escalate_to_slm, recommend_human_query, none. Eval enforces 100% taxonomy compliance.
Account synthesis signature caching
Within a freshness window, repeat queries on the same account hit cache. Saves cost on iterative meeting-prep workflows where someone asks 3+ follow-ups about the same account.
Cost sizing
Conservative: 30 queries/day, 30K input + 3K output per query, Sonnet ≈ $4/day ≈ $120/mo. Per the v26 cost model.
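The sizing arithmetic can be reproduced as follows. The per-million-token prices passed in the usage note ($3 input, $15 output) are assumed Sonnet-tier list prices; the changelog itself does not state them, and the sketch ignores prompt caching and signature-cache hits, both of which would lower the figure.

```python
def query_cost_usd(input_tokens, output_tokens, usd_per_m_input, usd_per_m_output):
    """Token cost of one query, given prices per million tokens."""
    return (input_tokens * usd_per_m_input + output_tokens * usd_per_m_output) / 1_000_000


def cost_sizing(queries_per_day, input_tokens, output_tokens,
                usd_per_m_input, usd_per_m_output, days_per_month=30):
    """Return (per_query, per_day, per_month) cost in USD."""
    per_query = query_cost_usd(input_tokens, output_tokens,
                               usd_per_m_input, usd_per_m_output)
    per_day = per_query * queries_per_day
    return per_query, per_day, per_day * days_per_month
```

Under those assumed prices, `cost_sizing(30, 30_000, 3_000, 3.0, 15.0)` gives roughly $0.135 per query, $4.05 per day, and $121.50 per month, consistent with the ≈ $4/day ≈ $120/mo figure above.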
Impact
Cross-source per-account synthesis becomes routine. Today this requires the AM/CSM to mentally union 11 source systems. Acceptance rate is a calibration signal, not a quality signal: too high suggests the brain is echoing leadership/AM bias, too low suggests missing context. Track outcome lift cohort-wise.
BrainAnalysisLog · L9 · Production Schema (new)
New artifact · Schema
Canonical log of every Brain Agent query. Source-trace metadata for audit-grade reasoning. Append-only. Single-writer-per-row. proposal_id lineage anchor.
Change
New schema artifact: BrainAnalysisLog_Schema.html. The mechanism by which L9 earns audit-grade reasoning out of a non-deterministic substrate.
Field highlights
22 fields. Key: proposal_id (lineage anchor for cohort retrospective), writer_agent_id (single-writer enforced), sources_read (JSONB array with last_refresh_timestamp per source), narrative_output (with inline [src:N] citations), confidence_flags (high_confidence / multi_source / inference / speculation), data_staleness_acknowledged (hard CHECK requires staleness disclosure phrase in narrative when TRUE), cost_usd_estimate (audit-stable).
Triggers enforced
Staleness disclosure trigger (stale + no disclosure phrase = reject); source citation trigger (every [src:N] must reference a real source_index); action enum trigger (proposed_actions[].action_type must be in taxonomy). Append-only via row-level security — UPDATE and DELETE blocked.
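The three validation triggers can be expressed as one row-level check. A sketch only: the disclosure phrase is a hypothetical placeholder (the source requires a phrase but does not quote it), the row is modeled as a plain dict, and each `sources_read` entry is assumed to carry a `source_index` key. The action taxonomy is the AGT-902 enum listed earlier in this changelog.

```python
import re

ACTION_TAXONOMY = {
    "pull_qbr_forward", "open_expansion_play", "brief_new_ae_or_csm",
    "customer_communication", "escalate_to_slm", "recommend_human_query", "none",
}
DISCLOSURE_PHRASE = "data may be stale"  # hypothetical placeholder phrase
CITATION_PATTERN = re.compile(r"\[src:(\d+)\]")


def validate_brain_row(row):
    """Return the list of rule violations for a candidate BrainAnalysisLog row."""
    errors = []
    narrative = row["narrative_output"]
    # Staleness flagged but never disclosed in the narrative: reject.
    if row["data_staleness_acknowledged"] and DISCLOSURE_PHRASE not in narrative.lower():
        errors.append("staleness_disclosure_missing")
    # Every inline [src:N] must point at a source that was actually read.
    known = {s["source_index"] for s in row["sources_read"]}
    cited = {int(n) for n in CITATION_PATTERN.findall(narrative)}
    if cited - known:
        errors.append("citation_to_unknown_source")
    # Proposed actions must stay inside the fixed taxonomy.
    if any(a["action_type"] not in ACTION_TAXONOMY for a in row.get("proposed_actions", [])):
        errors.append("action_outside_taxonomy")
    return errors
```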
proposal_id lineage
Every brain output gets a stable proposal_id. Downstream actions (promoted plays, executed sequences, scheduled QBRs, sent comms) inherit it. At quarter end, cohort retrospective filters Tier 1 records on WHERE originating_proposal_id IS NOT NULL for the brain-influenced cohort. Pattern-level analysis, not per-deal attribution — per the v26 evaluation's "cohort-level retrospective" model.
Retention
7 years. Hot 0–90d, warm 90d–18mo, cold 18mo–7yr. Parallels UsageMeteringLog and RevenueRecognitionLog.
Access
Brain Agents read own writes only. RevOps full read. Auditors full read on audit windows. AM/CSM/AE: own queries + assigned accounts. Tier 1 services intentionally have no read access — preserves the auditability invariant that canonical metrics never depend on brain outputs.
Impact
Audit reproducibility — given any row, auditor can re-fetch cited sources at cited timestamps and re-evaluate defensibility. New trigger maintenance burden — staleness disclosure trigger and source citation trigger must be kept aligned with prompt evolution. Eval ground truth — the 30-question retrospective harness reads BrainAnalysisLog directly.
SalesPlayLibrary · L9 · Production Schema (new)
New artifact · Schema
Workspace where Brain Agents draft sales plays for human co-definition. Promotion-gated state machine. Hard volume cap on active plays per segment.
Change
New schema artifact: SalesPlayLibrary_Schema.html. Operationalizes the v26 commitment: brains propose, RevOps and sales leaders refine into a small curated set, AGT-302 executes.
State machine
4 states with enforced transitions. draft (Brain writes) → under_review (human pickup) → active (SLM + RevOps joint approval) → retired (terminal). No backward transitions. Trigger validates allowed transitions only.
Volume cap
Hard cap enforced at trigger level on transition to active. Default 3–8 per segment per quarter (configurable per segment via SalesPlayLibraryConfig in L0; default 5). Account-specific plays: 1 active per account, hard. Cap met → activation rejected with error suggesting retire-an-existing-play first. No silent overflow.
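The state machine and the activation cap together can be sketched as a single transition function. This is illustrative: the function and exception names are hypothetical, the count of currently active plays is passed in by the caller, and per-quarter scoping plus the separate one-per-account rule are left out for brevity.

```python
ALLOWED_TRANSITIONS = {
    "draft": {"under_review"},
    "under_review": {"active"},
    "active": {"retired"},
    "retired": set(),  # terminal state, no way back
}


class PlayTransitionError(Exception):
    """Raised when a state change is not permitted."""


def transition(play, new_state, active_plays_in_segment, cap=5):
    """Return a copy of the play in new_state, or raise if not permitted."""
    if new_state not in ALLOWED_TRANSITIONS[play["state"]]:
        raise PlayTransitionError(f"{play['state']} -> {new_state} is not allowed")
    if new_state == "active" and active_plays_in_segment >= cap:
        # No silent overflow: the caller must retire a play first.
        raise PlayTransitionError("volume cap reached; retire an existing play first")
    return dict(play, state=new_state)
```

Because the table lists only forward edges, backward moves such as active to draft fail on the first check without any special casing.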
AGT-302 integration
AGT-302 reads SalesPlayLibrary WHERE state = 'active' at sequence-generation time. Drafts and under-review plays invisible to AGT-302 by design. AGT-302 writes play_id and originating_proposal_id to CadenceEventLog for every event generated under a brain-co-designed play. v27 ripple change to AGT-302's schema (two new fields).
Brain calibration feedback
Three signals: promotion rate (active / draft ratio per quarter per brain — eval target ≥ 30%), edits-during-review volume (light edits = brain calibrated; heavy edits = drafts are starting points), retrospective outcomes vs. success_criteria (cohort lift quarterly). Brain reads retrospective_outcomes in next play-refresh batch.
Impact
Closes Domain 1 "create plays, not just execute" gap from the v26 evaluation. RevOps + SLM workflow change — explicit promotion gate process, joint approval, quarterly retrospective on cohort outcomes. AGT-302 v25 behavior preserved — touch caps, suppression, ABM backstop still apply on top of any active play execution.
Tier 1 ripple · L9 · originating_proposal_id columns added across 6 services
Ripple · Schema
AGT-203 ABMPlaybook · AGT-302 CadenceEventLog · AGT-503 ExpansionLog · AGT-603 QBRLog · AGT-405 OpportunityBriefLog · AGT-504 CommBriefLog. All gain a nullable originating_proposal_id column for cohort retrospective lineage.
Change
Six Tier 1 services gain a nullable originating_proposal_id column. Always nullable — actions taken without brain involvement carry NULL. Cohort retrospective queries filter on WHERE originating_proposal_id IS NOT NULL for the brain-influenced cohort. AGT-302 also gains a play_id column for direct play-to-event lineage.
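The nullable-add migration and the cohort filter can be tried end to end against an in-memory SQLite database. A sketch under assumptions: the `replied` outcome column and both function names are hypothetical, the table is cut down to a few columns, and SQLite stands in for whatever database the services actually use.

```python
import sqlite3


def build_demo_db():
    """Create a pre-L9 table, then apply the nullable-column migration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE CadenceEventLog (event_id INTEGER PRIMARY KEY, replied INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO CadenceEventLog (event_id, replied) VALUES (?, ?)", [(1, 0), (2, 1)]
    )
    # Nullable adds: existing rows keep NULL, meaning "not brain-influenced".
    conn.execute("ALTER TABLE CadenceEventLog ADD COLUMN originating_proposal_id TEXT")
    conn.execute("ALTER TABLE CadenceEventLog ADD COLUMN play_id TEXT")
    conn.execute("INSERT INTO CadenceEventLog VALUES (3, 1, 'prop-001', 'play-01')")
    return conn


def cohort_counts(conn):
    """Return {brain_influenced: (events, replies)} for the two cohorts."""
    rows = conn.execute(
        """
        SELECT originating_proposal_id IS NOT NULL AS brain_influenced,
               COUNT(*) AS events,
               SUM(replied) AS replies
        FROM CadenceEventLog
        GROUP BY brain_influenced
        ORDER BY brain_influenced
        """
    ).fetchall()
    return {bool(flag): (events, replies) for flag, events, replies in rows}
```

The comparison stays at cohort level: the query never attributes an individual outcome to a proposal, it only splits aggregate results by whether a proposal id is present.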
Affected services
AGT-203 ABMPlaybook (play promoted from SalesPlayLibrary carries proposal_id) · AGT-302 CadenceEventLog (sequences executing under brain-co-designed plays carry proposal_id + play_id) · AGT-503 ExpansionLog (expansion plays initiated via brain proposal) · AGT-603 QBRLog (pull-forwards triggered by brain proposal) · AGT-405 OpportunityBriefLog (briefs from brain-recommended meetings) · AGT-504 CommBriefLog (comms approved after brain recommendation).
Backwards compatibility
Fully backwards-compatible. Existing rows have NULL originating_proposal_id — they predate L9 and were not brain-influenced. Existing service behavior unchanged. New rows populate the field only when triggered by a brain-promoted play or brain-proposed action.
Why
Per the v26 evaluation: in enterprise sales-led motion you cannot prove the brain caused a deal loss; what you can do is compare cohort outcomes between brain-influenced and non-brain-influenced actions over a quarter. The originating_proposal_id column is the load-bearing piece of that measurement.
Impact
Six service schemas updated. Migration is straightforward (nullable add). No reads change in those services. Brain output never affects how Tier 1 services compute, only how RevOps reports retrospective outcomes.
AGT-603 · L6 · Charter extension: narrative path called by AGT-902
Ripple · Charter update
AGT-603 QBR Prep gains a narrative path: AGT-902 may write per-account narrative sections of QBR prep documents. AGT-603 still owns the artifact and gates publication. Metric sections remain off-limits to brains.
Change
AGT-603 charter extended (parallels the AGT-704 v26 charter extension): AGT-902 may write per-account narrative sections of QBR prep documents. AGT-603 still owns the artifact, gates publication, and pulls fresh from canonical Tier 1 metric tables for any number-bearing section.
What's permitted
AGT-902 may write narrative content describing account history, relationship dynamics, recent ConvIntelligence trajectory, open commitments — into designated QBRLog narrative fields. Always with source-trace metadata.
What's NOT permitted
AGT-902 may never write metric sections of QBR prep — health score, churn risk tier, expansion ACV, usage trends. Those pull fresh from AGT-501 / AGT-502 / AGT-503 / AGT-804 at QBR generation time. Same architectural commitment as AGT-704.
Impact
QBR prep narrative quality lift — synthesis across 11 sources is now automatic for AM/CSM consumption. Metric integrity preserved — every number in QBR still traces deterministically to Tier 1 service tables.
v25 → v26 AGT-804 promoted to Built · UsageMeteringLog production schema published
L8 begins its build phase. Per the architecture decision to ship the deterministic backbone before any reasoning/brain layer, AGT-804 (the audit-critical agent) and its UsageMeteringLog dependency are the first L8 promotion.
AGT-804 · L8 · Revenue Recognition
Status promotion · Hardened spec
Specced → Built. Production controls + edge cases + restatement workflow added. ASC 606 source data contract now references the UsageMeteringLog production schema.
Change
AGT-804 promoted from Specced (v25) to Built (v26). Spec hardened with two new sections: Production controls & audit (8 controls covering reconciliation, recognition-method immutability, deferred-revenue tie-out, MAX(record_version) discipline, prior-period adjustment flagging, milestone sign-off gating, 7-year retention, restatement workflow) and Edge cases & failure modes (8 scenarios: reconciliation failure, prior-period correction, mid-term amendment, customer dispute, SKU type change, mid-term churn, source replay, ingestion outage). UsageMeteringLog section rewritten to reference the new production schema document instead of duplicating field-level detail.
Why now
The architecture evaluation (May 2026) called L8 the gating dependency for any Tier 2 / brain layer. The brain reading from a stub UsageMeteringLog produces fake-confident output — the worst possible failure mode for audit-sensitive metrics like Magic Number, NRR, and CAC Payback. Building reasoning on top of an unbuilt L8 is incoherent. AGT-804 goes first within L8 because it is the audit-critical one: ASC 606 recognized revenue feeds AGT-702's MetricsCalcLog, which feeds AGT-704's MBR/QBR.
Production controls
Reconciliation against product DB (monthly tolerance ±0.01%); recognition method immutable within a period; deferred revenue ties to balance sheet within ±$1; MAX(record_version) discipline enforced in agent code; prior-period adjustments flagged distinctly so AGT-702 can sum them separately; milestone billing requires customer_sign_off = TRUE (IE-only completion does not unlock recognition); 7-year retention via cold storage tier; restatement workflow defined for auditor-driven closed-period changes.
Restatement workflow
If a closed-period figure must be restated, AGT-804 emits a RevenueRecognitionLog row with restatement_flag = TRUE and reason. AGT-702 surfaces restatements as a separate line in MBR/QBR rather than blending them into current-period activity. Preserves auditability + avoids polluting trend lines.
Failure mode: ingestion outage
If UsageMeteringLog ingestion is offline >24h during a close period, AGT-804 holds the close pending data catch-up. No partial-data recognition. Cascades to AGT-702 staleness and AGT-704 MBR staleness gate per existing AGT-704 spec.
Schema
No schema additions to RevenueRecognitionLog or DeferredRevenueSchedule beyond the v25 spec. Two existing fields gain explicit semantics: prior_period_adjustment (BOOL, set TRUE on rows derived from UsageMeteringLog corrections to closed periods) and restatement_flag (BOOL, set TRUE on auditor-driven closed-period restatements). UsageMeteringLog itself is now governed by a separate production schema document — see related schema entry below.
Impact
First L8 agent in Built status. Magic Number, NRR, and CAC Payback now have a fully specced canonical source. AGT-704 staleness cascade gains a new branch — UsageMeteringLog reconciliation failure now blocks AGT-702 metric recompute, which blocks AGT-704 MBR/QBR. No behavioral changes for AGT-501 / AGT-503 / AGT-802 / AGT-402 — they continue to read UsageMeteringLog as before; the production schema doc formalizes contracts already in use.
Trigger
Architecture evaluation document (May 2026) explicitly directed L8 buildout before adding any reasoning/brain layer. Step 1 of the four-step build sequence in the architecture plan.
UsageMeteringLog L8 Production Schema (new artifact)
New artifact · Schema
Standalone production schema document. Full field definitions, idempotency model, quality checks, reconciliation tolerances, retention tiers, go-live checklist.
Change
New schema artifact: UsageMeteringLog_Production_Schema.html. Standalone document because the table is consumed by four agents (AGT-804 / AGT-802 / AGT-503 / AGT-501) for materially different decisions (recognition vs. billing vs. expansion vs. health), making data quality on this one table the highest-leverage data-quality investment in L8. Embedding the production contract inside any single agent spec would obscure the cross-cutting nature of the dependency.
Field definitions
26 fields with types, nullability, constraints. Natural key is (account_id, sku_id, period_start, period_end, source_system, ingest_event_id, record_version). Surrogate PK is usage_id. New fields beyond the v25 spec: sku_type (consumption/seat/hybrid), period_granularity, unit_price_usd + overage_unit_price_usd (captured for audit reproducibility), seat_utilization_pct (computed), source_system, ingest_event_id, ingest_batch_id, received_at, effective_at, record_version, prior_version_id, correction_reason, audit_status.
Idempotency model
UPSERT on (source_system, ingest_event_id). Source replay is a no-op. Corrections insert new rows with record_version + 1 and prior_version_id pointing to the predecessor; original rows are preserved. Consumers always filter to MAX(record_version) per natural key. UsageMeteringLog_current view exposes this as the default read path.
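A minimal in-memory sketch of the idempotency model above, for illustration only. The `UsageStore` class, its method names, and the dict layout are hypothetical; a production implementation would rely on the database's own UPSERT. Two assumptions are baked in: a correction arrives as its own source event with a fresh ingest_event_id, and the "current" read groups rows by (account_id, sku_id, period_start, period_end, source_system) before taking the highest record_version.

```python
from itertools import count


class UsageStore:
    """Toy stand-in for UsageMeteringLog; not the production ingestion service."""

    def __init__(self):
        self.rows = {}       # usage_id -> row dict
        self.by_event = {}   # (source_system, ingest_event_id) -> usage_id
        self._ids = count(1)

    def _insert(self, row, version, prior_id, reason):
        usage_id = next(self._ids)
        self.rows[usage_id] = {
            **row,
            "usage_id": usage_id,
            "record_version": version,
            "prior_version_id": prior_id,
            "correction_reason": reason,
        }
        self.by_event[(row["source_system"], row["ingest_event_id"])] = usage_id
        return usage_id

    def ingest(self, row):
        """Insert a first-version row; replaying the same source event is a no-op."""
        event_key = (row["source_system"], row["ingest_event_id"])
        if event_key in self.by_event:
            return self.by_event[event_key]
        return self._insert(row, version=1, prior_id=None, reason=None)

    def correct(self, prior_usage_id, row, reason):
        """Append a correction as a new version; the predecessor row is preserved."""
        event_key = (row["source_system"], row["ingest_event_id"])
        if event_key in self.by_event:
            return self.by_event[event_key]
        prior = self.rows[prior_usage_id]
        return self._insert(row, prior["record_version"] + 1, prior_usage_id, reason)

    def current(self):
        """Latest version per grouping key, mimicking UsageMeteringLog_current."""
        latest = {}
        for row in self.rows.values():
            key = (row["account_id"], row["sku_id"], row["period_start"],
                   row["period_end"], row["source_system"])
            if key not in latest or row["record_version"] > latest[key]["record_version"]:
                latest[key] = row
        return list(latest.values())
```

Replaying an event returns the existing usage_id without adding a row, and a correction leaves the original row readable through prior_version_id while `current()` returns only the newest version.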
Quality checks
9 rules at ingestion. HARD failures (reject to DLQ): negative units_consumed, inverted period bounds, future-dated period_end, missing ingest_event_id, FK orphan, out-of-range seat_utilization_pct, non-monotonic record_version. SOFT warnings (accept + emit): late arrival >24h, effective_at drift, anomaly >10x trailing 30d median.
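The HARD/SOFT split above could be expressed as a single validation function, sketched here under stated assumptions. The function names, rule labels, and dict fields are hypothetical. The 0 to 100 range for seat_utilization_pct and the choice to measure lateness as received_at minus period_end are assumptions, and the effective_at drift rule is omitted because the entry gives no threshold for it.

```python
from datetime import datetime, timedelta


def check_usage_row(row, *, now, known_account_ids, prior_version=None,
                    trailing_30d_median=None):
    """Return (hard_failures, soft_warnings) for one candidate row."""
    hard, soft = [], []

    # HARD rules: any hit sends the row to the DLQ.
    if row["units_consumed"] < 0:
        hard.append("negative_units_consumed")
    if row["period_end"] < row["period_start"]:
        hard.append("inverted_period_bounds")
    if row["period_end"] > now:
        hard.append("future_dated_period_end")
    if not row.get("ingest_event_id"):
        hard.append("missing_ingest_event_id")
    if row["account_id"] not in known_account_ids:
        hard.append("fk_orphan")
    pct = row.get("seat_utilization_pct")
    if pct is not None and not 0 <= pct <= 100:  # assumed valid range
        hard.append("seat_utilization_out_of_range")
    if prior_version is not None and row["record_version"] <= prior_version:
        hard.append("non_monotonic_record_version")

    # SOFT rules: the row is accepted and a warning is emitted.
    if row["received_at"] - row["period_end"] > timedelta(hours=24):
        soft.append("late_arrival")
    if trailing_30d_median and row["units_consumed"] > 10 * trailing_30d_median:
        soft.append("anomaly_vs_trailing_median")

    return hard, soft


def route(row, **ctx):
    """Decide the destination: 'dlq' on any HARD failure, else 'accept'."""
    hard, soft = check_usage_row(row, **ctx)
    return ("dlq" if hard else "accept"), hard, soft
```

Keeping HARD and SOFT results in separate lists lets the ingestion service reject on the first list while still logging everything in the second.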
Reconciliation
Daily ±0.1%, monthly ±0.01% (audit-grade), quarterly ±0.001%. Monthly failure holds AGT-804 close. audit_status state machine: pending_recon → verified on pass; transitions to disputed on customer challenge. Verified rows feed all consumers; disputed rows excluded from recognition.
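As a rough sketch of the tolerance check and the audit_status transitions, assuming drift is measured relative to the product DB total and that the initial state is named pending_recon. The function names and event labels are hypothetical, and allowing a customer challenge from any state is an assumption. Decimal is used so that boundary cases are not distorted by float rounding.

```python
from decimal import Decimal

# Relative tolerances from the entry: 0.1%, 0.01%, 0.001%.
TOLERANCE = {
    "daily": Decimal("0.001"),
    "monthly": Decimal("0.0001"),
    "quarterly": Decimal("0.00001"),
}


def reconcile(cadence, metered_total, source_total):
    """True if the metered total is within tolerance of the source total."""
    metered = Decimal(str(metered_total))
    source = Decimal(str(source_total))
    if source == 0:
        return metered == 0
    drift = abs(metered - source) / abs(source)
    return drift <= TOLERANCE[cadence]


def next_audit_status(status, event):
    """Advance audit_status; unknown combinations leave the status unchanged."""
    if event == "customer_challenge":
        return "disputed"
    if status == "pending_recon" and event == "reconciliation_pass":
        return "verified"
    return status


def recognizable(status):
    """Only verified rows feed recognition; disputed rows are excluded."""
    return status == "verified"
```

The same total can pass the daily check and fail the monthly one, which is the intended effect of tightening tolerance at each cadence.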
Retention
Hot 0–13mo (operational), warm 13–36mo (reporting), cold 36mo–7yr (audit-only, restored on demand), purge after 7yr with legal-hold check. ASC 606 + tax retention requirement. Tier transitions move bytes, never modify values; record_version chains preserved across tiers.
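The tier boundaries can be sketched as a small classifier. This is illustrative only: the function names are hypothetical, age is assumed to be measured in whole months from period_end, and the boundaries are treated as half-open (13 months is already warm, 36 months is already cold), which the entry does not specify.

```python
from datetime import date


def age_in_months(period_end, today):
    """Whole months elapsed between period_end and today."""
    months = (today.year - period_end.year) * 12 + (today.month - period_end.month)
    if today.day < period_end.day:
        months -= 1
    return months


def retention_tier(period_end, today, legal_hold=False):
    """Map a row's age to hot / warm / cold / purge."""
    age = age_in_months(period_end, today)
    if age < 13:
        return "hot"
    if age < 36:
        return "warm"
    if age < 84:  # 7 years
        return "cold"
    # Past 7 years: purge only if no legal hold applies.
    return "cold" if legal_hold else "purge"
```

The classifier only decides where a row belongs; per the entry, the nightly job moves bytes between tiers and never rewrites values.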
Write path
Source emits to ingestion queue (Kafka/Kinesis/SQS) → ingestion service deduplicates + validates + UPSERTs → threshold publisher emits to UsageThresholdEvents for AGT-503 real-time subscription → reconciliation jobs run daily/monthly/quarterly → nightly tier-transition job. DLQ for HARD failures with >1h-depth pager.
Go-live checklist
10 gating items including: source system instrumentation, ingestion service deployment, DLQ + monitoring, UsageThresholdEvents end-to-end test, AGT-804 reading real (non-synthetic) data for one full close, daily reconciliation passing 7 consecutive days, monthly reconciliation within tolerance 3 consecutive months, external auditor sign-off (SOC 2 / ASC 606), 7-year retention path tested with sample restore, runbook reviewed by oncall.
Impact
L8's data foundation is now formally specified. Brain layer (planned Tier 2) can be built against a known contract rather than against ambient assumptions. Engineering work to make the schema operational is non-trivial — the document is the production target, not a deployed table. Going forward, every L8 agent build references this schema for usage data; no agent queries the product system directly. No behavioral changes to existing agents — the production schema formalizes contracts already in use; field additions (source_system, record_version, etc.) are append-only against the v25 schema as documented in GTM_OS_Schema_v25.xlsx.
Owner
Joint: Finance + Data Engineering. Single-writer is the ingestion service. AGT-804 / AGT-802 / AGT-503 / AGT-501 are read-only consumers.
v23 → v24 AGT-107 — Quota Plan Document Agent 1 new agent specced — legally deployable plan docs from AGT-101/102 outputs. E-signature gate. QuotaPlanDocLog added.
AGT-107L1 Quota Plan Document Agent
New agent · Schema
Document execution agent. Three output formats. E-signature gate before plan is active. Amendment recommendation logic. AGT-104 integration.
Change
Net new agent. Generates legally deployable quota plan documents from approved AGT-101/102 outputs. Execution-only — never sets quota or modifies plan terms. Three triggers: AGT-102 final approval (annual), new rep hire (onboarding), AGT-104 exception path (mid-year amendment). Three output formats per generation: URL-based HTML for rep view + e-signature, PDF generated at signature time as legal record, structured JSON payload for downstream system consumption.
Document contents
Quota targets (annual + quarterly + monthly where applicable), payout curve, kickers + accelerators, draw terms (section omitted if no draw), territory/named account list, plan governance summary. OTE split excluded (governed by offer letter/HR). SPIFs excluded (separate short-term documents).
E-signature gate
Plan not legally active until e-signature received. AGT-103 attainment computes but AGT-104 holds payout pending signature. Signature overdue (5 business days) → manager + HR notified, flagged in CompAuditLog. Integration: v1 hook (DocuSign/equivalent in live deployment; manual HR fallback in pre-integration).
Amendment logic
Agent recommends addendum (quota-only changes) vs full reissue (structural term changes: new kickers, territory reassignment, draw terms, payout curve change). RevOps makes final selection before generation begins — agent never auto-selects. Override rationale required if RevOps deviates from recommendation. All selections logged in QuotaPlanDocLog.
AGT-104 integration
AGT-104 plan approval gate now reads QuotaPlanDocLog.signature_received. Reps with pending_signature held individually — not the full cohort. Signature completeness audit added to AGT-104's 12 controls.
Schema adds
QuotaPlanDocLog — one row per document generation event. 21 fields: doc_id, rep_id, plan_id, quota_store_id, trigger_type, doc_type, doc_version, doc_status, html_url, structured_payload, pdf_url, generated_at, signature_received, signed_at, signature_overdue_notified, superseded_by_doc_id, amendment_recommendation, amendment_selection, amendment_selection_by, amendment_override_rationale, exception_log_id.
Trigger
Planned at L1 original build (v11–v16). Specced now as the last L1 agent. Earlier sessions did not yet have the full AGT-102/104 design locked, so there was nothing stable to spec against.
Impact
Closes the legal deployment gap in L1 — quota and comp plan design were fully specced but the mechanism for producing a legally binding rep-facing document was absent. AGT-104 signature completeness audit is a new control — existing comp governance now covers the full lifecycle from quota approval through signed plan delivery. AGT-103 payout behavior unchanged — attainment computes regardless of signature status; AGT-104 controls the payout hold, not AGT-103.
v11 → v16 Layer 1 original build 6 agents added — Sales planning & compensation complete
AGT-101L1 Quota Setting Agent
New agent · Schema
Hybrid top-down/bottoms-up quota model with 9 guardrails and 4-gate approval
Change
Net new agent. Hybrid top-down/bottoms-up quota methodology. eRep ramp model (0.25→0.50→0.75→1.0). Nine guardrails. Four-gate approval chain (SLM→RevOps→CRO/FP&A/HR→publish). Segment-differentiated quotas (AE: SMB $400K, MM $800K, Ent $1.6M).
Schema adds
QuotaStore (625 rows), RampSchedules, AccountQuotaTargets, QuotaAmendmentLog
Trigger
Layer 1 original build.
Impact
Foundational. All downstream productivity ratios (AGT-105, AGT-205, AGT-702) derive quota targets from QuotaStore.
AGT-102L1 Comp Plan Design Agent
New agent · Schema
Comp plan generation with kickers, SPIFs, draw terms, and payout curves
Change
Net new agent. Translates AGT-101 quota outputs into legally deployable comp plans. OTE splits, gate/multiplier curves, quadratic usage curve, kickers, SPIFs, recoverable/non-recoverable draw. Soft approval gate.
Schema adds
CompPlans extended (cols 22–31), TIPolicy, TIBudgetModel, Kickers, SPIFs
Trigger
Layer 1 original build.
Impact
Foundational. TIBudgetModel feeds AGT-105 loaded HC cost model. TIPolicy feeds AGT-104 TI% variance control.
AGT-103L1 Comp Attainment & Payout Calculator
New agent · Schema
Daily batch payout engine with QuotaRetirementLedger and DrawLedger
Change
Net new agent. Daily batch payout calculation. QuotaRetirementLedger as transaction layer (one row per deal/usage event). DrawLedger for recoverable draw tracking. AttainmentSnapshot deliberately excluded — intra-period attainment is a display problem.
Schema adds
QuotaRetirementLedger, PayoutHistory, DrawLedger
Trigger
Layer 1 original build.
Impact
Foundational. PayoutHistory feeds AGT-102 TI budget calibration (Option B), AGT-105 TI spend in loaded HC model, AGT-702 GTM Health Monitor S&M spend.
AGT-104L1 Comp Governance & Audit Agent
New agent · Schema
12-control daily audit layer. One autonomous action: territory ROE payout holds.
Change
Net new agent. 12 controls: payout accuracy, draw integrity, kicker stacking, quota amendment audit, TI% variance, ROE violations, plan approval gate, SOX chain, duplicate detection, clawback triggers, usage gate audit, policy exception log. Clawback writes DrawLedger recovery rows.
Schema adds
CompAuditLog, ExceptionLog. DrawLedger extended: new draw_type = clawback_recovery.
Trigger
Layer 1 original build.
Impact
Audit layer. No downstream data dependencies created.
AGT-105L1 Sales Capacity Planning Agent
New agent · Schema
Monthly rolling 13–24 month eRep projection. Three guardrails reconciled to one hiring recommendation.
Change
Net new agent. Three productivity layers (eRep, fully loaded GTM HC, GTM + Marketing HC). Magic Number, R40, and CAC Payback guardrails run independently and reconcile to a single hiring recommendation. FP&APlan as top-down constraint source.
Schema adds
FP&APlan, CapacityPlan, HiringPlan, AttritionLog, MarketingHC
Trigger
Layer 1 original build.
Impact
FP&APlan is foundational for Layers 4 and 7. AGT-402 and AGT-404 reference revenue targets. AGT-702 references metric targets. AGT-205 TAM/SAM references plan for feasibility check.
AGT-106L1 Territory Design Agent
New agent · Schema
Named account lists + rules-based routing. routing_eligible behavioral capacity gate.
Change
Net new agent. Role-differentiated carve: AM/CSM/Enterprise AE on named account lists; SMB/MM AE on rules-based routing (segment × geo × vertical). routing_eligible flag uses behavioral capacity signals (strict AND logic).
Schema adds
TerritoryDefinitions, TerritoryAssignmentLog
Trigger
Layer 1 original build.
Impact
TerritoryDefinitions is the routing authority for AGT-202 Lead Router v2. routing_eligible gate is used by AGT-202 and feeds AGT-206 dormancy flags back to AGT-106.
v17 Layer 2 build 6 agents added — ICP & lead management complete. 10 new tables. Accounts + Leads extended.
AGT-201L2 ICP Scorer
New agent · Update from original
Segment bands redefined. Vertical taxonomy aligned to TerritoryDefinitions 8-vertical standard.
Change
Segment bands updated: SMB 5–200, Mid-Market 201–2,000, Enterprise 2,001+. Dimension 2 vertical taxonomy aligned to TerritoryDefinitions (FinTech, HealthTech, SaaS, RetailTech, HR Tech, EdTech, Cybersecurity, Logistics). Segment removed as a scoring dimension — confirmed as routing input only.
Why
Original bands predated segment redefinition decision. Vertical alignment needed so ICP scoring and territory routing reference the same 8-vertical taxonomy.
Impact
Scoring recalibration expected on existing accounts. Accounts near tier boundaries may shift T1↔T2. AGT-202 routing tier SLAs unaffected (T1=2hr, T2=24hr remain).
AGT-202L2 Lead Router v2.0
Re-specced · Ripple from AGT-106
TerritoryDefinitions replaces RepBook as routing authority. PLG gate added. Vertical preference tier added.
Change
  • Routing authority source changed from RepBook to TerritoryDefinitions
  • routing_eligible gate added — checked before every assignment
  • Motion gate (R2) added — PLG/inbound bypasses SDR entirely, routes to AE direct
  • Vertical preference tier (P2) added to rules-based cascade
  • Workload imbalance threshold specified: >1.5 SD above segment mean (rolling 30 days)
  • Named account rep ineligible → named_account_hold flag + segment manager. Rep reclaims on return.
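The workload imbalance threshold in the list above (>1.5 SD above segment mean over a rolling 30 days) might be computed along these lines. This is a sketch: the function name and input shape are hypothetical, and using the population standard deviation rather than the sample one is an assumption the entry does not settle.

```python
from statistics import mean, pstdev


def workload_flags(assignments_30d, threshold_sd=1.5):
    """Flag reps whose rolling 30-day lead count exceeds mean + threshold_sd * SD.

    assignments_30d maps rep_id -> lead count for ONE segment.
    """
    counts = list(assignments_30d.values())
    if len(counts) < 2:
        return {rep: False for rep in assignments_30d}
    mu, sd = mean(counts), pstdev(counts)
    if sd == 0:
        # Perfectly even workload: nobody is an outlier.
        return {rep: False for rep in assignments_30d}
    cutoff = mu + threshold_sd * sd
    return {rep: n > cutoff for rep, n in assignments_30d.items()}
```

With four reps at 10 leads and one at 30, the mean is 14 and the population SD is 8, so the cutoff is 26 and only the rep at 30 is flagged.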
Why
AGT-106 Territory Design (v14) introduced TerritoryDefinitions as the authoritative account ownership source. AGT-202 v1 was built before this existed and used RepBook instead. SDR motion decision: SDRs are purely outbound — PLG accounts should never hit a cold outreach sequence.
Schema adds
Leads extended: gtm_motion_at_routing, routing_rule_applied, named_account_hold, vertical_match, workload_flag, first_contact_timestamp, tier_change_post_contact
Impact
Routing behavior change. PLG leads previously hitting SDR sequences now route directly to AE. Accounts with named rep ineligible now hold rather than fall back to segment manager immediately.
AGT-203L2 ABM Target Selection Agent
New agent · Schema
T1-only ABM program. 1:1 bespoke and 1:few cluster tiers. Quarterly TAL + event-triggered.
Change
Net new agent. Target account selection only (not outreach execution). T1 accounts only. Enterprise: 1:1 and 1:few eligible. MM: 1:few only. SMB: SLM nomination only. Agent proposes → rep confirms 1:1 → FLM confirms 1:few → SLM approves.
Schema adds
TargetAccountList, ABMPlaybook, ABMEngagementLog. Accounts extended: abm_active, abm_tier
Impact
ABM status creates downstream behavior changes in 3 agents: AGT-202 named account lock (no fallback routing), AGT-301 deeper research pass, AGT-304 nurture suppression.
AGT-204L2 Lead Enrichment Agent
New agent · Schema
Signal-differentiated re-enrichment. 6 data categories. Confidence-weighted overwrite. Triggers ICP rescore.
Change
Net new agent. Pre-score enrichment (30s timeout gate) + signal-differentiated periodic re-enrichment. Intent/news: daily (T1). Contact: monthly. Firmographic/technographic: quarterly. Confidence-weighted overwrite. ICP auto-recalculates on scored field update. 30s timeout: AGT-201 scores with available data, enrichment completes async.
Schema adds
EnrichmentLog, EnrichmentQueue, Contacts (net new cross-cutting table). Accounts extended: 15 enrichment fields including intent_score, funding_amount_usd, crm_vendor, enrichment date fields.
Impact
Unblocks AGT-201 Dimensions 4+5 (25 pts previously data_gap). crm_vendor and technographic fields feed AGT-403 competitor pre-population at deal open (see v17→v18 AGT-403 ripple). Contacts table feeds AGT-305, AGT-401, AGT-405 stakeholder mapping.
AGT-205L2 TAM/SAM Sizing Agent
New agent · Schema
288-cell model. Hybrid top-down TAM / bottom-up SAM. FP&APlan feasibility check.
Change
Net new agent. TAM/SAM/SOM × segment × vertical × geo × product family (288 cells). Hybrid methodology. FP&APlan feasibility check on every run. Whitespace flags to AGT-203. SOM feeds AGT-105 "what would have to be true" scenarios.
Schema adds
MarketAssumptions (RevOps-owned, versioned), MarketSizeModel
Impact
Feasibility check creates a live validation against FP&APlan targets. If FP&APlan Y2 target exceeds SOM, AGT-205 flags the gap — first agent to challenge the financial plan from a market sizing perspective.
AGT-206L2 Account Prioritization Agent
New agent · Schema
Event-triggered within-book ranking. Intent decay formula. Semi-annual full-universe scoring feeds AGT-106.
Change
Net new agent. 3-dimension model: fit (40%), intent (40%), opportunity (20%). Intent decay: score × e^(-hours/72). Event-triggered re-rank (intent spike, stage change, enrichment). Semi-annual full-universe scoring feeds AGT-106 territory rebalancing. Pin-only rep override with 30-day expiry.
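A short worked sketch of the decay formula and the 40/40/20 weighting. The function names are hypothetical, and it is an assumption that all three dimensions share one scale and that decay is applied to intent before weighting.

```python
import math

DECAY_HOURS = 72  # time constant from the entry: score × e^(-hours/72)


def decayed_intent(score, hours_since_signal):
    """Exponentially decay an intent score by signal age."""
    return score * math.exp(-hours_since_signal / DECAY_HOURS)


def composite_score(fit, intent, opportunity, hours_since_signal):
    """Weighted composite: fit 40%, decayed intent 40%, opportunity 20%."""
    return (0.40 * fit
            + 0.40 * decayed_intent(intent, hours_since_signal)
            + 0.20 * opportunity)
```

Note that 72 is the e-folding time, not the half-life: a signal keeps about 36.8% of its value after 72 hours and halves after 72 × ln 2, roughly 49.9 hours.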
Schema adds
AccountPriorityScore, PriorityWeightsConfig
Impact
Priority score feeds AGT-301 queue order (highest priority sequences drafted first). Semi-annual scores feed AGT-106 territory rebalancing. Dormancy flags identify accounts for removal from rep books.
v17 → v18 Layer 3 updates 2 agent updates (L2 ripples), 1 new agent specced, 1 removed from OS
AGT-301L3 Outreach Generator
Ripple from L2 · Prompt delta
ABM accounts trigger deeper research pass. AGT-206 priority score sets queue order.
Change 1
When Accounts.abm_active = TRUE: agent runs a deeper enrichment pull (all 6 enrichment categories + recent news/triggers) before drafting each step. ABM sequence template and step cadence unchanged — personalization depth only.
Change 2
AccountPriorityScore.composite_score determines draft order when multiple sequences are queued. Highest score drafted first. No effect on step timing, cadence structure, or sequence content.
Why
AGT-203 ABM Target Selection (Layer 2) added abm_active and abm_tier to Accounts. AGT-206 Account Prioritization (Layer 2) added composite_score as a rep-facing priority signal that should influence agent work queue.
Schema change
None. No new tables or fields. Prompt-only delta.
Impact
Sequence quality improvement for ABM accounts. Queue ordering change is non-breaking — sequence content and timing unchanged.
AGT-304L3 Marketing Nurture Agent
Ripple from L2 · Prompt delta
ABM suppression gate added. Accounts with abm_active=TRUE excluded from generic nurture.
Change
First check before any nurture enrollment: IF Accounts.abm_active = TRUE → exclude from nurture. Exclusion logged in NurtureRecommendations with reason = 'ABM active suppression'. Account re-enters nurture eligibility when abm_active resets to FALSE on TAL exit.
Why
AGT-203 ABM Target Selection creates bespoke outreach plays for T1 accounts. Generic nurture cadences running in parallel would create channel conflict and undermine the ABM play quality.
Schema change
None. Prompt-only delta. NurtureRecommendations already has a reason field.
Impact
Behavior change. Any account currently in nurture that gets added to TAL will be suppressed from nurture on next enrollment check. Existing nurture sequences in-flight are not retroactively stopped — they complete and re-enrollment is blocked.
AGT-305L3 Meeting Discovery Prep Agent
New agent · Schema
Pre-first-meeting brief. 7 sections. Draft on booking + 4am refresh. AGT-407 scores post-call.
Change
Net new agent. Triggered by meeting_booked from outreach cadence (no deal exists). 7-section brief: account overview, pain hypothesis, discovery questions, stakeholder context, talk track, competitive context, ABM playbook alignment. Draft on booking + 4am rep local time refresh on meeting day. AGT-407 scores adherence + hypothesis accuracy post-call. Option 4 feedback loop activates when ≥30 scored calls per vertical.
Schema adds
DiscoveryBriefLog (all 7 sections + rep edits + nullable post-call score fields)
Impact
Creates DiscoveryBriefLog as a new data source for AGT-407 post-call scoring. Option 4 feedback loop will improve pain hypothesis quality per vertical over time — closes a learning loop that currently doesn't exist in the OS.
AGT-306L3 Rep Daily Briefing Agent
Removed from OS
Removed. Reps build their own personal briefing agents using the Rep Agent Builder Guide.
Change
AGT-306 Rep Daily Briefing Agent removed from GTM OS agent registry. Daily briefing is a personal workflow preference, not a system-level function.
Why
Reps have different briefing preferences. A system-mandated daily brief creates maintenance overhead and doesn't account for role variation (AE briefing vs CSM briefing vs SDR briefing are materially different). Better served by rep-built personal agents.
Enablement artifact
Rep Agent Builder Guide published — documents which GTM OS tables and agents reps can reference when building personal agents. Includes 4 example prompts (AE, CSM/AM, single-account deep dive, intent spike alert).
Impact
No downstream impact. No schema changes. Agent numbering gap at AGT-306 preserved intentionally as a record of the decision.
v21 → v22 Layer 3 open wire items 3 L2→L3 ripples wired — CadenceEventLog extended with 3 fields. AGT-301 + AGT-302 prompt deltas.
AGT-301L3 Outreach Generator
Ripple from L2 · Schema · Prompt delta
sequence_type and priority_score_at_queue written to CadenceEventLog at draft/queue time. Closes two open L2 wires.
Wire 1
AGT-301 now writes CadenceEventLog.sequence_type at sequence draft time. Derived from Accounts.abm_active + abm_tier: abm_bespoke (1:1 ABM), abm_cluster (1:few ABM), nurture (marketing nurture), standard (all other). Classification label only — no behavior change to sequence content, timing, or channel mix.
Wire 2
AGT-301 now writes CadenceEventLog.priority_score_at_queue at queue registration time. Snapshotted from AccountPriorityScore.composite_score at the moment of queue — does not update as score changes during the sequence. NULL if no AccountPriorityScore record exists for the account. Enables AGT-303 to analyze whether high-priority accounts received timely sequence execution relative to their score.
Why
ABM deeper research pass and priority queue order were wired in v18 (behavioral), but neither was traceable in CadenceEventLog. Without these fields, AGT-303 cannot distinguish ABM sequences from standard sequences or correlate priority score to sequence timing in analysis. Schema fields close the traceability gap.
Schema adds
CadenceEventLog extended: sequence_type (enum), priority_score_at_queue (float, nullable)
Impact
AGT-303 can now segment cadence performance by sequence_type — ABM vs standard sequence efficacy becomes measurable for the first time. Priority score correlation to sequence timing enables AGT-303 to flag when high-priority accounts are queued but executed late relative to their score.
AGT-302L3 Cadence Coordinator
Ripple from L2 · Schema · Prompt delta
ABM backstop gate added as Gate 2 in coordination sequence. suppressed_reason field written on all gate blocks. Closes third open L2 wire.
Wire 3
AGT-302 adds an explicit ABM nurture backstop as Gate 2 in its coordination sequence (after compliance check, before touch cap). If sequence_type = 'nurture' AND Accounts.abm_active = TRUE: block activation, write suppressed_reason = 'abm_active_suppression' to CadenceEventLog, notify originating agent. AGT-302 also writes suppressed_reason for all other gate blocks: touch_cap_exceeded, compliance_block, conflict_resolved_queued.
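The gate ordering could look roughly like the function below. It is a sketch, not the agent's actual logic: the function name and parameters are hypothetical, the entry only fixes the order compliance check → ABM backstop → touch cap, and placing conflict resolution last as well as treating the cap as "at or above" are assumptions.

```python
def coordinate(sequence, account, *, compliant, touches_in_window, touch_cap,
               has_conflict=False):
    """Return (activate, suppressed_reason) for one sequence activation request."""
    # Gate 1: compliance check.
    if not compliant:
        return False, "compliance_block"
    # Gate 2: ABM nurture backstop, independent of how the sequence originated.
    if sequence["sequence_type"] == "nurture" and account["abm_active"]:
        return False, "abm_active_suppression"
    # Gate 3: touch cap.
    if touches_in_window >= touch_cap:
        return False, "touch_cap_exceeded"
    # Conflict resolution (position in the chain is assumed).
    if has_conflict:
        return False, "conflict_resolved_queued"
    return True, None
```

The returned reason is what would be written to CadenceEventLog.suppressed_reason; a None reason corresponds to the nullable field staying empty on activation.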
Why a backstop
AGT-304 is the primary ABM suppression gate for nurture enrollment. But AGT-302 previously had no visibility into whether future routing paths might bypass AGT-304 (e.g. a marketing automation trigger). The backstop ensures the OS-level constraint holds regardless of origination path. The suppressed_reason field provides AGT-303 with a unified suppression audit trail across all gate types and sequence origins.
Schema adds
CadenceEventLog extended: suppressed_reason (string, nullable). Written exclusively by AGT-302. AGT-304 continues to log suppression in NurtureRecommendations — AGT-302 writes to CadenceEventLog so suppression is visible in the unified cadence audit trail.
Impact
No behavior change for accounts not flagged abm_active — existing conflict resolution and touch cap logic unchanged. AGT-303 gains a suppression audit trail — can now report on how often ABM suppression fires, touch cap holds, and conflict resolution blocks occur across the cadence universe.
v22 → v23 Layer 4 ripple audit 5 agents updated — FP&APlan versioning, expansion ACV, MAP late-stage read, new KB feeds, call_owner_role. 1 schema add.
AGT-402L4 Forecast Adjuster
Ripple from v21 + L5 · Prompt delta
FP&APlan version join specced. Expansion ACV from AGT-503 ExpansionLog added as third forecast component.
Change 1
FP&APlan versioning (v21) introduced plan_version, reforecast_as_of_date, and is_board_plan. AGT-402's pct_of_target join is now explicitly specced: reads FP&APlan WHERE is_board_plan = FALSE ORDER BY reforecast_as_of_date DESC LIMIT 1 — the current operating plan, not the board plan. Board plan displayed as secondary reference only.
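The plan selection rule above translates directly into a filter plus a max. The sketch below mirrors the WHERE / ORDER BY / LIMIT 1 clause on in-memory rows; the function names and the row shape are hypothetical, and revenue_target_usd is assumed to be the target column used for the denominator.

```python
from datetime import date


def current_operating_plan(plans):
    """Most recently reforecast non-board plan, or None if there is none."""
    candidates = [p for p in plans if not p["is_board_plan"]]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p["reforecast_as_of_date"])


def pct_of_target(forecast_usd, plan):
    """Forecast as a fraction of the selected plan's revenue target."""
    return forecast_usd / plan["revenue_target_usd"]


plans = [
    {"plan_version": "0+12", "is_board_plan": True,
     "reforecast_as_of_date": date(2026, 1, 1), "revenue_target_usd": 50_000_000},
    {"plan_version": "3+9", "is_board_plan": False,
     "reforecast_as_of_date": date(2026, 4, 1), "revenue_target_usd": 46_000_000},
    {"plan_version": "1+11", "is_board_plan": False,
     "reforecast_as_of_date": date(2026, 2, 1), "revenue_target_usd": 49_000_000},
]
operating = current_operating_plan(plans)
```

With these illustrative rows the "3+9" plan is selected, so a 23M forecast reads as 50% of target rather than 46% against the board plan.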
Change 2
Expansion ACV from ExpansionLog.acv_estimate (AGT-503) added as a third forecast component alongside new logo and renewal pipeline. Weighted by churn risk tier: Low = 80%, Medium = 50%, High/Critical = 0%. Forecast output adds an expansion revenue line to the breakdown. pct_of_target denominator uses the full FP&APlan revenue target including planned expansion.
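A worked sketch of the risk-tier weighting, assuming each ExpansionLog row carries a churn risk tier under a hypothetical `churn_risk_tier` key. The function names and the output dict layout are illustrative only.

```python
# Weights from the entry: Low = 80%, Medium = 50%, High/Critical = 0%.
EXPANSION_WEIGHT = {"Low": 0.80, "Medium": 0.50, "High": 0.0, "Critical": 0.0}


def weighted_expansion_acv(expansion_rows):
    """Sum acv_estimate across rows, discounted by churn risk tier."""
    return sum(r["acv_estimate"] * EXPANSION_WEIGHT[r["churn_risk_tier"]]
               for r in expansion_rows)


def forecast_breakdown(new_logo_usd, renewal_usd, expansion_rows, revenue_target_usd):
    """Three-component forecast with pct_of_target over the full plan target."""
    expansion_usd = weighted_expansion_acv(expansion_rows)
    total = new_logo_usd + renewal_usd + expansion_usd
    return {
        "new_logo_usd": new_logo_usd,
        "renewal_usd": renewal_usd,
        "expansion_usd": expansion_usd,
        "total_usd": total,
        "pct_of_target": total / revenue_target_usd,
    }
```

For example, expansion estimates of 100,000 (Low), 50,000 (Medium), and 200,000 (High) contribute 80,000 + 25,000 + 0 = 105,000 to the forecast.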
Schema change
None. Both FP&APlan versioning fields and ExpansionLog already exist. Prompt-only delta.
Impact
Expansion revenue is now visible in the bottoms-up forecast — NRR upside is quantified in the forward revenue view for the first time. FP&APlan version join prevents the operating plan from being compared against the original board plan mid-year when reforecast has occurred.
AGT-403L4 Competitive Intelligence Agent
Ripple from L6 + L7 · Prompt delta
VoCSynthesisLog (AGT-604) and CalibrationSignalLog (AGT-703) added as KB feed sources. KB now has 4 signal sources.
Change
Two new CompetitiveKnowledgeBase feed sources: (1) VoCSynthesisLog (AGT-604) — monthly synthesis surfaces competitor themes from CSM calls and QBR outcomes. Post-sale competitive signal tagged signal_source = 'voc'. Read on each monthly AGT-604 synthesis run. (2) CalibrationSignalLog (AGT-703) — quarterly win-loss competitive dimension provides outcome-verified win/loss rates by competitor × segment × vertical. Tagged signal_source = 'win_loss'. win_loss entries take precedence over call_mention entries when signals conflict on a competitor's strength/weakness.
Why
Original KB only had pre-deal and mid-deal signals (technographic enrichment + live call mentions). Post-sale and outcome-verified signals are higher quality for displacement angle generation — a competitor that wins deals against us is more important to track than one that's merely mentioned.
Schema change
None. VoCSynthesisLog and CalibrationSignalLog already exist. Prompt-only delta adding two read sources to KB maintenance logic.
Impact
CompetitiveKnowledgeBase becomes a full-lifecycle competitive signal repository — pre-deal (technographic), mid-deal (calls), post-sale (VoC), and outcome-verified (win-loss). Displacement angles generated by AGT-403 for AGT-301/305/405 are now informed by actual win/loss outcomes, not just mentions.
AGT-404L4 Top-Down Forecast Agent
Ripple from v21 · Prompt delta
FP&APlan version join specced. plan_version label added to all comparison output lines. Board plan as secondary reference.
Change
Same FP&APlan versioning fix as AGT-402. AGT-404's feasibility comparison now joins on the current operating plan version (non-board, most recent reforecast). Board plan displayed as a secondary reference line. Every comparison output line now carries a plan_version label (e.g. "3+9") so the reader knows which plan iteration the % refers to — critical when operating plan has been revised mid-year.
Schema change
None. Prompt-only delta.
Impact
Additive. Output gains plan_version label and a secondary board plan reference line. Prevents misleading pct_of_plan figures when operating plan has been reforecast downward mid-year.
AGT-405L4 Meeting Opportunity Prep Agent
Ripple from L6 · Prompt delta
MutualActionPlan read added to Section 6 (deal terms) for Proposal/Negotiation/Close stage briefs.
Change
MutualActionPlan (MAP) was introduced in v20 with AGT-601. AGT-405's deal terms section (Proposal/Negotiation/Close only) now reads MutualActionPlan where opp_id matches. Section 6 gains: MAP status (draft/customer_review/signed/not_applicable), open MAP action items and owners, outstanding rep commitments from MAP. If MAP not started for an eligible deal: brief flags "MAP not started — required before close." Nullable — omitted without error if no MAP record exists.
Why
MAP is created pre-close for Enterprise/high-ACV deals. A late-stage meeting brief that doesn't reference MAP commitments leaves the rep without visibility into what the selling team has formally committed to. Open MAP items are often the decisive close blockers.
Schema change
None. MutualActionPlan already exists. Prompt-only delta on Section 6 of AGT-405 brief generation logic.
Impact
Late-stage meeting briefs now surface MAP status and open commitments — rep enters close meetings with full visibility into what's been promised and what's still open. No impact on early-stage briefs — MAP is only read when deal is at Proposal or later.
AGT-407L4 Conversation Intelligence Agent
Ripple from L5 · Schema · Prompt delta
call_owner_role field added to ConvIntelligence. Enables AGT-501 to filter CSM calls for customer health ConvIntelligence adjustment.
Change
AGT-501 Customer Health Monitor uses a ConvIntelligence adjustment from CSM-owned calls specifically — not all sales conversations. Without a call_owner_role field, AGT-501 cannot distinguish CSM calls from AE/SDR/AM/SE calls. AGT-407 now writes call_owner_role (enum: AE / SDR / AM / CSM / SE) on every ConvIntelligence record, derived from rep role in RepActivity or TerritoryDefinitions. Existing rows carry NULL until agent is re-run.
Why
AGT-501's ConvIntelligence adjustment was specced in v19 but the filtering mechanism was never created. CSM call sentiment feeding into a customer health score is materially different from AE call sentiment on a deal — conflating them would corrupt the health signal. call_owner_role is the filter key that makes this distinction possible.
Schema adds
ConvIntelligence extended: call_owner_role (enum: AE / SDR / AM / CSM / SE, nullable on existing rows)
Downstream consumers
  • AGT-501: filters call_owner_role = 'CSM' for customer health adjustment
  • AGT-701: filters by role for role-parameterized coaching inputs
  • AGT-703: can segment conversation patterns by role in win-loss analysis
Impact
Unblocks AGT-501 ConvIntelligence adjustment — CSM call signals can now be correctly filtered and applied to customer health scoring. Without this field, AGT-501 had to skip the ConvIntelligence dimension or apply it incorrectly using all rep types.
v17 → v18 Layer 4 updates 4 agent ripple updates + 3 new agents specced
AGT-401L4 Deal Health Monitor
Ripple from L2 · Prompt delta
Contacts table improves multi-threading signal. ABM deals raise Green threshold to 80.
Change 1
stakeholder_engagement_breadth dimension now reads from Contacts table (which persona types are engaged: economic buyer, champion, technical evaluator) rather than raw RepActivity contact count. More precise signal — persona coverage matters more than raw contact count.
Change 2
When Accounts.abm_active = TRUE: the minimum score for Green rises from 75 to 80. Economic buyer engagement becomes a required dimension: the deal scores 0 on this dimension if the economic buyer is not engaged, regardless of other scores.
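The ABM rule can be sketched as two small adjustments to an otherwise unchanged scoring model. The function names and the `economic_buyer` persona label are hypothetical, and how the dimension score rolls up into the total is left out because the entry does not describe it.

```python
def green_threshold(abm_active):
    """Minimum total score for Green: 80 for ABM deals, 75 otherwise."""
    return 80 if abm_active else 75


def economic_buyer_dimension(raw_score, engaged_personas, abm_active):
    """Zero the dimension for ABM deals with no engaged economic buyer."""
    if abm_active and "economic_buyer" not in engaged_personas:
        return 0
    return raw_score


def is_green(total_score, abm_active):
    """Green when the total meets or exceeds the applicable minimum."""
    return total_score >= green_threshold(abm_active)
```

A deal scoring 78 is Green as a non-ABM deal but not as an ABM deal, which matches the stricter standard described above.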
Why
AGT-204 Lead Enrichment created the Contacts table (v17) with persona-typed records. AGT-203 ABM status signals a higher-scrutiny deal that warrants stricter health standards.
Schema change
None. Reads from existing Contacts table. Prompt-only delta.
Impact
ABM deals will score harder to reach Green. Expected and intentional — ABM accounts warrant higher scrutiny. Non-ABM deals unaffected.
AGT-402L4 Forecast Adjuster
Ripple from L1 · Prompt delta
FP&APlan revenue targets added. Forecast output gains pct_of_target field by segment.
Change
Forecast output adds pct_of_target field: AI forecast ÷ FP&APlan.revenue_target_usd by segment. Shows where each segment's pipeline trajectory sits relative to annual plan — not just absolute numbers.
Why
FP&APlan table (introduced in v16 with AGT-105) contains annual revenue targets by segment. The bottoms-up forecast gains meaning when expressed as % of target rather than just a dollar figure in isolation.
Schema change
None. FP&APlan already exists. Prompt-only delta adding one derived field to forecast output.
Impact
Additive. Existing forecast fields unchanged. New field provides coverage ratio context for CRO and Finance consumers.
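The derived field can be sketched as a per-segment division. The function name, the dictionary shapes, and the choice to return None when a segment has no target are assumptions for illustration; only the ratio itself (AI forecast ÷ FP&APlan.revenue_target_usd) comes from the entry above.

```python
from typing import Dict, Optional


def pct_of_target(forecast_usd: Dict[str, float],
                  revenue_target_usd: Dict[str, float]) -> Dict[str, Optional[float]]:
    """Return forecast / target per segment as a fraction (0.75 means 75%).

    Assumption: segments with a missing or zero target yield None instead
    of raising, so the forecast output stays additive.
    """
    result: Dict[str, Optional[float]] = {}
    for segment, forecast in forecast_usd.items():
        target = revenue_target_usd.get(segment)
        result[segment] = forecast / target if target else None
    return result
```

For example, a segment forecast of 4,500,000 against a 6,000,000 target (made-up numbers) gives 0.75.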
AGT-403 · L4 · Competitive Intelligence Agent
Ripple from L2 · Prompt delta
Technographic enrichment pre-populates competitor_detected at deal open — no longer waits for rep flag.
Change
When an opportunity opens, AGT-403 checks Accounts.crm_vendor and technographic enrichment fields against CompetitiveKnowledgeBase. If a competitor is detected: Opportunities.competitor_detected = TRUE pre-set at deal creation. Previously this only fired when a rep manually flagged a competitor mid-deal.
Why
AGT-204 Lead Enrichment (v17) added crm_vendor and technographic stack fields to Accounts. This data makes the competitive landscape known before the deal starts, so reps no longer have to uncover it themselves during a discovery call.
Schema change
None. Reads from existing Accounts.crm_vendor. Prompt-only delta.
Impact
Competitive context available from deal open, not mid-deal. Reps enter discovery knowing the competitive landscape. CompetitiveKnowledgeBase can be pre-loaded with displacement angles before the first call.
AGT-404 · L4 · Top-Down Forecast Agent
Ripple from L1 · Prompt delta
FP&APlan feasibility comparison added to top-down output — pipeline trajectory vs annual plan target.
Change
Top-down forecast output adds an FP&APlan comparison line: "Current pipeline trajectory vs annual plan target" by segment. Shows whether the top-down model is on track to hit plan, and by how much it is over or under.
Why
FP&APlan table (v16) contains annual and 3-year revenue targets. The top-down forecast produces a revenue projection — expressing it as % of plan target closes the loop between the financial model and the sales forecast.
Schema change
None. FP&APlan already exists. Prompt-only delta.
Impact
Additive. Existing forecast fields unchanged. Provides board-ready on-track/off-track narrative alongside the forecast number.
AGT-405 · L4 · Meeting Opportunity Prep Agent
New agent · Schema
Deal-active meeting briefs. 4 triggers. Stage-differentiated for Proposal/Negotiation/Close. AGT-407 scored.
Change
Net new agent. 4 triggers: stage advance, meeting scheduled, on-demand, deal health drop. 7 sections including deal terms (Proposal/Negotiation/Close only). Standard format for early stages, differentiated for late stages. Draft on trigger + 4am refresh on meeting day. Rep + internal attendees + manager (Amber/Red) receive brief. AGT-407 scores objective completion and deal advance quality.
Schema adds
OpportunityBriefLog (deal-active counterpart to DiscoveryBriefLog)
Key distinction from AGT-305
AGT-305 scores hypothesis accuracy (did pain prediction match reality?). AGT-405 scores deal advance quality (did deal move stage, did rep execute objectives?). Different feedback loops on the same scoring infrastructure.
Impact
Health drop trigger creates automatic recovery prep — manager notified immediately. AGT-407 deal advance scoring creates a new coaching signal for AGT-701: which reps execute stage objectives vs which advance deals without meeting stated objectives.
AGT-407 · L4 · Conversation Intelligence Agent
New agent · Schema
Analysis layer on top of recording platform transcripts. 5 dimensions. Feeds 5 downstream agents.
Change
Net new agent. Analysis layer — transcription handled by Gong/Zoom/Chorus via webhook (hook in v1). 5 dimensions: sentiment, next step extraction, competitor detection, objection pattern extraction, email analysis. All conversations in scope regardless of deal stage. Per-conversation processing on transcript receipt.
Schema adds
ConvIntelligence extended with AGT-407 analysis fields: sentiment scores, next_steps (JSON), competitors_mentioned, objections_raised, conv_intelligence_score (0–3 for AGT-401), post_call_summary.
Downstream unblocked
  • AGT-401: conv_intelligence dimension was data_gap (0pts). Now scoreable — full 9-dimension 100pt deal health score achievable.
  • AGT-305: post-call hypothesis accuracy scoring now active.
  • AGT-405: post-meeting objective completion and deal advance quality scoring now active.
  • AGT-403: competitor mentions from calls feed CompetitiveKnowledgeBase.
  • AGT-701: objection handling quality becomes a coaching input.
Impact
Highest-impact new agent in Layer 4. Unblocks the full deal health score and activates AGT-305's feedback loop. Every downstream agent that reads ConvIntelligence gains richer signal.
Architecture · Decisions that affect the full OS · Layer reorder, agent naming convention, Revenue Ops boundary
Layer order · L5/L6/L7 resequenced
Architecture decision
Original L5 (measurement) moved to end. Post-sale retention now L5, lifecycle L6, measurement L7.
Change
Layer reorder: L0→L1→L2→L3→L4→L5 (post-sale retention, was L6)→L6 (customer lifecycle, was L7)→L7 (measurement & feedback, was L5). Agent numbering updated accordingly: AGT-501–504, AGT-601–603, AGT-701–703.
Why
Measurement is a feedback layer that reads from all other layers and closes the loop back to L1. Placing it before post-sale layers was architecturally awkward — it implied measurement happens before customer management. Post-sale layers are sequential; measurement is horizontal.
Impact
Agent renumbering only. No behavior changes. AGT-011/012 (original) → AGT-701/702 (new). AGT-013/014/015 → AGT-501/502/503.
Agent registry · Canonical numbering convention
Architecture decision
All agents get unique number + human-readable name. Layer-prefix numbering (AGT-101=L1 agent 01).
Change
Canonical agent registry established. Every agent has: unique number (AGT-XXX), human-readable name, purpose, layer, status. Layer-prefix convention: 1xx=L1, 2xx=L2, 3xx=L3, 4xx=L4, 5xx=L5, 6xx=L6, 7xx=L7, 8xx=L8. Text-only IDs (AGT-ABM, AGT-ENR, etc.) replaced with numbered equivalents.
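The layer-prefix convention makes IDs self-describing, which a short parser can illustrate. This is a sketch only: the function name and the strict three-digit pattern are assumptions, and it deliberately rejects the retired text-only IDs.

```python
import re
from typing import Tuple

# AGT-XXX where the first digit is the layer (1 to 8) and the last two
# digits are the agent's sequence number within that layer.
_AGENT_ID = re.compile(r"^AGT-([1-8])(\d{2})$")


def parse_agent_id(agent_id: str) -> Tuple[int, int]:
    """Return (layer, sequence) for a canonical agent ID such as 'AGT-503'."""
    match = _AGENT_ID.match(agent_id)
    if match is None:
        raise ValueError(f"not a canonical agent ID: {agent_id!r}")
    return int(match.group(1)), int(match.group(2))
```

Under this sketch, `parse_agent_id("AGT-101")` yields layer 1, agent 01, while a legacy ID like `AGT-ABM` raises `ValueError`.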
Why
Mixed naming convention (some numbered, some text) made cross-references ambiguous and prevented self-documenting dependency tracking. Numbered IDs with human names provide both machine-readable precision and human-readable context.
Impact
Reference convention change only. All specs updated to use new IDs going forward.
L8 boundary · Revenue Operations layer defined
Architecture decision
L8 Revenue Ops is same framework as GTM OS but Finance-owned. CPQ stays in L4. Invoicing/billing in L8.
Change
L8 Revenue Operations defined as a distinct module within the OS framework (same agent/schema architecture) but Finance-owned. GTM OS stops at signed quote. L8 covers order → invoice → payment → revenue recognition. CPQ (AGT-406) lives in L4 — it's used during active deals. L8 starts at AGT-801 Order Management, triggered by signed quote output from AGT-406.
Why
Invoicing and billing interact with billing infrastructure (Stripe, Zuora, NetSuite) rather than CRM/GTM tooling. Finance owns billing; GTM owns L1–L7. Clear ownership boundary reduces governance ambiguity while maintaining schema coherence for integration points (usage data→AGT-503, revenue recognition→AGT-702).
Integration points
  • AGT-406 CPQ output (signed quote) → AGT-801 Order Management input
  • AGT-804 usage data → AGT-503 Expansion Trigger (read-only feed)
  • AGT-804 revenue recognition → AGT-702 GTM Health Monitor (Magic Number calculation)
  • AGT-803 payment status → AGT-501 Customer Health (billing health signal)
Impact
Architectural boundary defined. L8 agents not yet specced. Integration contracts with L4 and L5 are pre-defined.
v17 → v18 Layer 4 new agents · 3 new agents specced: Meeting Opportunity Prep, CPQ & Deal Desk, Conversation Intelligence
AGT-405 · L4 · Meeting Opportunity Prep Agent
New agent · Schema
Deal-active meeting briefs. 4 triggers. Stage-differentiated for Proposal/Negotiation/Close. AGT-407 scored.
Change
Net new agent. Deal-active counterpart to AGT-305. 4 triggers: stage advance, meeting scheduled on opportunity, rep on-demand, deal health drop. 7 sections with stage-differentiated format for Proposal/Negotiation/Close (deal terms section added). Draft on trigger + 4am refresh on meeting day. Recipients: rep + internal calendar attendees + manager when Amber/Red. AGT-407 scores objective completion and deal advance quality post-meeting.
Schema adds
OpportunityBriefLog — deal-active counterpart to DiscoveryBriefLog. Includes nullable post-meeting scoring fields: objective_completion_rate, deal_advanced, deal_advance_quality_score, commitment_follow_through_score.
Key distinction from AGT-305
AGT-305 scores hypothesis accuracy (did pain prediction match reality?). AGT-405 scores deal advance quality (did deal move stage, did rep execute stated objectives?). Different feedback loops on the same scoring infrastructure.
Trigger
Layer 4 spec session — net new agent.
Impact
Health drop trigger creates automatic recovery prep — manager notified immediately regardless of normal band threshold. AGT-407 deal advance scoring creates a new coaching signal for AGT-701: which reps execute stage objectives vs which advance deals without meeting stated objectives.
AGT-406 · L4 · CPQ & Deal Desk Agent
New agent · Schema
URL-based HTML quote. 4 pricing models. 4-tier approval chain. Per-viewer interaction tracking. L8 handoff on acceptance.
Change
Net new agent. Supports 4 pricing models: seat license, consumption commitment, pay-as-you-go, professional services. Seat + consumption can coexist as separate line items, not bundled. Agent proposes configuration, rep explicitly confirms before quote is generated. Quote is URL-based HTML — not a PDF. URL in pending state until all approvals clear. 4-tier approval chain (rep → manager → deal desk → CRO/VP) auto-routed based on deal configuration vs PricingConfig thresholds. Per-viewer trackable links for email delivery. Forwarding signal detection (new external IP = possible procurement loop-in).
Quote vs Order Form
Quote is a pricing proposal, not a legal document. Acceptance triggers AGT-801 (L8) which generates the formal order form from quote data. GTM OS boundary is at accepted quote. Billing contact on the OF (built by L8) is the target for AGT-504 Customer Communications on future price change events.
Schema adds
PricingConfig (L0 — Finance + RevOps owned, rate cards + discount thresholds), QuoteLog (state machine + line items), QuoteApprovalLog (per-tier approval audit trail), QuoteViewLog (per-viewer interaction tracking)
Trigger
Layer 4 spec session — net new agent.
Impact
QuoteViewLog forwarding signal feeds AGT-401 deal health — buyer engagement and procurement loop-in are positive deal signals. PricingConfig is a new L0 policy table read by AGT-405 (deal terms section) and AGT-504 (price change comms) in addition to AGT-406.
AGT-407 · L4 · Conversation Intelligence Agent
New agent · Schema
Analysis layer on recording platform transcripts. 5 dimensions. Unblocks AGT-401 conv_intelligence score. Feeds 5 downstream agents.
Change
Net new agent. Analysis layer only — transcription handled upstream by recording platform (Gong/Zoom/Chorus) via webhook (hook in v1). 5 analysis dimensions: sentiment scoring, next step extraction with commitment quality, competitor mention detection, objection pattern categorization, email thread analysis. Processes all sales conversations regardless of deal stage. Per-conversation processing on transcript receipt.
Schema adds
ConvIntelligence extended with AGT-407 analysis fields: overall_sentiment, sentiment_score, next_steps (JSON), next_step_committed, competitors_mentioned, objections_raised, unaddressed_showstopper, conv_intelligence_score (0–3, feeds AGT-401 deal health dimension), post_call_summary. Nullable fields — existing 100 rows carry NULL until agent is live.
Downstream unblocked
  • AGT-401: conv_intelligence dimension was data_gap (0 pts). AGT-407 makes full 9-dimension 100pt deal health score achievable for the first time.
  • AGT-305: post-call hypothesis accuracy scoring now active once brief_id is linked to conversation.
  • AGT-405: post-meeting objective completion and deal advance quality scoring now active.
  • AGT-403: competitor mentions from calls feed CompetitiveKnowledgeBase in real-time.
  • AGT-701: objection handling quality and next step commitment rate become coaching inputs.
Trigger
Layer 4 spec session — net new agent. Identified as dependency for AGT-401, AGT-305, and AGT-405.
Impact
Highest-impact new agent in Layer 4. Unblocks the full deal health score. Activates the AGT-305 hypothesis accuracy feedback loop. Every downstream agent reading ConvIntelligence gains richer signal as call volume accumulates.
v18 → v19 Layer 5: Post-Sale Retention · 4 agents specced: Customer health, churn risk, expansion trigger, customer communications. 6 new tables. Role convention corrected: AM as relationship owner.
AGT-501 · L5 · Customer Health Monitor
New agent · Schema · Ripple from L2 + L8
Daily health scoring engine. 7 dimensions + ConvIntelligence adj + payment health modifier. Canonical source for AGT-502 and AGT-503.
Change
Net new agent. Daily batch scoring for all active customer accounts. 7-dimension model (100 pts base): product usage trend (22), exec sponsor engagement (20), seat utilization (18), NPS/CSAT (14), engagement recency (12), support ticket trend (8), competitive threat (6). ConvIntelligence adjustment from CSM-owned calls adds up to 12 pts (sentiment, talk ratio, next-step quality — trailing 30-day pattern). Payment health modifier (cap/floor) applied last. Writes CustomerHealthLog as canonical daily record — AGT-502 and AGT-503 read exclusively from this table.
Payment modifier
Current: no effect. Overdue: score capped at 77 (cannot reach Low tier). Failed: score capped at 62 (cannot reach Medium). Suspended: score forced into the Critical tier regardless of all other dimensions. Sourced from AGT-803 (L8) read-only feed. Non-blocking: if AGT-803 is not live, the modifier is skipped.
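The dimension weights and the payment modifier can be sketched as follows. The dictionary keys and status strings are hypothetical names. The Suspended cap of 44 is an assumption: the spec only says the score lands in Critical, and 44 is the top of the Critical band used by AGT-502.

```python
from typing import Optional

# Maximum points per dimension, as listed in the 7-dimension model.
DIMENSION_MAX = {
    "product_usage_trend": 22,
    "exec_sponsor_engagement": 20,
    "seat_utilization": 18,
    "nps_csat": 14,
    "engagement_recency": 12,
    "support_ticket_trend": 8,
    "competitive_threat": 6,
}
assert sum(DIMENSION_MAX.values()) == 100  # base model totals 100 pts

# Score ceilings by payment status. 'suspended' -> 44 is an assumed value
# chosen only so the result always falls in the Critical tier (0 to 44).
PAYMENT_CAPS = {"overdue": 77, "failed": 62, "suspended": 44}


def apply_payment_modifier(raw_score: float,
                           payment_status: Optional[str]) -> float:
    """Apply the payment health modifier as the last scoring step.

    payment_status is None when the AGT-803 feed is not live; the
    modifier is then skipped (non-blocking dependency).
    """
    if payment_status is None or payment_status == "current":
        return raw_score
    return min(raw_score, PAYMENT_CAPS[payment_status])
```

Because the modifier only ever lowers a score, an account already below its cap is unaffected.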
L2 ripple
Exec sponsor dimension reads from Contacts table persona-typed records (economic_buyer, champion) rather than raw RepActivity contact count. Same pattern as AGT-401 v18 update.
L8 integration
AGT-803 payment_health_status → CustomerHealthLog.payment_health_status. Read-only feed. Non-blocking dependency.
Schema adds
CustomerHealthLog — one row per account per day. Fields: raw_health_score, health_score (post-modifier), score_delta, payment_health_status, payment_modifier_applied, per-dimension scores (7 fields), conv_intelligence_adj, active_flags (JSON), data_gap_dimensions (JSON), sponsor_departure_flag.
Impact
Foundational for L5. CustomerHealthLog is the single source of truth consumed by AGT-502 (churn risk) and AGT-503 (expansion). Sponsor departure flag passes to AGT-502 immediately on same run — does not wait for next AGT-502 cycle.
AGT-502 · L5 · Churn Risk Detector
New agent · Schema · Role correction
Renewal proximity multiplier on AGT-501 scores. Four risk tiers. AM as primary alert recipient (corrects original spec). Tiered escalation + downstream triggers.
Change
Net new agent. Runs daily after AGT-501 batch. Reads CustomerHealthLog health_score and applies renewal proximity multiplier: ≤30 days ×0.85, 31–60 ×0.92, 61–90 ×0.96, >90 ×1.0. Effective risk score drives tier assignment: Critical (0–44), High (45–62), Medium (63–77), Low (78–100). Immediate alerts on: score drop ≥15 pts, tier transition, sponsor_departure_flag (fires same run as AGT-501 detection). Weekly digest for all other accounts — one per AM, one per CSM, segment-level to SLM, org-wide NRR risk to CRO (feeds AGT-702).
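The multiplier and tier bands above can be sketched in a few lines. Function names are hypothetical, and rounding the effective score to a whole number before tier assignment is an assumption; the spec gives integer bands but does not say how fractional scores are handled.

```python
from typing import Tuple


def renewal_multiplier(days_to_renewal: int) -> float:
    """Renewal proximity multiplier applied to the AGT-501 health score."""
    if days_to_renewal <= 30:
        return 0.85
    if days_to_renewal <= 60:
        return 0.92
    if days_to_renewal <= 90:
        return 0.96
    return 1.0


def churn_risk_tier(effective_score: int) -> str:
    """Map an effective risk score (0 to 100) to its tier."""
    if effective_score <= 44:
        return "Critical"
    if effective_score <= 62:
        return "High"
    if effective_score <= 77:
        return "Medium"
    return "Low"


def assess(health_score: float, days_to_renewal: int) -> Tuple[int, str]:
    """Return (effective_risk_score, churn_risk_tier).

    Assumption: the effective score is rounded with Python's round().
    """
    effective = round(health_score * renewal_multiplier(days_to_renewal))
    return effective, churn_risk_tier(effective)
```

A health score of 80 stays Low when renewal is more than 90 days out, but drops to 68 (Medium) inside the 30-day window.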
Role correction
AM is relationship and revenue owner — primary alert recipient on all churn risk events. CSM is adoption influencer — notified concurrently as context, not action owner. Finance notified first on payment-status alerts (billing event before relationship event), then AM + CSM simultaneously. This corrects the original L6 spec which positioned CSM as primary.
Downstream triggers
  • Competitive threat + no plan → AGT-403 competitive brief
  • Low tier (score ≥78) → pass account_id to AGT-503 as expansion candidate
  • Renewal ≤90 days + no open renewal opp → AM flagged to open renewal opp (not autonomous)
  • Critical + high ACV → manager notified concurrently
Schema adds
ChurnRiskLog — one row per account per run. Fields: effective_risk_score, churn_risk_tier, renewal_multiplier, renewal_proximity_days, alert_type, recommended_action, recovery_plan_logged, downstream_triggers (JSON), expansion_candidate.
Impact
Weekly org-wide NRR risk summary feeds AGT-702 GTM Health Monitor. Role correction changes alert routing for all existing customers — AM inbox receives what previously went to CSM as primary.
AGT-503 · L5 · Expansion Trigger Agent
New agent · Schema · Ripple from L2 + L8
5-signal expansion scoring. Churn risk gate + open opp check before any play. TerritoryDefinitions replaces RepBook for AM routing. AGT-804 usage feed.
Change
Net new agent. Two entry paths: AGT-502 Low-tier pass (scheduled) and AGT-804 usage events (bypass scheduled cycle — overage/seat signals are immediate). Churn risk gate runs first: High/Critical = suppress, Medium = CSM confirm required, Low = proceed. Open opp check before any new play — if open expansion opp exists, enrichment path only (append context, single contextual update to AM, no new play created). Five signals: consumption overage +40 pts (immediate), seat utilization >80% for 2+ months +30 pts (immediate), ConvIntelligence feature inquiry + positive sentiment +30 pts (immediate), new stakeholder engaged +20 pts (digest), cross-sell peer cohort gap +15–25 pts (digest). Peer cohort: segment × vertical × size band, minimum 5 accounts, ≥60% match rate.
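The gate ordering and signal points can be sketched as below. Signal keys, function names, and the `awaiting_csm_confirm` state are hypothetical. Summing signal points into one score is an assumption suggested by the "+N pts" notation, not something the spec states outright.

```python
from typing import Iterable

# Fixed-point signals from the five-signal model.
SIGNAL_POINTS = {
    "consumption_overage": 40,
    "seat_utilization_high": 30,    # >80% for 2+ months
    "feature_inquiry_positive": 30,  # ConvIntelligence inquiry + sentiment
    "new_stakeholder": 20,
}
CROSS_SELL_RANGE = (15, 25)  # peer cohort gap contributes 15 to 25 pts


def gate_decision(churn_tier: str, open_expansion_opp: bool,
                  csm_confirmed: bool = False) -> str:
    """Churn risk gate runs first, then the open opportunity check."""
    if churn_tier in ("High", "Critical"):
        return "suppressed"
    if churn_tier == "Medium" and not csm_confirmed:
        return "awaiting_csm_confirm"  # hypothetical interim state
    if open_expansion_opp:
        return "enrichment"  # append context only, no new play
    return "new_play"


def expansion_score(signals: Iterable[str], cross_sell_points: int = 0) -> int:
    """Sum signal points (assumed additive), validating the cross-sell range."""
    total = sum(SIGNAL_POINTS[name] for name in set(signals))
    if cross_sell_points:
        low, high = CROSS_SELL_RANGE
        if not low <= cross_sell_points <= high:
            raise ValueError("cross-sell points must be between 15 and 25")
        total += cross_sell_points
    return total


def peer_cohort_is_valid(cohort_size: int, match_rate: float) -> bool:
    """Cohort needs at least 5 accounts and a match rate of 60% or more."""
    return cohort_size >= 5 and match_rate >= 0.60
```

The return values `new_play`, `enrichment`, and `suppressed` mirror ExpansionLog.play_type.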
L2 ripple
AM routing uses TerritoryDefinitions as authoritative account ownership source. Original spec used RepBook. Same change applied to AGT-202 Lead Router in v17. Routing conflict fallback (AM unavailable) → segment manager, same chain as AGT-202.
L8 ripple
AGT-804 usage data feed replaces generic Revenue table as source for consumption overage and seat utilization signals. AGT-804 also bypasses the scheduled daily cycle — usage threshold events trigger AGT-503 directly.
Schema adds
ExpansionLog — one row per signal detection. Fields: expansion_score, play_type (new_play / enrichment / suppressed), suppressed_reason, signal_sources (JSON), expansion_play_type, acv_estimate, open_opp_exists, enriched_opp_id, routed_to_am, alert_type, csm_confirm_required, csm_confirmed_at. Accounts extended: expansion_score, expansion_play_type, expansion_acv_potential, expansion_routed_to, expansion_last_detected.
Impact
Expansion ACV potential feeds MetricsCalc expansion revenue forecast — first agent to quantify NRR upside in the OS. Open opp check prevents duplicate plays and rep confusion — enrichment path preserves rep context without creating noise.
AGT-504 · L5 · Customer Communications Agent
New agent · Schema
Transactional comms execution: price increases, ToS changes, EoS/EoL. Dual-track: internal AM brief + external comm. Hybrid trigger model. Human approval always required.
Change
Net new agent. Execution-only — never initiates the decision to communicate. Dual-track: (1) internal brief to each AM scoped to their affected accounts, priority flags, and proactive outreach CTAs; (2) external transactional comm to appropriate contacts, gated by human approval. Always sends externally regardless of whether AM has contacted the customer first — internal brief is prep, not a replacement.
Trigger model
Hybrid. Price increases: PricingConfig scan detects future-dated rate delta — auto-initiates process, no manual entry required. EoS/EoL and ToS: manual entry to CommEventQueue (L0, RevOps-owned). AGT-504 polls CommEventQueue for pending records.
Contact logic
Price increase → billing contact on order form (AGT-801 source). ToS → billing contact + legal/signatory personas from Contacts table. EoS/EoL → all active users in Contacts table for affected accounts; falls back to billing contact if no Contacts records exist. Generic billing email detection: 1-business-day delay + AM flagged to identify human contact. Send proceeds after delay regardless of resolution. AGT-204 enrichment triggered as persistent fix.
Priority flags
Accounts flagged as priority in internal brief (requiring proactive AM outreach): churn risk tier Medium or worse (AGT-502), renewal proximity ≤90 days, ABM-active, ACV ≥ segment P75 (configurable per event via CommEventQueue.acv_priority_threshold).
Approval gate
Human approval required before external comm sends — always. RevOps approves all comm types; Legal/Comms also required for ToS and EoS/EoL. Approval and lead time window run in parallel (both must clear). External comm never auto-sends.
Lead time
Default 3 business days between internal brief delivery and external send. Configurable per event (CommEventQueue.lead_time_days). Clock starts on brief delivery to AMs.
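The approval gate and lead time window both have to clear before an external send. A minimal sketch, assuming business days skip weekends only (holidays ignored) and treating "Legal/Comms" as a single approver role; the role strings, comm_type values, and function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Set

# Approver roles required per comm type (role names are illustrative).
REQUIRED_APPROVERS = {
    "price_increase": {"revops"},
    "tos": {"revops", "legal_comms"},
    "eos_eol": {"revops", "legal_comms"},
}


def add_business_days(start: date, days: int) -> date:
    """Return the date `days` business days after `start` (Mon to Fri only)."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0 = Monday ... 4 = Friday
            remaining -= 1
    return current


def can_send_external(comm_type: str, approvals: Set[str],
                      brief_delivered: date, today: date,
                      lead_time_days: int = 3) -> bool:
    """True only when every required approval AND the lead time have cleared."""
    approvals_clear = REQUIRED_APPROVERS[comm_type] <= approvals
    earliest_send = add_business_days(brief_delivered, lead_time_days)
    return approvals_clear and today >= earliest_send
```

With a brief delivered on Thursday 2024-01-04 and the default lead time, the earliest send date is Tuesday 2024-01-09.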
Schema adds
CommEventQueue (L0) — event anchor record, auto-created on PricingConfig detection or manually entered. Fields: comm_type, status, effective_date, affected_skus, change_summary, lead_time_days, acv_priority_threshold, approval fields, auto_detected flag, pricing_config_ref. CommBriefLog — one row per AM per event. CommDeliveryLog — one row per recipient per event, includes generic_email_flag, unresolved_generic_flag, delay_applied_days.
L0 dependency
PricingConfig (L0) — reads effective_date and rate_delta for price change detection and external comm content. CommEventQueue (L0) — RevOps-owned intake table for non-price comm events.
Impact
Closes the last unmanaged customer touchpoint in the OS. Price changes, product sunset, and legal notices previously had no systematic AM prep path — priority customers could receive a transactional email before their AM knew it was coming. The internal brief window eliminates that gap. CommDeliveryLog provides first audit trail for customer-facing transactional comms — useful for compliance and dispute resolution.
v19 → v20 Layer 6: Customer Lifecycle · 4 agents specced: Onboarding Orchestrator, Technical Implementation, QBR Prep, Voice of Customer. 10 new tables. MAP introduces comp gate wiring to AGT-104.
AGT-601 · L6 · Onboarding Orchestrator
New agent · Schema · L1 comp gate
PM layer for onboarding. Pre-close trigger for eligible deals. MAP sign-off gates ACV + usage comp via AGT-104. Continuous OnboardingLog writes enable AGT-501 health scoring from day one.
Change
Net new agent. Pre-close trigger fires when deal health ≥ threshold AND stage ≥ Proposal for eligible deals (Enterprise, ACV ≥ threshold, implementation services on quote, SAM ≥ threshold). Post-close trigger fires for all segments at order execution (AGT-801 signal). Pre-close outputs: onboarding plan draft, internal kickoff brief (AM + CSM), resource requirement flag, post-sales risk alert with reject gate. Active onboarding: 7 configurable milestones (M1–M7). TTV = first value event + adoption threshold both required; RevOps defines value events per product in ProductValueEvents; selling team selects + dates in MAP; inheriting team signs off or escalates. Escalation triggers: customer unresponsive (2+ attempts), TTV trajectory miss. Onboarding closes: all milestones complete + CSM manual gate.
MAP comp gate
If post-sales rejects deal handoff or MAP sign-off not obtained: AGT-601 writes exception_type = 'map_signoff_hold' to ExceptionLog. AGT-104 picks up on next daily run and holds both ACV commission and usage realization payout. Hold lifts automatically on MAP sign-off. SLM notified immediately. SLM override allows deal to proceed despite rejection.
AGT-501 integration
OnboardingLog written continuously (not a closing summary). Every milestone, CSM activity, and exec touchpoint writes a record. AGT-501 reads OnboardingLog for accounts where onboarding_status = 'active' — substitutes OnboardingLog signals for dimensions where live data doesn't exist yet. Health records flagged in_onboarding = TRUE. Exec sponsor engagement, QBR recency, usage, and seat utilization dimensions all have defined OnboardingLog substitutes. NPS/CSAT stays data_gap — no proxy, by design.
L4 ripple
Reads ConvIntelligence and OpportunityBriefLog to extract technical win details and rep commitments into MAP. Reads QuoteLog for product configuration context. These are read-only — no writes back to L4 tables.
Schema adds
MutualActionPlan — shared L4/L6 handoff artifact with comp_hold_active field wiring to AGT-104. OnboardingLog — continuous milestone + activity log. ProductValueEvents (L0, RevOps-owned) — first value event definitions per SKU. OnboardingMilestoneConfig (L0, RevOps-owned) — milestone defaults per segment + product. Accounts extended: onboarding_status, onboarding_start_date, onboarding_complete_date, ttv_achieved_days, ttv_target_days, map_id (6 fields).
Impact
Closes the institutional memory gap at handoff — technical win details, rep commitments, and customer stakeholder map transfer from selling team to post-sales systematically. Comp hold gate creates a new governance dependency — reps must ensure post-sales signs off on MAP or both ACV and usage comp are held. OnboardingLog eliminates NULL health scores for newly closed customers — AGT-501 has signals from day one.
AGT-602 · L6 · Technical Implementation Agent
New agent · Schema
Technical workstream for onboarding. Activates in parallel with AGT-601 when implementation services are on quote. IE is technical owner. MAP is shared coordination layer.
Change
Net new agent. Activation gate: QuoteLog contains implementation services line item OR ImplementationSOW record exists. Not active = AGT-601 covers M1 (technical setup) with CSM as owner using a simplified checklist. When active: runs in parallel with AGT-601 from close. Technical owner = Implementation Engineer when services purchased; CSM when not. Covers 6 technical milestones: environment provisioning, integration build, data migration, technical UAT, technical go-live sign-off, technical handoff to CSM. Reads MutualActionPlan for technical win details and customer technical stakeholder map. Writes TechnicalMilestoneLog — feeds AGT-601 M1 status. CSM sees AGT-602 status as simplified summary via AGT-601 — IE owns technical detail, CSM owns customer communication about it.
Schema adds
ImplementationSOW (L0) — SOW contract for purchased services: contracted hours, hours_consumed, roles_required, timeline, scope_summary, IE assignment, status. TechnicalMilestoneLog — per-milestone tracking with blocker_detail and customer_sign_off fields. feeds_agt601_milestone field links each technical milestone to AGT-601 M1 progression.
Impact
Gives Implementation Engineers their own agent workspace — previously no agent in the OS served this role. Also useful to SMB/MM CSMs as a technical checklist even without dedicated IE — AGT-601 exposes simplified AGT-602 view when implementation services are not purchased.
AGT-603 · L6 · QBR Prep Agent
New agent · Schema
Dual-trigger QBR prep (quarterly auto + meeting booking). 7 sections split by AM/CSM role. Internal brief + customer-facing artifact. Reads health, expansion, MAP, and renewal data.
Change
Net new agent. Dual trigger: quarterly auto (fires X days before QBR due date, default 10 business days) OR meeting booking (whichever first). Dual output: internal prep brief (all 7 sections, full context including internal-only risk and renewal language) + customer-facing summary artifact (sections 1/2/3/6/7 only — expansion ACV and churn risk language are internal-only). Brief is split by role: AM owns sections 4 (expansion) and 5 (renewal); CSM owns sections 1 (usage), 3 (health), 6 (open issues); both own sections 2 (ROI vs MAP) and 7 (next commitments). Customer artifact: CSM edits before sending — agent drafts, human always reviews.
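The section split can be expressed as a small lookup table. Section keys and helper names are hypothetical; the ownership and customer-facing sets are taken from the description above.

```python
from typing import Dict, List, Set, Tuple

# section number -> (short name, owning roles)
SECTIONS: Dict[int, Tuple[str, Set[str]]] = {
    1: ("usage", {"CSM"}),
    2: ("roi_vs_map", {"AM", "CSM"}),
    3: ("health", {"CSM"}),
    4: ("expansion", {"AM"}),
    5: ("renewal", {"AM"}),
    6: ("open_issues", {"CSM"}),
    7: ("next_commitments", {"AM", "CSM"}),
}

# Sections 4 and 5 carry expansion ACV and churn risk language,
# so they appear in the internal brief only.
CUSTOMER_FACING: Set[int] = {1, 2, 3, 6, 7}


def sections_for_role(role: str) -> List[int]:
    """Sections a given role owns or co-owns in the internal brief."""
    return sorted(n for n, (_, owners) in SECTIONS.items() if role in owners)


def internal_only_sections() -> List[int]:
    """Sections excluded from the customer-facing artifact."""
    return sorted(set(SECTIONS) - CUSTOMER_FACING)
```

Note that the internal-only sections are exactly the two owned solely by the AM.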
Key data dependencies
Section 2 (ROI narrative) reads MutualActionPlan.selected_value_events — requires AGT-601 MAP to be populated. Section 4 (expansion) reads ExpansionLog — requires AGT-503 to be active. Section 5 (renewal) reads ChurnRiskLog + Opportunities. Section 3 (health scorecard) reads CustomerHealthLog — translated to executive narrative, not raw scores.
Schema adds
QBRLog — internal brief content (JSON), customer artifact content (JSON), trigger type, health score at prep, expansion signals included flag, renewal proximity days, post-QBR outcome (CSM-logged), commitments agreed (JSON).
Impact
First agent in the OS to produce a customer-facing artifact (all prior agents produce internal outputs only). QBRLog.qbr_outcome becomes a new VoC input for AGT-604. QBRLog.commitments_agreed feeds next QBR's MAP alignment check.
AGT-604 · L6 · Voice of Customer Agent
New agent · Schema · Feeds L0 + L2
Monthly deep synthesis + weekly digest across 6 signal sources. 5 consumer outputs. Closes feedback loop to PricingConfig (L0), AGT-201 ICP model, and AGT-205 MarketAssumptions.
Change
Net new agent. Six input signals: NPS verbatim + score, CSAT + comments, support ticket themes, ConvIntelligence (feature requests, objections, competitor mentions), CSM call notes + QBR outcomes, product usage patterns (adoption gap analysis). Monthly deep synthesis: full theme clustering across all sources. Weekly digest: anomaly flags and emerging themes from new signals only. Five consumer outputs: Product (feature themes, adoption gaps), RevOps (ICP refinement signals), Marketing (messaging gaps from objections), Finance/RevOps (pricing feedback for PricingConfig review), CRO (monthly org-wide executive narrative).
Feedback loops
AGT-201 ICP Scorer: ICP refinement signals flag when observed customer outcome patterns diverge from current scoring weights — input for RevOps-driven weight recalibration. AGT-205 MarketAssumptions: customer adoption patterns feed market sizing assumption updates. PricingConfig (L0): pricing objection patterns and value perception signals routed to Finance for review — VoC surfaces signal, Finance decides. AGT-702 GTM Health Monitor: customer sentiment trend included as a health signal in L7.
Schema adds
VoCSynthesisLog — per synthesis run: theme clusters (JSON), ICP refinement signals (JSON), pricing feedback (JSON), messaging gaps (JSON), CRO summary (text, monthly only). VoCSignalLog — individual signal records before synthesis. Denormalized segment + vertical fields on each record for cohort analysis without joins.
Impact
First agent in the OS to create a feedback loop from customer outcomes back to L0 policy tables and L2 scoring models. Closes the NRR loop: customer experience signals now systematically inform ICP model, pricing decisions, and market assumptions — not just reactive health monitoring. VoCSignalLog.included_in_synthesis field provides full traceability from raw signal to synthesized output.
v20 → v21 Layer 7: Measurement & Feedback · 4 agents specced: Rep performance, GTM health, win-loss/forecast accuracy, business reviews. 11 new tables.
AGT-701 · L7 · Rep Performance & Coaching
New agent · Schema
Role-parameterized coaching (AE/AM/CSM/SDR). Monthly digest + daily flag. SDR Coaching folded in as role module.
Change
Net new agent. Role-parameterized across AE, AM, CSM, and SDR — shared skeleton, role-specific metric inputs. SDR Coaching folded in as a role module (not a separate agent). Monthly digest: manager-approved before delivery to rep, contains skill gap assessment, trend commentary, and recommended next action. Daily flag: fires only on critical threshold breach (attainment <50% with ≤6 weeks remaining in period). Differentiated artifacts: rep receives coaching, manager receives pattern summary across their book.
Escalation model
Attainment <50% with ≤6 weeks remaining → SLM notified concurrently with manager. Same pattern 3+ consecutive weeks unresolved → automatic SLM escalation regardless of manager action. Manager cannot suppress the auto-escalation.
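A sketch of the escalation rules, assuming "same pattern" refers to the flagged attainment breach persisting week over week. Function names, recipient labels, and the returned dictionary shape are hypothetical.

```python
from typing import Dict, List, Union


def critical_breach(attainment_pct: float, weeks_remaining: float) -> bool:
    """Daily flag condition: attainment below 50% with 6 or fewer weeks left."""
    return attainment_pct < 50 and weeks_remaining <= 6


def escalation(attainment_pct: float, weeks_remaining: float,
               consecutive_unresolved_weeks: int) -> Dict[str, Union[List[str], bool]]:
    """Decide who is notified and whether auto-escalation fires.

    Manager action is intentionally not an input: the auto-escalation
    cannot be suppressed by the manager.
    """
    if not critical_breach(attainment_pct, weeks_remaining):
        return {"notify": [], "auto_escalate_to_slm": False}
    return {
        "notify": ["manager", "slm"],  # SLM notified concurrently with manager
        "auto_escalate_to_slm": consecutive_unresolved_weeks >= 3,
    }
```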
L6 new inputs
Four new coaching signals available from L6: TTV performance vs target (AGT-601 OnboardingLog), onboarding milestone velocity (AGT-601), QBR outcome quality (AGT-603 QBRLog), MAP commitment follow-through (AGT-601 MutualActionPlan). These supplement the existing ConvIntelligence objection handling and AGT-405 deal advance quality signals from L4.
Schema adds
RepSkillAssessmentLog — per-rep per-period skill gap detection across role-specific dimensions. RepCoachingLog — coaching pattern tracking, manager approval state, delivery timestamp, rep acknowledgment.
Trigger
Layer 7 spec session — net new agent. SDR Coaching previously earmarked as a separate agent; folded in as role-parameterized module to avoid redundant infrastructure.
Impact
RepCoachingLog becomes a new input for AGT-703 bias detection — sustained coaching patterns on specific deal types or verticals can surface systematic forecast bias. SDR consolidation reduces agent count from planned 40 to 39. AGT-703 gap preserves the intentional blank.
AGT-702 L7 GTM Health Monitor
New agent · Schema
5 canonical GTM efficiency metrics. Cadenced snapshots only. MetricsCalcLog as write target. FP&APlan versioned comparison. RevOps-first alert routing.
Change
Net new agent. Computes and persists five canonical GTM efficiency metrics on cadence: Magic Number (revenue basis — never ARR), Rule of 40 (three variants: EBITDA, FCF, operating margin), NRR (>110% target), GRR (>85% target), CAC Payback (<18 months target). All computed values written to MetricsCalcLog as the canonical persisted source — downstream agents read from MetricsCalcLog, never recompute. Snapshot model only: weekly (NRR trajectory + pipeline coverage), monthly (Magic Number, NRR/GRR actuals, R40), quarterly (CAC Payback, trailing averages). No continuous monitoring.
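As a rough sketch of the targets and cadences described above, the stated thresholds can be held in a lookup table and checked per metric. The dictionary and function names are hypothetical. Only NRR, GRR, and CAC Payback have targets stated in this entry, so the other metrics return None here rather than a guessed threshold.

```python
from typing import Optional

# Targets stated in the entry; comparison direction differs per metric.
TARGETS = {
    "NRR": (">", 110.0),         # percent
    "GRR": (">", 85.0),          # percent
    "CAC Payback": ("<", 18.0),  # months
}

# Which metrics each snapshot cadence covers, per the entry.
SNAPSHOT_CADENCE = {
    "weekly": ["NRR trajectory", "pipeline coverage"],
    "monthly": ["Magic Number", "NRR", "GRR", "Rule of 40"],
    "quarterly": ["CAC Payback", "trailing averages"],
}

def meets_target(metric_name: str, value: float) -> Optional[bool]:
    """True/False against the stated target, or None if no target is stated."""
    if metric_name not in TARGETS:
        return None
    op, threshold = TARGETS[metric_name]
    return value > threshold if op == ">" else value < threshold
```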
Plan comparison
Every MetricsCalcLog row includes four reference points: FP&APlan current version (0+12 → 1+11 → 2+10 rolling), board-approved plan (is_board_plan flag), prior period actuals from MetricsCalcLog history, AGT-402 forward forecast vs plan target.
Alert routing
RevOps only — they triage and escalate. AGT-702 never routes directly to CRO, Finance, or reps. Calibration recommendations (quota, capacity, ICP weight recalibration) surface via AGT-704 MBR synthesis — not AGT-702.
Schema adds
MetricsCalcLog — one row per metric per period per snapshot run. Fields: snapshot_date, cadence, metric_name, metric_value, plan_version, plan_target, board_plan_target, prior_period_value, pct_of_plan, data_gap, data_gap_reason. FP&APlan extended: plan_version (0+12 format), reforecast_as_of_date, is_board_plan fields added.
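One possible shape for a MetricsCalcLog row, using the field names listed above. The types, the `finalize` helper, the data-gap reason text, and the definition of `pct_of_plan` as value divided by plan target are all assumptions made for illustration; the entry does not specify them.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class MetricsCalcRow:
    snapshot_date: date
    cadence: str                      # "weekly", "monthly", or "quarterly"
    metric_name: str
    metric_value: Optional[float]
    plan_version: str                 # e.g. "2+10"
    plan_target: Optional[float]
    board_plan_target: Optional[float] = None
    prior_period_value: Optional[float] = None
    pct_of_plan: Optional[float] = None
    data_gap: bool = False
    data_gap_reason: Optional[str] = None

def finalize(row: MetricsCalcRow) -> MetricsCalcRow:
    """Fill pct_of_plan, or mark a data gap when the value is missing."""
    if row.metric_value is None:
        row.data_gap = True
        row.data_gap_reason = "metric_value unavailable"  # hypothetical reason text
    elif row.plan_target:
        # Assumed definition: value as a percentage of the plan target.
        row.pct_of_plan = round(100 * row.metric_value / row.plan_target, 1)
    return row
```

Persisting `pct_of_plan` on the row is what lets downstream readers compare against plan without recomputing.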
Trigger
Layer 7 spec session — net new agent. FP&APlan versioning extension propagates to AGT-402 and AGT-404 which now reference current plan_version for pct_of_target comparisons.
Impact
MetricsCalcLog is the first OS table to persist computed efficiency metrics as queryable data — prior period comparisons and trend analysis now available without recomputation. FP&APlan versioning enables accurate plan-vs-actual comparisons as the operating plan rolls forward through the year. AGT-703 reads MetricsCalcLog for historical metric context in forecast accuracy analysis.
AGT-703 L7 Win-Loss & Forecast Accuracy
New agent · Schema
Win-loss pattern analysis + systematic forecast accuracy measurement. 3 new tables. Calibration signals feed back to L1/L2/L4.
Change
Net new agent. Two analysis modes: (1) Win-loss pattern analysis on closed opportunities — loss reason clustering, competitive pattern detection, segment/vertical win rate divergence. (2) Forecast accuracy measurement — rep commit vs closed outcome, stage-weighted vs actual, AGT-402/AGT-404 forecast vs realized. Reads MetricsCalcLog for historical metric context.
Calibration signals
CalibrationSignalLog generates typed signals when analysis reveals systematic divergence: quota calibration (persistent miss/overachieve patterns → AGT-101), ICP weight recalibration (win rate divergence by segment/vertical → AGT-201), forecast model adjustment (systematic over/under-forecast → AGT-402/AGT-404), competitive KB update (new competitor patterns → AGT-403). Signals are notifications — humans decide whether to act. Dismissed signals are not re-opened; if the pattern recurs, a new signal is generated.
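The signal-to-agent routing and the "dismissed signals are not re-opened" rule could look roughly like this. The signal type keys, the dictionary-based log, and the `raise_signal` helper are hypothetical; the target agents and the pending/dismissed statuses come from the entry.

```python
# Signal type -> target agents, per the entry (key names are illustrative).
CALIBRATION_TARGETS = {
    "quota_calibration": ["AGT-101"],
    "icp_weight_recalibration": ["AGT-201"],
    "forecast_model_adjustment": ["AGT-402", "AGT-404"],
    "competitive_kb_update": ["AGT-403"],
}

def raise_signal(log, signal_type, evidence):
    """Append a new pending signal; existing rows are never re-opened."""
    signal = {
        "signal_type": signal_type,
        "target_agents": list(CALIBRATION_TARGETS[signal_type]),
        "supporting_evidence": evidence,
        "status": "pending",
    }
    log.append(signal)
    return signal
```

A recurring pattern therefore shows up as a second row, leaving the dismissed row as an audit record of the earlier human decision.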
Rep commit accuracy
Rep-level forecast accuracy (commit vs closed) routed to AGT-701 as a coaching input. This is not a public accuracy leaderboard — it feeds the coaching engine to improve rep forecasting behavior.
Confidence gates
Minimum sample sizes enforced before pattern conclusions are drawn. Win-loss requires N closed opps per cohort (configurable, default 10). Forecast accuracy requires 2+ completed periods. Below threshold: data surfaced as directional only, not as calibration signals.
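A minimal sketch of the two confidence gates, assuming they reduce to simple threshold checks. The function names and the two return labels are hypothetical; the default of 10 closed opportunities per cohort and the 2-period minimum are from the entry.

```python
def win_loss_gate(closed_opps_in_cohort, min_sample=10):
    """Below the minimum sample, results are directional only."""
    if closed_opps_in_cohort >= min_sample:
        return "calibration_eligible"
    return "directional_only"

def forecast_accuracy_gate(completed_periods):
    """Forecast accuracy needs at least 2 completed periods."""
    if completed_periods >= 2:
        return "calibration_eligible"
    return "directional_only"
```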
Schema adds
WinLossLog — per analysis cycle: win/loss patterns by segment, vertical, competitor, loss reason clusters. ForecastAccuracyLog — per period per forecast method: predicted vs actual, variance, directional accuracy. CalibrationSignalLog — typed calibration signals with target agent, signal strength, supporting evidence, status (pending/acknowledged/actioned/dismissed).
Trigger
Layer 7 spec session — net new agent.
Impact
Closes the loop between forecast methodology and outcomes — the OS can now systematically measure whether its own forecasting agents are accurate and surface recalibration signals when they aren't. Win-loss patterns feed AGT-403 competitive KB with structured loss reason data rather than ad-hoc rep notes. AGT-704 surfaces CalibrationSignalLog in MBR section 6 for leadership review.
AGT-704 L7 Business Review Orchestrator
New agent · Schema
WBR / MBR / exec QBR across three cadences. Pre-flight staleness gate. Hybrid MBR pull model. Structured doc + slides + BI payload. Action item owner routing.
Change
Net new agent. Three cadence artifacts: WBR (5 sections — weekly, FLM + SLM + RevOps full + Finance read-only on sections 1 and 5), MBR (8 sections — monthly, broader leadership), exec QBR (7 sections — quarterly, board-ready). Every cadence produces three output formats: structured document, slides, and BI payload. Pre-flight staleness gate runs before every artifact generation — if any upstream source is stale, AGT-704 flags the gap and holds until data is current or human overrides.
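The pre-flight staleness gate might be sketched as follows. The per-source maximum ages, the result labels ("pass", "hold", "overridden"), and the `stale_sources` key are assumptions, since the entry does not define what counts as stale. The keys `staleness_gate_result`, `upstream_sources_checked`, and `override_by` follow the BusinessReviewLog field names.

```python
from datetime import datetime, timedelta

def staleness_gate(last_updated, max_age, now, override_by=None):
    """Check every upstream source; hold unless all are fresh or a human overrides."""
    stale = sorted(
        name for name, ts in last_updated.items() if now - ts > max_age[name]
    )
    if not stale:
        result = "pass"
    elif override_by:
        result = "overridden"
    else:
        result = "hold"
    return {
        "staleness_gate_result": result,
        "upstream_sources_checked": sorted(last_updated),
        "stale_sources": stale,
        "override_by": override_by if stale else None,
    }
```

Recording the override on the result row keeps a trace of who accepted stale data for a given run.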
MBR pull model
Hybrid. WBR narrative roll-up carries forward into MBR as section 1 context. Remaining MBR sections pull fresh from source tables (MetricsCalcLog, ChurnRiskLog, ExpansionLog, CalibrationSignalLog). This avoids a full recompute while ensuring metrics are current at MBR time.
Action items
Action items auto-assigned to owners via routing rules derived from item type and content. Routing rules: metric miss → RevOps owner; rep coaching → SLM owner; customer risk → AM owner; calibration signal → owning agent team. Humans confirm before distribution — never auto-sent. Action item lifecycle tracked in ActionItemLog.
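The four routing rules can be expressed as a lookup, with human confirmation as a separate step before anything is distributed. This is a sketch under assumptions: the item type keys, status labels, and helper names are hypothetical, while the owners and the ActionItemLog field names (`source_review_id`, `owner`, `routing_rule_applied`, `status`, `confirmed_at`) come from this entry.

```python
# Item type -> owner, per the routing rules above (key names are illustrative).
ROUTING_RULES = {
    "metric_miss": "RevOps",
    "rep_coaching": "SLM",
    "customer_risk": "AM",
    "calibration_signal": "owning agent team",
}

def route_action_item(item_type, source_review_id):
    """Assign an owner; the item is not distributable until a human confirms."""
    return {
        "source_review_id": source_review_id,
        "owner": ROUTING_RULES[item_type],
        "routing_rule_applied": item_type,
        "status": "pending_confirmation",
        "confirmed_at": None,
    }

def confirm(item, confirmed_at):
    """Record the human confirmation that gates distribution."""
    item["status"] = "confirmed"
    item["confirmed_at"] = confirmed_at
    return item
```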
Calibration surface
AGT-703 CalibrationSignalLog is surfaced in MBR section 6. AGT-704 contextualizes calibration signals (quota, comp, ICP, forecast model adjustments) for leadership review — it does not make the recalibration recommendation autonomously. Humans decide; AGT-704 presents the case.
On-cadence only
AGT-704 stays on cadence regardless of mid-period events (major deal wins/losses, leadership changes, market events). No unscheduled exec artifacts. If an urgent situation requires a brief, that is a human-authored document — not an AGT-704 output.
Schema adds
BusinessReviewLog — one row per cadence run. Fields: cadence type, artifact versions (doc/slides/BI payload links), staleness_gate_result, upstream_sources_checked, override_by (if staleness overridden). ActionItemLog — one row per action item. Fields: source_review_id, owner, routing_rule_applied, status, due_date, confirmed_at, closed_at.
Trigger
Layer 7 spec session — net new agent.
Impact
First agent in the OS to produce board-ready artifacts — exec QBR output is designed for external consumption. ActionItemLog closes the accountability gap in business reviews — action items now have owners, due dates, and tracked resolution. Staleness gate creates a forcing function for upstream agents to stay on schedule — AGT-704 will not paper over stale data.
v24 → v25 Layer 8: Revenue Operations · 4 agents specced: order management, billing, payment health, revenue recognition. 11 new tables. Finance-owned. UsageMeteringLog is the unified usage data source for all OS agents.
AGT-801 Order Management
New agent · Schema
Accepted quote → order form → billing handoff. Two-track parallel approval. Mid-term amendments. Re-scores deal independently of AGT-406.
Change
Net new agent. Converts AGT-406 accepted quote into a legally binding order form. Order form adds to the quote: legal entity names + billing address, payment terms, auto-renewal clause + notice period, governing law, MSA reference, SOW reference (if services), commit frequency (monthly/quarterly/annual, rampable), all billable SKUs with unit prices + commitment amounts, total contract value. AGT-801 re-scores the deal (0–100, R/Y/G) against the order form terms — not inherited from AGT-406's quote score.
Approval model
Two independent tracks: (1) Pricing/deal terms: Green = auto-approve; Yellow = Finance Director; Red = Finance VP + CRO. (2) Legal terms: Green = auto-approve; Yellow = Legal Director; Red = Legal VP + CRO. Both tracks must clear independently. Pricing corrections always require Finance Director minimum regardless of score.
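A sketch of the two-track approval lookup, assuming the R/Y/G rating for each track is already known. The function and dictionary names are hypothetical. An empty approver list stands for auto-approve, and the pricing-correction rule is read as raising a Green pricing track to Finance Director while leaving Yellow and Red unchanged.

```python
PRICING_APPROVERS = {
    "green": [],
    "yellow": ["Finance Director"],
    "red": ["Finance VP", "CRO"],
}
LEGAL_APPROVERS = {
    "green": [],
    "yellow": ["Legal Director"],
    "red": ["Legal VP", "CRO"],
}

def required_approvers(pricing_rating, legal_rating, is_pricing_correction=False):
    """Return approvers per track; both tracks must clear independently."""
    pricing = list(PRICING_APPROVERS[pricing_rating])
    if is_pricing_correction and not pricing:
        # Pricing corrections never auto-approve.
        pricing = ["Finance Director"]
    return {"pricing": pricing, "legal": list(LEGAL_APPROVERS[legal_rating])}
```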
Amendments
Three amendment triggers: seat expansion, SKU add, pricing correction. Each generates an AmendmentLog record and re-routes through the approval chain. Original order is versioned, not overwritten.
Integration
Billing contact written to OrderLog by AGT-801 is the target for AGT-504 Customer Communications on price change events. AGT-601 reads QuoteLog + OrderLog for pre-close onboarding plan context.
Schema adds
OrderLog, OrderLineItems, OrderApprovalLog, AmendmentLog
Impact
Closes the quote → legal contract gap — the OS now covers the full commercial lifecycle from pricing proposal (AGT-406) through signed legal order form. AGT-504's billing contact routing now has a reliable data source — OrderLog is the authoritative record of who the billing contact is.
AGT-802 Billing & Invoicing
New agent · Schema
Invoice generation across 3 billing types. SOW milestone billing. Credit memos and refunds with 4 trigger sources and rep submission path.
Change
Net new agent. Three billing types: recurring (seat license — monthly/quarterly/annual per contract), consumption (always monthly — reads UsageMeteringLog for units consumed vs commit, overage billed separately), SOW milestone (event-driven — fires on TechnicalMilestoneLog customer sign-off or manual Finance entry). Amendment deltas billed on amendment approval (pro-rated seat expansion, credit for pricing correction).
Credit memos + refunds
Four trigger sources: customer dispute, billing error, contract downgrade, rep submission (manager approval → Finance Director approval — two-step gate). Finance Director minimum approval on all credit events. Billing errors always require Finance Director regardless of amount.
SOW milestone gate
Customer sign-off required (TechnicalMilestoneLog.customer_sign_off = TRUE) before AGT-802 invoices. IE completion alone does not trigger billing. Manual Finance entry accepted as equivalent signal when AGT-602 is not active.
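The milestone billing gate could be expressed as a single predicate. The function and parameter names are hypothetical, and the sketch takes the stricter reading that a manual Finance entry counts only when AGT-602 is not active.

```python
def can_invoice_milestone(customer_sign_off, ie_complete, agt_602_active,
                          manual_finance_entry=False):
    """Decide whether a SOW milestone may be invoiced."""
    # ie_complete is deliberately ignored: IE completion alone never bills.
    if customer_sign_off:
        return True
    # Manual Finance entry stands in for sign-off only without AGT-602.
    if manual_finance_entry and not agt_602_active:
        return True
    return False
```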
Schema adds
InvoiceLog, InvoiceLineItems, CreditMemoLog
Impact
Consumption and seat billing are now unified in the OS schema — separate invoices per billing type per account are tracked in InvoiceLog and visible to AGT-803 for payment monitoring. Rep credit submission path creates a governed channel for sales-initiated billing adjustments without Finance having to manage ad hoc requests.
AGT-803 Payment Health Monitor
New agent · Schema
3 retries (10–20 min / next BD / following BD). Finance-first escalation. Failed → Finance + AM. Suspended → Finance + AM + SLM. Writes payment_health_status to CustomerHealthLog.
Change
Net new agent. Tracks payment status across all invoices. 3 retry maximum: Retry 1 at 10–20 minutes, Retry 2 next business day, Retry 3 following business day. Finance notified on Retries 1 and 2. Finance + AM notified simultaneously on Retry 3 failure. Suspended state is Finance-initiated (not automatic) — Finance + AM + SLM all notified simultaneously.
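The retry timeline can be sketched with plain datetime arithmetic. Several details are assumptions: business days are treated as Monday to Friday with no holiday calendar, Retries 2 and 3 keep the original time of day, the first delay defaults to 15 minutes inside the stated 10 to 20 minute window, and all names are hypothetical.

```python
from datetime import datetime, timedelta

def next_business_day(ts):
    """Next Monday-to-Friday day at the same time of day (no holidays)."""
    nxt = ts + timedelta(days=1)
    while nxt.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        nxt += timedelta(days=1)
    return nxt

def retry_schedule(failed_at, first_delay_minutes=15):
    """Return the three retry times for a failed payment."""
    if not 10 <= first_delay_minutes <= 20:
        raise ValueError("first retry must fall in the 10-20 minute window")
    retry_1 = failed_at + timedelta(minutes=first_delay_minutes)
    retry_2 = next_business_day(failed_at)
    retry_3 = next_business_day(retry_2)
    return [retry_1, retry_2, retry_3]

def notify_on_retry_failure(retry_number):
    """Finance on Retries 1 and 2; Finance and AM together on Retry 3."""
    return ["Finance"] if retry_number in (1, 2) else ["Finance", "AM"]
```

For a payment that fails on a Friday afternoon, this places Retry 2 on Monday and Retry 3 on Tuesday.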
Health states
Current (no modifier) → Overdue (score capped at 77) → Failed (score capped at 62) → Suspended (score floored at Critical). States are per-account — one overdue invoice puts the account in Overdue state regardless of other invoices. Resolves to Current only when all outstanding invoices are paid.
AGT-501 feed
Writes payment_health_status to CustomerHealthLog on every status change. Non-blocking — if AGT-803 not live, AGT-501 skips the modifier and scores on behavioral dimensions alone. Modifier applied last in AGT-501 scoring sequence so cap/floor is visible as a distinct adjustment.
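A sketch of how the payment modifier might be applied as the last step of AGT-501 scoring. Assumptions: the function name and return shape are hypothetical, a status of None stands for "AGT-803 not live", and "floored at Critical" is read as forcing the account into the Critical band rather than changing the numeric score.

```python
# Score caps per payment health state, from the Health states entry.
SCORE_CAPS = {"Current": None, "Overdue": 77, "Failed": 62}

def apply_payment_modifier(behavioral_score, payment_health_status):
    """Return (score, forced_band); applied after all behavioral dimensions."""
    if payment_health_status is None:
        # Non-blocking: with no AGT-803 feed, score on behavior alone.
        return behavioral_score, None
    if payment_health_status == "Suspended":
        return behavioral_score, "Critical"
    cap = SCORE_CAPS[payment_health_status]
    if cap is None:
        return behavioral_score, None
    return min(behavioral_score, cap), None
```

Returning the capped score separately from the behavioral input keeps the adjustment visible as its own step.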
Schema adds
PaymentEventLog — one row per payment event (attempt, failure, retry, success, escalation)
Impact
The AGT-501 payment modifier is now fully operational. It was specced in v19 with the feed left as a planned integration; AGT-803 makes it live. AM notification at Failed status means relationship owners now learn about payment issues before customers lose access. Previously they found out only after Finance had already escalated externally.
AGT-804 Revenue Recognition
New agent · Schema
ASC 606 recognition by SKU type. Deferred revenue schedules for accounting system. UsageMeteringLog as unified usage source for AGT-702, AGT-503, AGT-501, AGT-402.
Change
Net new agent. Three recognition methods: seat licenses = straight-line ratable (TCV ÷ term months, begins on term_start regardless of invoice payment); consumption = usage recognized as consumed per UsageMeteringLog (no deferred revenue); professional services = milestone delivery on customer sign-off (each milestone is an independent performance obligation). AGT-804 writes DeferredRevenueSchedule for accounting system journal entries — it computes the schedule, the accounting system posts journals.
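For the seat-license case, the straight-line calculation (TCV ÷ term months) can be sketched as below. The function name is hypothetical, and the rounding treatment (round each month to cents, let the final month absorb the remainder) is an assumption, since the entry does not specify one.

```python
from decimal import Decimal, ROUND_HALF_UP

def straight_line_schedule(tcv, term_months):
    """Monthly recognized revenue for a seat license, starting at term_start."""
    if term_months <= 0:
        raise ValueError("term_months must be positive")
    monthly = (tcv / term_months).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    schedule = [monthly] * (term_months - 1)
    # Final month absorbs rounding so the schedule sums exactly to TCV.
    schedule.append(tcv - sum(schedule))
    return schedule
```

The deferred balance after any month is then TCV minus the recognized amounts to date, independent of whether the invoice has been paid.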
UsageMeteringLog
One row per account per SKU per metering period. External product system is the data source — AGT-804 receives the feed. Multiple OS agents read from UsageMeteringLog: AGT-804 (recognition), AGT-802 (consumption invoicing), AGT-503 (expansion signals — overage +40 pts, seat utilization >80% for 2+ months +30 pts), AGT-501 (seat utilization dimension, 18 pts max). Single external integration point; all consumers downstream of it.
GTM OS feeds
AGT-702 reads RevenueRecognitionLog for recognized revenue in Magic Number calculation — not billed revenue. AGT-503 usage threshold events bypass AGT-503's scheduled daily cycle and trigger directly. AGT-501 seat utilization dimension replaces any prior usage data references with UsageMeteringLog.
Schema adds
RevenueRecognitionLog, DeferredRevenueSchedule, UsageMeteringLog
Impact
AGT-702 Magic Number now uses recognized revenue — the canonical efficiency metric is now computed on the correct basis (ASC 606 recognized, not billed). UsageMeteringLog is the single metering integration point — AGT-501 seat utilization and AGT-503 expansion signals no longer require separate product system integrations. L5/L6 ripple pending: AGT-501 and AGT-503 specs should be updated to reference UsageMeteringLog explicitly, replacing any prior generic usage data source references.
v25 (no schema change) L5 ripple + Rep Agent Builder Guide v2 · AGT-501 and AGT-503 specs updated to reference UsageMeteringLog explicitly. Rep Agent Builder Guide updated for L5–L7.
AGT-501 L5 Customer Health Monitor
Ripple from L8 · Prompt delta
Seat utilization dimension explicitly reads UsageMeteringLog. CSM call filter uses call_owner_role = 'CSM'.
Change 1
Seat utilization dimension (18 pts max) now explicitly reads UsageMeteringLog WHERE account_id = [account] AND sku_id LIKE 'SKU-SEAT%'. Prior spec referenced "AGT-804 usage data feed" without naming the table. Non-blocking — if no UsageMeteringLog records exist for the account, dimension remains data_gap as before.
Change 2
ConvIntelligence adjustment now explicitly filters call_owner_role = 'CSM' (v23 field). Selling-role calls (AE, SDR, SE) excluded from customer health adjustment — they are deal activity signals, not customer health signals.
Schema change
None. UsageMeteringLog and call_owner_role already exist. Prompt-only delta.
Impact
Seat utilization dimension is now fully operational — the data source is explicit and AGT-804 populates it. CSM call filter prevents selling-role conversations from inflating customer health scores.
AGT-503 L5 Expansion Trigger Agent
Ripple from L8 · Prompt delta
Consumption overage and seat utilization signals explicitly read UsageMeteringLog. Immediate bypass of scheduled cycle confirmed on threshold events.
Change
Two expansion signals now explicitly source from UsageMeteringLog: (1) Consumption overage (+40 pts): UsageMeteringLog WHERE overage_units > 0 AND period_end = [most recent closed period] — immediate trigger, bypasses scheduled daily cycle. (2) Seat utilization >80% for 2+ months (+30 pts): UsageMeteringLog WHERE (units_consumed / commit_units) > 0.80 for 2+ consecutive periods — immediate trigger. AGT-804 threshold events push UsageMeteringLog writes in real-time; AGT-503 listens and fires on new records rather than waiting for the next scheduled run.
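The two usage-driven signals might be evaluated roughly as follows over UsageMeteringLog rows held as dictionaries. The column names `overage_units`, `units_consumed`, and `commit_units` come from the entry. The function names, the oldest-first row ordering, and the requirement that the consecutive run end at the most recent closed period are assumptions.

```python
def has_overage_signal(latest_row):
    """Consumption overage in the most recent closed period."""
    return latest_row["overage_units"] > 0

def has_seat_utilization_signal(rows, threshold=0.80, min_periods=2):
    """Utilization above threshold for min_periods consecutive periods,
    counted backwards from the most recent row (rows are oldest first)."""
    streak = 0
    for row in reversed(rows):
        commit = row["commit_units"]
        if commit > 0 and row["units_consumed"] / commit > threshold:
            streak += 1
            if streak >= min_periods:
                return True
        else:
            break
    return False

def expansion_points(consumption_rows, seat_rows):
    """Sum the point values stated in the entry: +40 overage, +30 seats."""
    points = 0
    if consumption_rows and has_overage_signal(consumption_rows[-1]):
        points += 40
    if has_seat_utilization_signal(seat_rows):
        points += 30
    return points
```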
Schema change
None. UsageMeteringLog already exists. Prompt-only delta.
Impact
Usage-driven expansion signals are now fully wired end-to-end — AGT-804 receives product metering data, writes UsageMeteringLog, AGT-503 reads it and fires immediately on threshold breach. No intermediate manual step or batch delay.
Enablement Rep Agent Builder Guide v2
Update
Guide updated for L5–L7 tables. 5 new use case cards. 3 new example prompts. Schema reference extended to all layers.
Change
Rep Agent Builder Guide updated from schema v17 (L1–L4 only) to v25 (L1–L7). New use case cards: customer health alerts (AM/CSM), expansion opportunities (AM/CSM), QBR prep (AM/CSM), onboarding status (CSM/AM), coaching plan (all roles). New example prompts: customer health + renewal focus, QBR prep for specific account, coaching plan summary. Schema reference extended with L5 (CustomerHealthLog, ChurnRiskLog, ExpansionLog), L6 (OnboardingLog, MutualActionPlan, QBRLog, VoCSynthesisLog), and L7 (RepSkillAssessmentLog, RepCoachingLog, MetricsCalcLog, BusinessReviewLog, ActionItemLog) tables. New guardrail added: don't share internal QBR content externally.
Impact
AM and CSM roles now have first-class use cases in the guide — v1 was predominantly AE/SDR facing. No schema or agent behavior changes. Enablement artifact only.