TOOL-015 — Consumption-Margin Decomposer
Tier 3 specialist tool · Stateless · Per-customer GP attribution for consumption-pricing businesses · Decomposes realized margin into pricing / utilization / backend-cost / tier-mix components · Consumption-business analogue of TOOL-004
Tier 3 · Tools (Wave 4)
Specced · v38
Stateless · Numerical core + LLM characterization
RevOps + FP&A
Why this tool exists. Consumption-pricing businesses (per-token / per-message / per-minute / per-transaction / per-GB) have a structural property that subscription SaaS does not: realized gross margin varies materially per customer, per SKU, per backend deployment, and the path to better margin runs through workload mix, utilization density, and tier migration — not just price. AGT-503 (Expansion Trigger) and AGT-901 (Pipeline Brain) need a deterministic decomposition of where each customer's GP actually comes from to make tier-migration and pricing-strategy decisions defensibly.
Purpose
TOOL-015 takes a customer's consumption-revenue + backend-cost data over a window and decomposes the realized gross margin into structured components: list-price-vs-realized-price (discount/commit-tier impact), utilization density (multi-tenant batching, idle capacity), backend-cost mix (which compute / region / cloud is doing the work), and product-tier mix (what proportion of revenue comes from low-margin vs high-margin tiers of the catalog). Output is per-customer-quarter GP with attribution back to each component, plus a forward-looking estimate of GP uplift if specific tier-migration or workload-shaping plays land. AGT-503 calls TOOL-015 when scoring expansion-by-tier-migration plays; AGT-901 calls it for cohort-level "where is margin compressing" diagnostics.
Naming is deliberately generic: "consumption-margin" not "token-economics." The same tool serves any consumption-pricing business — per-token AI inference, per-message comms, per-minute calling, per-transaction payments, per-GB storage. Configuration captures the SKU axis; the contract doesn't.
Scope — what TOOL-015 does, what it doesn't
| In scope | Out of scope |
| Decompose realized GP per customer-quarter into structured components | Pricing decisions (AGT-406 CPQ + pricing committee own; TOOL-015 informs) |
| Project GP uplift from a specific tier-migration scenario (Serverless → Dedicated → BYOC-shaped, generalized to "low-margin → high-margin tier") | Executing tier migrations (AGT-503 + AGT-602 own) |
| Identify utilization-density opportunities (multi-tenant batching, LoRA-style adapter sharing, off-peak shifting) | Implementing the underlying multi-tenant infrastructure (engineering-owned) |
| Surface customers with negative or near-zero realized GP (loss-leader accounts) | Deciding whether to sunset a loss-leader (CRO + Finance call) |
| Cohort-level GP-attribution rollups (called by AGT-901 for "where is margin compressing" queries) | Computing organization-level margin (AGT-702 owns) |
Refusal first. If backend cost data is missing for any material portion of a customer's consumption (e.g., compute-cost feeds not piped through, region-attribution gaps), TOOL-015 returns a refusal with the gap surfaced rather than estimating. Backend-cost attribution is the load-bearing input; estimation here propagates straight into bad expansion decisions.
Input contract
{
"account_id": "uuid",
"window": { // analysis window
"start": "2026-01-01",
"end": "2026-03-31"
},
"consumption_events": [ // from UsageMeteringLog (AGT-804)
{
"sku": "string", // e.g. "inference_serverless_llama_70b" — opaque to tool
"tier": "string", // e.g. "shared" | "dedicated" | "byoc" — semantic axis
"units": float, // tokens / messages / minutes / transactions
"list_price_per_unit_usd": float,
"realized_price_per_unit_usd": float, // post-discount, post-commit-tier
"backend_region": "string", // e.g. "us-east-1" — config axis
"backend_provider": "string", // e.g. "aws" | "gcp" | "azure" | "on-prem"
"backend_cost_per_unit_usd": float, // critical input — refusal trigger if missing
"metered_at": "ISO 8601"
}
],
"tier_migration_scenarios": [ // optional — caller can request projection
{
"scenario_name": "string",
"shifts": [
{
"from_tier": "string",
"to_tier": "string",
"fraction": float // 0..1 of current from-tier volume
}
]
}
],
"include_workload_shaping_recommendations": bool, // optional — surface utilization plays
"comparison_cohort_window": { // optional — for QoQ / YoY framing
"start": "ISO 8601",
"end": "ISO 8601"
}
}
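A minimal example request against this contract (account_id, SKU names, and unit prices are hypothetical; units here happen to be tokens):

```
{
  "account_id": "8f14e45f-ceea-4e67-a1b2-3c4d5e6f7a8b", // hypothetical
  "window": { "start": "2026-01-01", "end": "2026-03-31" },
  "consumption_events": [
    {
      "sku": "inference_serverless_llama_70b",
      "tier": "shared",
      "units": 1200000000,
      "list_price_per_unit_usd": 0.0000009,
      "realized_price_per_unit_usd": 0.0000008,
      "backend_region": "us-east-1",
      "backend_provider": "aws",
      "backend_cost_per_unit_usd": 0.0000003,
      "metered_at": "2026-02-15T00:00:00Z"
    }
  ],
  "tier_migration_scenarios": [
    {
      "scenario_name": "shift_30pct_shared_to_dedicated",
      "shifts": [{ "from_tier": "shared", "to_tier": "dedicated", "fraction": 0.3 }]
    }
  ],
  "include_workload_shaping_recommendations": true
}
```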
Output contract
{
"account_id": "uuid",
"window": { "start": "...", "end": "..." },
"realized_gp": {
"revenue_usd": float, // sum(realized_price × units)
"backend_cost_usd": float, // sum(backend_cost × units)
"gp_usd": float, // revenue − backend_cost
"gp_pct": float // gp_usd / revenue_usd
},
"decomposition": {
"by_pricing_axis": { // discount/commit-tier impact
"list_price_revenue_usd": float,
"realized_price_revenue_usd": float,
"price_realization_pct": float, // realized / list
"commit_tier_discount_usd": float // revenue foregone to volume-commit discounts (list − realized)
},
"by_utilization_axis": { // multi-tenant density
"effective_unit_cost_usd": float, // backend_cost / units
"benchmark_unit_cost_usd": float, // best-in-class for this SKU+region
"utilization_gap_usd": float, // (effective − benchmark) × units
"utilization_gap_classification": "string" // "tight" | "moderate" | "loose"
},
"by_backend_axis": { // which compute is doing the work
"spend_by_region": {"region": float, ...},
"spend_by_provider": {"provider": float, ...},
"highest_cost_region": "string",
"lowest_cost_region": "string",
"region_arbitrage_gp_usd": float // GP gain if all volume shifted to lowest-cost region
},
"by_tier_axis": { // product-tier mix (the margin-expansion lever)
"spend_by_tier": {"tier": float, ...},
"gp_by_tier": {"tier": float, ...},
"weighted_tier_gp_pct": float,
"current_tier_mix_label": "string" // "low-margin-heavy" | "balanced" | "high-margin-heavy"
}
},
"tier_migration_projections": [ // populated when scenarios provided
{
"scenario_name": "string",
"projected_revenue_usd": float,
"projected_backend_cost_usd": float,
"projected_gp_usd": float,
"projected_gp_pct": float,
"gp_uplift_usd_vs_current": float,
"gp_uplift_pct_vs_current": float,
"switching_cost_class": "string", // "low" | "medium" | "high" | "very_high"
"credible_alternative": "string" // brain-discipline: explicitly state the case for NOT migrating
}
],
"workload_shaping_recommendations": [ // populated when requested
{
"play_class": "string", // "off-peak shift" | "multi-tenant pack" | "adapter sharing" | "right-size tier"
"projected_gp_uplift_usd": float,
"implementation_effort_class": "string",
"preconditions": [...]
}
],
"comparison_cohort": { // populated when comparison_cohort_window provided
"prior_gp_usd": float,
"prior_gp_pct": float,
"delta_gp_usd": float,
"delta_gp_pct_points": float,
"primary_driver_of_delta": "string"
},
"confidence_flags": {
"backend_cost_coverage_pct": float, // % of consumption_events with backend_cost data
"benchmark_data_freshness_days": int,
"overall_confidence": "high" | "medium" | "low" | "refusal"
},
"refusal": { // populated when overall_confidence = refusal
"reason": "string",
"missing_inputs": [...],
"recommended_remediation": "string"
}
}
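The realized-GP, pricing-axis, and tier-axis fields above are pure arithmetic over the input events. A minimal sketch of that deterministic core, assuming the contract's field names (function and aggregation structure are illustrative, not the shipped implementation):

```python
# Sketch of the deterministic decomposition core (realized GP + pricing and
# tier axes). Field names follow the I/O contracts; everything here is
# additive arithmetic over consumption_events, with no estimation.
from collections import defaultdict

def decompose(events: list[dict]) -> dict:
    revenue = sum(e["units"] * e["realized_price_per_unit_usd"] for e in events)
    cost = sum(e["units"] * e["backend_cost_per_unit_usd"] for e in events)
    list_rev = sum(e["units"] * e["list_price_per_unit_usd"] for e in events)

    spend_by_tier: dict = defaultdict(float)
    gp_by_tier: dict = defaultdict(float)
    for e in events:
        rev = e["units"] * e["realized_price_per_unit_usd"]
        spend_by_tier[e["tier"]] += rev
        gp_by_tier[e["tier"]] += rev - e["units"] * e["backend_cost_per_unit_usd"]

    return {
        "realized_gp": {
            "revenue_usd": revenue,
            "backend_cost_usd": cost,
            "gp_usd": revenue - cost,
            "gp_pct": (revenue - cost) / revenue if revenue else 0.0,
        },
        "by_pricing_axis": {
            "list_price_revenue_usd": list_rev,
            "realized_price_revenue_usd": revenue,
            "price_realization_pct": revenue / list_rev if list_rev else 0.0,
            "commit_tier_discount_usd": list_rev - revenue,
        },
        "by_tier_axis": {
            "spend_by_tier": dict(spend_by_tier),
            "gp_by_tier": dict(gp_by_tier),
            "weighted_tier_gp_pct": (revenue - cost) / revenue if revenue else 0.0,
        },
    }
```

Because every component is a sum over the same event set, the sum-of-parts-equals-whole eval criterion holds by construction up to rounding.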
Computation model — deterministic core + LLM characterization
Same shape as TOOL-004 (consumption forecaster): the numerical decomposition is in code; the LLM portion produces the structured characterization labels and the workload-shaping play recommendations.
| Component | Implementation | Why deterministic vs LLM |
| Realized GP arithmetic | Code | Sum / divide; no judgment. |
| Pricing-axis decomposition (discount, commit tier) | Code | Reads list_price vs realized_price; trivial arithmetic. |
| Utilization gap classification (tight / moderate / loose) | Code with rule thresholds; LLM optional reframe | Threshold-based classification is the discipline; LLM can rephrase for narrative but must not change the threshold. |
| Region arbitrage projection | Code | Min-cost-region projection is straight optimization. |
| Tier-migration scenario projection | Code | Scenario is fully specified by caller; tool applies given shifts and re-computes GP. No estimation. |
| "current_tier_mix_label" classification | Code with rule thresholds | Threshold-based. |
| workload_shaping_recommendations — play class + preconditions | LLM (Haiku) | Pattern-matching on cohort + workload shape to a play class is judgment work; the GP uplift number it carries is computed deterministically. |
| credible_alternative on each tier-migration scenario | LLM (Haiku) | Brain-discipline mirror: the tool is required to articulate why NOT migrating could be correct, even when the math says yes. Forces honest framing for downstream AGT-503 / AGT-901. |
Hard rule: tool never invents tier-migration scenarios. If the caller provides scenarios, tool projects them. If the caller does not, tool returns decomposition only. The "what if we migrated 30% of Serverless to Dedicated" question is the caller's framing, not the tool's. This bounds the tool's reasoning surface and prevents drift.
Callers
| Caller | Use case | Frequency |
| AGT-503 Expansion Trigger | Score tier-migration plays. AGT-503 reads UsageMeteringLog, identifies tier-migration candidates, calls TOOL-015 with the candidate's recent window + a default migration scenario. GP uplift estimate informs play prioritization. | Per candidate evaluation — daily batch on overage signals. |
| AGT-901 Pipeline Brain | Cohort GP-attribution diagnosis. "Where is segment-X margin compressing?" — AGT-901 fans out to TOOL-015 across the cohort, aggregates the decomposition. | Operator-invoked, ad-hoc. |
| AGT-902 Account Brain | Per-account margin synthesis. "What's the GP picture on Acme HR, and what's the highest-leverage margin lever?" — AGT-902 calls TOOL-015, integrates output into the per-account composite narrative. | Operator-invoked, per account. |
| AGT-903 Strategy Brain | Cohort-aggregated GP attribution for strategic retrospectives ("did the consumption-pricing pivot work?", "is whale-customer GP eroding?"). Multi-quarter window, called via fan-out across endorsed cohorts. | Quarterly + ad-hoc strategic batches. |
| RevOps direct | Workspace exploration: filter to a customer, get the decomposition card. | Ad-hoc. |
Cost + latency profile
| Tactic | Implementation |
| Default model | Claude Haiku for the LLM portion (workload-shaping recommendations + credible_alternative). Sonnet only when caller passes deep_analysis: true for cohort-aggregated calls from AGT-903. |
| Per-call cost (Haiku) | ~1.5K input + ~500 output tokens ≈ $0.005 per call. Same scale as TOOL-004 / TOOL-008. |
| Per-call cost (Sonnet, deep mode) | ~$0.05–0.10 per call. Reserved for AGT-903 cohort fan-outs. |
| Latency target | < 800ms p95 in Haiku mode. Cohort fan-outs from AGT-903 batched in parallel up to 20-wide. |
| Caching | Stateless; no caching at tool level. Caller (AGT-503 daily batch) caches per-account-quarter results. |
Eval criteria
| Criterion | Measurement | Pass threshold |
| Decomposition arithmetic correctness | Synthetic fixtures with known answers; check sum-of-parts = whole within rounding tolerance | 100% — hard requirement |
| Refusal-on-missing-cost-data | Fixtures with backend_cost_per_unit_usd missing for > 5% of events: tool must refuse, not estimate | 100% — hard requirement |
| Tier-migration scenario projection consistency | Same scenario re-run on same input produces same projection (modulo LLM-portion variance on credible_alternative) | Numerical components 100% identical; LLM components within Levenshtein distance threshold |
| credible_alternative quality | For each propose-tier-migration scenario: brain-prompt eval verifies a credible "case for NOT migrating" is articulated, not a wave-of-the-hand | ≥ 90% pass on subjective rubric (manual review batch) |
| Workload-shaping play classification correctness | 10 fixtures with known optimal play class: tool's primary recommendation matches | ≥ 80% |
| Cohort-aggregation correctness | Sum across N customer calls equals single aggregate-mode call within rounding | 100% |
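The cohort-aggregation criterion holds because the decomposition is additive over events. A sketch of the consistency check itself, as a fixture-style harness (helper names are illustrative):

```python
# Sketch of the cohort-aggregation eval: summing per-customer GP results
# must match one aggregate-mode call over the pooled events, within
# rounding tolerance. Field names follow the output contract.

def realized_gp(events: list[dict]) -> dict:
    revenue = sum(e["units"] * e["realized_price_per_unit_usd"] for e in events)
    cost = sum(e["units"] * e["backend_cost_per_unit_usd"] for e in events)
    return {"revenue_usd": revenue, "backend_cost_usd": cost, "gp_usd": revenue - cost}

def check_cohort_consistency(per_customer_events: list[list[dict]], tol: float = 1e-6) -> bool:
    pooled = [e for evs in per_customer_events for e in evs]
    whole = realized_gp(pooled)                       # single aggregate-mode call
    parts = [realized_gp(evs) for evs in per_customer_events]  # N customer calls
    for key in ("revenue_usd", "backend_cost_usd", "gp_usd"):
        if abs(whole[key] - sum(p[key] for p in parts)) > tol:
            return False
    return True
```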
Failure modes
| Symptom | Likely cause | Action |
| Tool projects GP uplift on tier-migration that doesn't materialize | Switching cost class underestimated; customer migrated but churned faster than projected; backend-cost benchmark stale | AGT-503 retrospective on accepted plays. Adjust switching_cost_class rubric. Refresh benchmark_unit_cost_usd inputs. |
| Tool refuses on every call for a customer cohort | backend_cost feeds not piped through for that cohort (e.g., new region, new cloud provider) | RevOps escalates to FinOps / data-engineering; tool's refusal is the correct surfacing. |
| credible_alternative is generic / wave-of-the-hand | LLM portion not seeing enough context; system prompt insufficient | Tighten system prompt with few-shot examples from approved AGT-503 plays. Eval rubric catches this. |
| workload_shaping_recommendations don't survive engineering review | Tool overweighting play classes that aren't actually deployable on current infrastructure | The preconditions field is the gate: engineering reviews preconditions and marks unsupported play classes as gated; tool reads the gate config on the next call. |