TOOL-015 — Consumption-Margin Decomposer

Tier 3 specialist tool · Stateless · Per-customer GP attribution for consumption-pricing businesses · Decomposes realized margin into pricing / utilization / backend-cost / tier-mix components · Consumption-business analogue of TOOL-004
Tier 3 · Tools (Wave 4) · Specced · v38 · Stateless · Numerical core + LLM characterization · RevOps + FP&A
Why this tool exists. Consumption-pricing businesses (per-token / per-message / per-minute / per-transaction / per-GB) have a structural property that subscription SaaS does not: realized gross margin varies materially per customer, per SKU, per backend deployment, and the path to better margin runs through workload mix, utilization density, and tier migration — not just price. AGT-503 (Expansion Trigger) and AGT-901 (Pipeline Brain) need a deterministic decomposition of where each customer's GP actually comes from to make tier-migration and pricing-strategy decisions defensibly.
Purpose

TOOL-015 takes a customer's consumption-revenue + backend-cost data over a window and decomposes the realized gross margin into structured components: list-price-vs-realized-price (discount/commit-tier impact), utilization density (multi-tenant batching, idle capacity), backend-cost mix (which compute / region / cloud is doing the work), and product-tier mix (what proportion of revenue comes from low-margin vs high-margin tiers of the catalog). Output is per-customer-quarter GP with attribution back to each component, plus a forward-looking estimate of GP uplift if specific tier-migration or workload-shaping plays land. AGT-503 calls TOOL-015 when scoring expansion-by-tier-migration plays; AGT-901 calls it for cohort-level "where is margin compressing" diagnostics.

Naming is deliberately generic: "consumption-margin" not "token-economics." The same tool serves any consumption-pricing business — per-token AI inference, per-message comms, per-minute calling, per-transaction payments, per-GB storage. Configuration captures the SKU axis; the contract doesn't.
Scope — what TOOL-015 does, what it doesn't
| In scope | Out of scope |
|---|---|
| Decompose realized GP per customer-quarter into structured components | Pricing decisions (AGT-406 CPQ + pricing committee own; TOOL-015 informs) |
| Project GP uplift from a specific tier-migration scenario (Serverless → Dedicated → BYOC-shaped, generalized to "low-margin → high-margin tier") | Executing tier migrations (AGT-503 + AGT-602 own) |
| Identify utilization-density opportunities (multi-tenant batching, LoRA-style adapter sharing, off-peak shifting) | Implementing the underlying multi-tenant infrastructure (engineering-owned) |
| Surface customers with negative or near-zero realized GP (loss-leader accounts) | Deciding whether to sunset a loss-leader (CRO + Finance call) |
| Cohort-level GP-attribution rollups (called by AGT-901 for "where is margin compressing" queries) | Computing organization-level margin (AGT-702 owns) |
Refusal first. If backend cost data is missing for any material portion of a customer's consumption (e.g., compute-cost feeds not piped through, region-attribution gaps), TOOL-015 returns a refusal with the gap surfaced rather than estimating. Backend-cost attribution is the load-bearing input; estimation here propagates straight into bad expansion decisions.
Input contract
```jsonc
{
  "account_id": "uuid",
  "window": {                        // analysis window
    "start": "2026-01-01",
    "end": "2026-03-31"
  },
  "consumption_events": [            // from UsageMeteringLog (AGT-804)
    {
      "sku": "string",               // e.g. "inference_serverless_llama_70b" — opaque to tool
      "tier": "string",              // e.g. "shared" | "dedicated" | "byoc" — semantic axis
      "units": float,                // tokens / messages / minutes / transactions
      "list_price_per_unit_usd": float,
      "realized_price_per_unit_usd": float, // post-discount, post-commit-tier
      "backend_region": "string",    // e.g. "us-east-1" — config axis
      "backend_provider": "string",  // e.g. "aws" | "gcp" | "azure" | "on-prem"
      "backend_cost_per_unit_usd": float,   // critical input — refusal trigger if missing
      "metered_at": "ISO 8601"
    }
  ],
  "tier_migration_scenarios": [      // optional — caller can request projection
    {
      "scenario_name": "string",
      "shifts": [
        {
          "from_tier": "string",
          "to_tier": "string",
          "fraction": float          // 0..1 of current from-tier volume
        }
      ]
    }
  ],
  "include_workload_shaping_recommendations": bool, // optional — surface utilization plays
  "comparison_cohort_window": {      // optional — for QoQ / YoY framing
    "start": "ISO 8601",
    "end": "ISO 8601"
  }
}
```
Output contract
```jsonc
{
  "account_id": "uuid",
  "window": { "start": "...", "end": "..." },
  "realized_gp": {
    "revenue_usd": float,            // sum(realized_price × units)
    "backend_cost_usd": float,       // sum(backend_cost × units)
    "gp_usd": float,                 // revenue − backend_cost
    "gp_pct": float                  // gp_usd / revenue_usd
  },
  "decomposition": {
    "by_pricing_axis": {             // discount/commit-tier impact
      "list_price_revenue_usd": float,
      "realized_price_revenue_usd": float,
      "price_realization_pct": float,    // realized / list
      "commit_tier_discount_usd": float  // dollars left on the table for volume tier
    },
    "by_utilization_axis": {         // multi-tenant density
      "effective_unit_cost_usd": float,  // backend_cost / units
      "benchmark_unit_cost_usd": float,  // best-in-class for this SKU+region
      "utilization_gap_usd": float,      // (effective − benchmark) × units
      "utilization_gap_classification": "string" // "tight" | "moderate" | "loose"
    },
    "by_backend_axis": {             // which compute is doing the work
      "spend_by_region": {"region": float, ...},
      "spend_by_provider": {"provider": float, ...},
      "highest_cost_region": "string",
      "lowest_cost_region": "string",
      "region_arbitrage_gp_usd": float   // GP gain if all volume shifted to lowest-cost region
    },
    "by_tier_axis": {                // product-tier mix (the margin-expansion lever)
      "spend_by_tier": {"tier": float, ...},
      "gp_by_tier": {"tier": float, ...},
      "weighted_tier_gp_pct": float,
      "current_tier_mix_label": "string" // "low-margin-heavy" | "balanced" | "high-margin-heavy"
    }
  },
  "tier_migration_projections": [    // populated when scenarios provided
    {
      "scenario_name": "string",
      "projected_revenue_usd": float,
      "projected_backend_cost_usd": float,
      "projected_gp_usd": float,
      "projected_gp_pct": float,
      "gp_uplift_usd_vs_current": float,
      "gp_uplift_pct_vs_current": float,
      "switching_cost_class": "string",  // "low" | "medium" | "high" | "very_high"
      "credible_alternative": "string"   // brain-discipline: explicitly state the case for NOT migrating
    }
  ],
  "workload_shaping_recommendations": [ // populated when requested
    {
      "play_class": "string",        // "off-peak shift" | "multi-tenant pack" | "adapter sharing" | "right-size tier"
      "projected_gp_uplift_usd": float,
      "implementation_effort_class": "string",
      "preconditions": [...]
    }
  ],
  "comparison_cohort": {             // populated when comparison_cohort_window provided
    "prior_gp_usd": float,
    "prior_gp_pct": float,
    "delta_gp_usd": float,
    "delta_gp_pct_points": float,
    "primary_driver_of_delta": "string"
  },
  "confidence_flags": {
    "backend_cost_coverage_pct": float, // % of consumption_events with backend_cost data
    "benchmark_data_freshness_days": int,
    "overall_confidence": "high" | "medium" | "low" | "refusal"
  },
  "refusal": {                       // populated when overall_confidence = refusal
    "reason": "string",
    "missing_inputs": [...],
    "recommended_remediation": "string"
  }
}
```
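The deterministic core of this contract is plain aggregation over `consumption_events`. A minimal sketch, using the field names from the contracts (the `decompose` function name itself is illustrative, and only the pricing and tier axes are shown):

```python
# Sketch of the deterministic decomposition core. Field names match the
# input/output contracts; the function name is illustrative.
from collections import defaultdict

def decompose(events: list[dict]) -> dict:
    revenue = sum(e["units"] * e["realized_price_per_unit_usd"] for e in events)
    cost = sum(e["units"] * e["backend_cost_per_unit_usd"] for e in events)
    list_revenue = sum(e["units"] * e["list_price_per_unit_usd"] for e in events)

    spend_by_tier: dict[str, float] = defaultdict(float)
    gp_by_tier: dict[str, float] = defaultdict(float)
    for e in events:
        tier_rev = e["units"] * e["realized_price_per_unit_usd"]
        spend_by_tier[e["tier"]] += tier_rev
        gp_by_tier[e["tier"]] += tier_rev - e["units"] * e["backend_cost_per_unit_usd"]

    return {
        "realized_gp": {
            "revenue_usd": revenue,
            "backend_cost_usd": cost,
            "gp_usd": revenue - cost,
            "gp_pct": (revenue - cost) / revenue if revenue else 0.0,
        },
        "by_pricing_axis": {
            "list_price_revenue_usd": list_revenue,
            "realized_price_revenue_usd": revenue,
            "price_realization_pct": revenue / list_revenue if list_revenue else 0.0,
            "commit_tier_discount_usd": list_revenue - revenue,
        },
        "by_tier_axis": {
            "spend_by_tier": dict(spend_by_tier),
            "gp_by_tier": dict(gp_by_tier),
        },
    }
```

Every number here is a sum or ratio over the caller-supplied events, which is what makes the sum-of-parts eval criterion below a hard 100% requirement.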
Computation model — deterministic core + LLM characterization

Same shape as TOOL-004 (consumption forecaster): the numerical decomposition is in code; the LLM portion produces the structured characterization labels and the workload-shaping play recommendations.

| Component | Implementation | Why deterministic vs LLM |
|---|---|---|
| Realized GP arithmetic | Code | Sum / divide; no judgment. |
| Pricing-axis decomposition (discount, commit tier) | Code | Reads list_price vs realized_price; trivial arithmetic. |
| Utilization gap classification (tight / moderate / loose) | Code with rule thresholds; LLM optional reframe | Threshold-based classification is the discipline; LLM can rephrase for narrative but must not change the threshold. |
| Region arbitrage projection | Code | Min-cost-region projection is straight optimization. |
| Tier-migration scenario projection | Code | Scenario is fully specified by caller; tool applies given shifts and re-computes GP. No estimation. |
| "current_tier_mix_label" classification | Code with rule thresholds | Threshold-based. |
| workload_shaping_recommendations — play class + preconditions | LLM (Haiku) | Pattern-matching on cohort + workload shape to a play class is judgment work; the GP uplift number it carries is computed deterministically. |
| credible_alternative on each tier-migration scenario | LLM (Haiku) | Brain-discipline mirror: the tool is required to articulate why NOT migrating could be correct, even when the math says yes. Forces honest framing for downstream AGT-503 / AGT-901. |
Hard rule: tool never invents tier-migration scenarios. If the caller provides scenarios, tool projects them. If the caller does not, tool returns decomposition only. The "what if we migrated 30% of Serverless to Dedicated" question is the caller's framing, not the tool's. This bounds the tool's reasoning surface and prevents drift.
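Applying a caller-specified scenario is mechanical: shift a fraction of from-tier volume to the to-tier and re-run the GP arithmetic. A sketch under one simplifying assumption that is mine, not the spec's: migrated units adopt the to-tier's current average realized price and backend cost observed in the window (the real tool might instead price from the catalog). Function and variable names are illustrative.

```python
# Sketch of caller-specified scenario projection. Assumption (labeled, not
# from the spec): migrated units take on the to-tier's observed average unit
# economics, so the to-tier must already have volume in the window.
from collections import defaultdict

def project_scenario(events: list[dict], shifts: list[dict]) -> dict:
    # Per-tier unit economics from the current window.
    units: dict[str, float] = defaultdict(float)
    rev: dict[str, float] = defaultdict(float)
    cost: dict[str, float] = defaultdict(float)
    for e in events:
        t = e["tier"]
        units[t] += e["units"]
        rev[t] += e["units"] * e["realized_price_per_unit_usd"]
        cost[t] += e["units"] * e["backend_cost_per_unit_usd"]

    # Apply the caller's shifts verbatim — the tool never invents scenarios.
    shifted = dict(units)
    for s in shifts:
        moved = units[s["from_tier"]] * s["fraction"]
        shifted[s["from_tier"]] -= moved
        shifted[s["to_tier"]] = shifted.get(s["to_tier"], 0.0) + moved

    projected_rev = sum(
        u * (rev[t] / units[t]) for t, u in shifted.items() if units.get(t)
    )
    projected_cost = sum(
        u * (cost[t] / units[t]) for t, u in shifted.items() if units.get(t)
    )
    return {
        "projected_revenue_usd": projected_rev,
        "projected_backend_cost_usd": projected_cost,
        "projected_gp_usd": projected_rev - projected_cost,
    }
```

Because the shifts are fully specified by the caller, re-running the same scenario on the same input is deterministic, which is exactly what the projection-consistency eval below checks.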
Callers
| Caller | Use case | Frequency |
|---|---|---|
| AGT-503 Expansion Trigger | Score tier-migration plays. AGT-503 reads UsageMeteringLog, identifies tier-migration candidates, calls TOOL-015 with the candidate's recent window + a default migration scenario. GP uplift estimate informs play prioritization. | Per candidate evaluation — daily batch on overage signals. |
| AGT-901 Pipeline Brain | Cohort GP-attribution diagnosis. "Where is segment-X margin compressing?" — AGT-901 fans out to TOOL-015 across the cohort, aggregates the decomposition. | Operator-invoked, ad-hoc. |
| AGT-902 Account Brain | Per-account margin synthesis. "What's the GP picture on Acme HR, and what's the highest-leverage margin lever?" — AGT-902 calls TOOL-015, integrates output into the per-account composite narrative. | Operator-invoked, per account. |
| AGT-903 Strategy Brain | Cohort-aggregated GP attribution for strategic retrospectives ("did the consumption-pricing pivot work?", "is whale-customer GP eroding?"). Multi-quarter window, called via fan-out across endorsed cohorts. | Quarterly + ad-hoc strategic batches. |
| RevOps direct | Workspace exploration: filter to a customer, get the decomposition card. | Ad-hoc. |
Cost + latency profile
| Tactic | Implementation |
|---|---|
| Default model | Claude Haiku for the LLM portion (workload-shaping recommendations + credible_alternative). Sonnet only when caller passes deep_analysis: true for cohort-aggregated calls from AGT-903. |
| Per-call cost (Haiku) | ~1.5K input + 500 output tokens ≈ $0.005 per call. Same scale as TOOL-004 / TOOL-008. |
| Per-call cost (Sonnet, deep mode) | ~$0.05–0.10 per call. Reserved for AGT-903 cohort fan-outs. |
| Latency target | < 800 ms p95 in Haiku mode. Cohort fan-outs from AGT-903 batched in parallel up to 20-wide. |
| Caching | Stateless; no caching at tool level. The caller (AGT-503 daily batch) caches per-account-quarter results. |
Eval criteria
| Criterion | Measurement | Pass threshold |
|---|---|---|
| Decomposition arithmetic correctness | Synthetic fixtures with known answers; check sum-of-parts = whole within rounding tolerance | 100% — hard requirement |
| Refusal on missing cost data | Fixtures with backend_cost_per_unit_usd missing for > 5% of events: tool must refuse, not estimate | 100% — hard requirement |
| Tier-migration scenario projection consistency | Same scenario re-run on same input produces same projection (modulo LLM-portion variance on credible_alternative) | Numerical components 100% identical; LLM components within Levenshtein-distance threshold |
| credible_alternative quality | For each proposed tier-migration scenario, a brain-prompt eval verifies that a credible "case for NOT migrating" is articulated, not hand-waved | ≥ 90% pass on subjective rubric (manual review batch) |
| Workload-shaping play classification correctness | 10 fixtures with known optimal play class: tool's primary recommendation matches | ≥ 80% |
| Cohort-aggregation correctness | Sum across N customer calls equals single aggregate-mode call within rounding | 100% |
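The cohort-aggregation criterion is an additivity invariant: per-customer GP summed across the cohort must equal GP computed over the pooled events. A sketch of how a fixture might assert this (helper names are illustrative):

```python
# Sketch of the cohort-aggregation eval check. GP is additive over events,
# so the per-customer sum and the pooled computation must agree within
# rounding tolerance. Names are illustrative.
import math

def gp_usd(events: list[dict]) -> float:
    return sum(
        e["units"] * (e["realized_price_per_unit_usd"] - e["backend_cost_per_unit_usd"])
        for e in events
    )

def cohort_sum_matches(per_customer_events: list[list[dict]], tol: float = 1e-6) -> bool:
    per_customer_total = sum(gp_usd(ev) for ev in per_customer_events)
    pooled = gp_usd([e for ev in per_customer_events for e in ev])
    return math.isclose(per_customer_total, pooled, rel_tol=0.0, abs_tol=tol)
```

Any nonlinear step in the deterministic core (e.g., a ratio taken before summing) would break this invariant, which is why the eval holds it to 100%.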
Failure modes
| Symptom | Likely cause | Action |
|---|---|---|
| Tool projects GP uplift on a tier migration that doesn't materialize | Switching-cost class underestimated; customer migrated but churned faster than projected; backend-cost benchmark stale | AGT-503 retrospective on accepted plays. Adjust the switching_cost_class rubric. Refresh benchmark_unit_cost_usd inputs. |
| Tool refuses on every call for a customer cohort | backend_cost feeds not piped through for that cohort (e.g., new region, new cloud provider) | RevOps escalates to FinOps / data engineering; the tool's refusal is the correct surfacing. |
| credible_alternative is generic / hand-waved | LLM portion not seeing enough context; system prompt insufficient | Tighten the system prompt with few-shot examples from approved AGT-503 plays. The eval rubric catches this. |
| workload_shaping_recommendations don't survive engineering review | Tool overweights play classes that aren't actually deployable on current infrastructure | The preconditions field is the gate: engineering reviews preconditions and marks unsupported play classes as gated; the tool reads the gate config on its next call. |