TOOL-015 — Consumption-Margin Decomposer

Tier 3 specialist tool · Stateless · Per-customer GP attribution for consumption-pricing businesses · Decomposes realized margin into pricing / utilization / backend-cost / tier-mix components · Consumption-business analogue of TOOL-004
Tier 3 · Tools (Wave 4) · Specced · v38 · Stateless · Numerical core + LLM characterization · RevOps + FP&A
Why this tool exists. Consumption-pricing businesses (per-token / per-message / per-minute / per-transaction / per-GB) have a structural property that subscription SaaS does not: realized gross margin varies materially per customer, per SKU, per backend deployment, and the path to better margin runs through workload mix, utilization density, and tier migration — not just price. AGT-503 (Expansion Trigger) and AGT-901 (Pipeline Brain) need a deterministic decomposition of where each customer's GP actually comes from to make tier-migration and pricing-strategy decisions defensibly.
Purpose

TOOL-015 takes a customer's consumption-revenue + backend-cost data over a window and decomposes the realized gross margin into structured components: list-price-vs-realized-price (discount/commit-tier impact), utilization density (multi-tenant batching, idle capacity), backend-cost mix (which compute / region / cloud is doing the work), and product-tier mix (what proportion of revenue comes from low-margin vs high-margin tiers of the catalog). Output is per-customer-quarter GP with attribution back to each component, plus a forward-looking estimate of GP uplift if specific tier-migration or workload-shaping plays land. AGT-503 calls TOOL-015 when scoring expansion-by-tier-migration plays; AGT-901 calls it for cohort-level "where is margin compressing" diagnostics.

Naming is deliberately generic: "consumption-margin" not "token-economics." The same tool serves any consumption-pricing business — per-token AI inference, per-message comms, per-minute calling, per-transaction payments, per-GB storage. Configuration captures the SKU axis; the contract doesn't.
Scope — what TOOL-015 does, what it doesn't
| In scope | Out of scope |
|---|---|
| Decompose realized GP per customer-quarter into structured components | Pricing decisions (AGT-406 CPQ + pricing committee own; TOOL-015 informs) |
| Project GP uplift from a specific tier-migration scenario (Serverless → Dedicated → BYOC-shaped, generalized to "low-margin → high-margin tier") | Executing tier migrations (AGT-503 + AGT-602 own) |
| Identify utilization-density opportunities (multi-tenant batching, LoRA-style adapter sharing, off-peak shifting) | Implementing the underlying multi-tenant infrastructure (engineering-owned) |
| Surface customers with negative or near-zero realized GP (loss-leader accounts) | Deciding whether to sunset a loss-leader (CRO + Finance call) |
| Cohort-level GP-attribution rollups (called by AGT-901 for "where is margin compressing" queries) | Computing organization-level margin (AGT-702 owns) |
Refusal first. If backend cost data is missing for any material portion of a customer's consumption (e.g., compute-cost feeds not piped through, region-attribution gaps), TOOL-015 returns a refusal with the gap surfaced rather than estimating. Backend-cost attribution is the load-bearing input; estimation here propagates straight into bad expansion decisions.
Input contract
```jsonc
{
  "account_id": "uuid",
  "window": {                        // analysis window
    "start": "2026-01-01",
    "end": "2026-03-31"
  },
  "consumption_events": [            // from UsageMeteringLog (AGT-804)
    {
      "sku": "string",               // e.g. "inference_serverless_llama_70b" — opaque to tool
      "tier": "string",              // e.g. "shared" | "dedicated" | "byoc" — semantic axis
      "units": float,                // tokens / messages / minutes / transactions
      "list_price_per_unit_usd": float,
      "realized_price_per_unit_usd": float, // post-discount, post-commit-tier
      "backend_region": "string",    // e.g. "us-east-1" — config axis
      "backend_provider": "string",  // e.g. "aws" | "gcp" | "azure" | "on-prem"
      "backend_cost_per_unit_usd": float,   // critical input — refusal trigger if missing
      "metered_at": "ISO 8601"
    }
  ],
  "tier_migration_scenarios": [      // optional — caller can request projection
    {
      "scenario_name": "string",
      "shifts": [
        {
          "from_tier": "string",
          "to_tier": "string",
          "fraction": float          // 0..1 of current from-tier volume
        }
      ]
    }
  ],
  "include_workload_shaping_recommendations": bool, // optional — surface utilization plays
  "comparison_cohort_window": {      // optional — for QoQ / YoY framing
    "start": "ISO 8601",
    "end": "ISO 8601"
  }
}
```
Output contract
```jsonc
{
  "account_id": "uuid",
  "window": { "start": "...", "end": "..." },
  "realized_gp": {
    "revenue_usd": float,            // sum(realized_price × units)
    "backend_cost_usd": float,       // sum(backend_cost × units)
    "gp_usd": float,                 // revenue − backend_cost
    "gp_pct": float                  // gp_usd / revenue_usd
  },
  "decomposition": {
    "by_pricing_axis": {             // discount/commit-tier impact
      "list_price_revenue_usd": float,
      "realized_price_revenue_usd": float,
      "price_realization_pct": float,    // realized / list
      "commit_tier_discount_usd": float  // dollars left on the table for volume tier
    },
    "by_utilization_axis": {         // multi-tenant density
      "effective_unit_cost_usd": float,  // backend_cost / units
      "benchmark_unit_cost_usd": float,  // best-in-class for this SKU+region
      "utilization_gap_usd": float,      // (effective − benchmark) × units
      "utilization_gap_classification": "string" // "tight" | "moderate" | "loose"
    },
    "by_backend_axis": {             // which compute is doing the work
      "spend_by_region": {"region": float, ...},
      "spend_by_provider": {"provider": float, ...},
      "highest_cost_region": "string",
      "lowest_cost_region": "string",
      "region_arbitrage_gp_usd": float   // GP gain if all volume shifted to lowest-cost region
    },
    "by_tier_axis": {                // product-tier mix (the margin-expansion lever)
      "spend_by_tier": {"tier": float, ...},
      "gp_by_tier": {"tier": float, ...},
      "weighted_tier_gp_pct": float,
      "current_tier_mix_label": "string" // "low-margin-heavy" | "balanced" | "high-margin-heavy"
    }
  },
  "tier_migration_projections": [    // populated when scenarios provided
    {
      "scenario_name": "string",
      "projected_revenue_usd": float,
      "projected_backend_cost_usd": float,
      "projected_gp_usd": float,
      "projected_gp_pct": float,
      "gp_uplift_usd_vs_current": float,
      "gp_uplift_pct_vs_current": float,
      "switching_cost_class": "string",  // "low" | "medium" | "high" | "very_high"
      "credible_alternative": "string"   // brain-discipline: explicitly state the case for NOT migrating
    }
  ],
  "workload_shaping_recommendations": [ // populated when requested
    {
      "play_class": "string",        // "off-peak shift" | "multi-tenant pack" | "adapter sharing" | "right-size tier"
      "projected_gp_uplift_usd": float,
      "implementation_effort_class": "string",
      "preconditions": [...]
    }
  ],
  "comparison_cohort": {             // populated when comparison_cohort_window provided
    "prior_gp_usd": float,
    "prior_gp_pct": float,
    "delta_gp_usd": float,
    "delta_gp_pct_points": float,
    "primary_driver_of_delta": "string"
  },
  "confidence_flags": {
    "backend_cost_coverage_pct": float, // % of consumption_events with backend_cost data
    "benchmark_data_freshness_days": int,
    "overall_confidence": "high" | "medium" | "low" | "refusal"
  },
  "refusal": {                       // populated when overall_confidence = refusal
    "reason": "string",
    "missing_inputs": [...],
    "recommended_remediation": "string"
  }
}
```
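The deterministic core of this contract is plain aggregation over `consumption_events`. A minimal sketch, using the field names from the contracts (the `decompose` function name itself is illustrative, and only the pricing and tier axes are shown):

```python
# Sketch of the deterministic decomposition core. Field names match the
# input/output contracts; the function name is illustrative.
from collections import defaultdict

def decompose(events: list[dict]) -> dict:
    revenue = sum(e["units"] * e["realized_price_per_unit_usd"] for e in events)
    cost = sum(e["units"] * e["backend_cost_per_unit_usd"] for e in events)
    list_revenue = sum(e["units"] * e["list_price_per_unit_usd"] for e in events)

    spend_by_tier: dict[str, float] = defaultdict(float)
    gp_by_tier: dict[str, float] = defaultdict(float)
    for e in events:
        tier_rev = e["units"] * e["realized_price_per_unit_usd"]
        spend_by_tier[e["tier"]] += tier_rev
        gp_by_tier[e["tier"]] += tier_rev - e["units"] * e["backend_cost_per_unit_usd"]

    return {
        "realized_gp": {
            "revenue_usd": revenue,
            "backend_cost_usd": cost,
            "gp_usd": revenue - cost,
            "gp_pct": (revenue - cost) / revenue if revenue else 0.0,
        },
        "by_pricing_axis": {
            "list_price_revenue_usd": list_revenue,
            "realized_price_revenue_usd": revenue,
            "price_realization_pct": revenue / list_revenue if list_revenue else 0.0,
            "commit_tier_discount_usd": list_revenue - revenue,
        },
        "by_tier_axis": {
            "spend_by_tier": dict(spend_by_tier),
            "gp_by_tier": dict(gp_by_tier),
        },
    }
```

Every number here is a sum or ratio over the caller-supplied events, which is what makes the sum-of-parts eval criterion below a hard 100% requirement.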
Computation model — deterministic core + LLM characterization

Same shape as TOOL-004 (consumption forecaster): the numerical decomposition is in code; the LLM portion produces the structured characterization labels and the workload-shaping play recommendations.

| Component | Implementation | Why deterministic vs LLM |
|---|---|---|
| Realized GP arithmetic | Code | Sum / divide; no judgment. |
| Pricing-axis decomposition (discount, commit tier) | Code | Reads list_price vs realized_price; trivial arithmetic. |
| Utilization gap classification (tight / moderate / loose) | Code with rule thresholds; LLM optional reframe | Threshold-based classification is the discipline; LLM can rephrase for narrative but must not change the threshold. |
| Region arbitrage projection | Code | Min-cost-region projection is straight optimization. |
| Tier-migration scenario projection | Code | Scenario is fully specified by caller; tool applies given shifts and re-computes GP. No estimation. |
| "current_tier_mix_label" classification | Code with rule thresholds | Threshold-based. |
| workload_shaping_recommendations — play class + preconditions | LLM (Haiku) | Pattern-matching on cohort + workload shape to a play class is judgment work; the GP uplift number it carries is computed deterministically. |
| credible_alternative on each tier-migration scenario | LLM (Haiku) | Brain-discipline mirror: the tool is required to articulate why NOT migrating could be correct, even when the math says yes. Forces honest framing for downstream AGT-503 / AGT-901. |
Hard rule: tool never invents tier-migration scenarios. If the caller provides scenarios, tool projects them. If the caller does not, tool returns decomposition only. The "what if we migrated 30% of Serverless to Dedicated" question is the caller's framing, not the tool's. This bounds the tool's reasoning surface and prevents drift.
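Applying a caller-specified scenario is mechanical: shift a fraction of from-tier volume to the to-tier and re-run the GP arithmetic. A sketch under one simplifying assumption that is mine, not the spec's: migrated units adopt the to-tier's current average realized price and backend cost observed in the window (the real tool might instead price from the catalog). Function and variable names are illustrative.

```python
# Sketch of caller-specified scenario projection. Assumption (labeled, not
# from the spec): migrated units take on the to-tier's observed average unit
# economics, so the to-tier must already have volume in the window.
from collections import defaultdict

def project_scenario(events: list[dict], shifts: list[dict]) -> dict:
    # Per-tier unit economics from the current window.
    units: dict[str, float] = defaultdict(float)
    rev: dict[str, float] = defaultdict(float)
    cost: dict[str, float] = defaultdict(float)
    for e in events:
        t = e["tier"]
        units[t] += e["units"]
        rev[t] += e["units"] * e["realized_price_per_unit_usd"]
        cost[t] += e["units"] * e["backend_cost_per_unit_usd"]

    # Apply the caller's shifts verbatim — the tool never invents scenarios.
    shifted = dict(units)
    for s in shifts:
        moved = units[s["from_tier"]] * s["fraction"]
        shifted[s["from_tier"]] -= moved
        shifted[s["to_tier"]] = shifted.get(s["to_tier"], 0.0) + moved

    projected_rev = sum(
        u * (rev[t] / units[t]) for t, u in shifted.items() if units.get(t)
    )
    projected_cost = sum(
        u * (cost[t] / units[t]) for t, u in shifted.items() if units.get(t)
    )
    return {
        "projected_revenue_usd": projected_rev,
        "projected_backend_cost_usd": projected_cost,
        "projected_gp_usd": projected_rev - projected_cost,
    }
```

Because the shifts are fully specified by the caller, re-running the same scenario on the same input is deterministic, which is exactly what the projection-consistency eval below checks.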
Callers
| Caller | Use case | Frequency |
|---|---|---|
| AGT-503 Expansion Trigger | Score tier-migration plays. AGT-503 reads UsageMeteringLog, identifies tier-migration candidates, calls TOOL-015 with the candidate's recent window + a default migration scenario. GP uplift estimate informs play prioritization. | Per candidate evaluation — daily batch on overage signals. |
| AGT-901 Pipeline Brain | Cohort GP-attribution diagnosis. "Where is segment-X margin compressing?" — AGT-901 fans out to TOOL-015 across the cohort, aggregates the decomposition. | Operator-invoked, ad-hoc. |
| AGT-902 Account Brain | Per-account margin synthesis. "What's the GP picture on Acme HR, and what's the highest-leverage margin lever?" — AGT-902 calls TOOL-015, integrates output into the per-account composite narrative. | Operator-invoked, per account. |
| AGT-903 Strategy Brain | Cohort-aggregated GP attribution for strategic retrospectives ("did the consumption-pricing pivot work?", "is whale-customer GP eroding?"). Multi-quarter window, called via fan-out across endorsed cohorts. | Quarterly + ad-hoc strategic batches. |
| RevOps direct | Workspace exploration: filter to a customer, get the decomposition card. | Ad-hoc. |
Cost + latency profile
| Tactic | Implementation |
|---|---|
| Default model | Claude Haiku for the LLM portion (workload-shaping recommendations + credible_alternative). Sonnet only when caller passes deep_analysis: true for cohort-aggregated calls from AGT-903. |
| Per-call cost (Haiku) | ~1.5K input + 500 output tokens ≈ $0.005 per call. Same scale as TOOL-004 / TOOL-008. |
| Per-call cost (Sonnet, deep mode) | ~$0.05–0.10 per call. Reserved for AGT-903 cohort fan-outs. |
| Latency target | < 800 ms p95 in Haiku mode. Cohort fan-outs from AGT-903 batched in parallel up to 20-wide. |
| Caching | Stateless; no caching at tool level. The caller (AGT-503 daily batch) caches per-account-quarter results. |
Eval criteria
| Criterion | Measurement | Pass threshold |
|---|---|---|
| Decomposition arithmetic correctness | Synthetic fixtures with known answers; check sum-of-parts = whole within rounding tolerance | 100% — hard requirement |
| Refusal on missing cost data | Fixtures with backend_cost_per_unit_usd missing for > 5% of events: tool must refuse, not estimate | 100% — hard requirement |
| Tier-migration scenario projection consistency | Same scenario re-run on same input produces same projection (modulo LLM-portion variance on credible_alternative) | Numerical components 100% identical; LLM components within Levenshtein-distance threshold |
| credible_alternative quality | For each proposed tier-migration scenario, a brain-prompt eval verifies that a credible "case for NOT migrating" is articulated, not hand-waved | ≥ 90% pass on subjective rubric (manual review batch) |
| Workload-shaping play classification correctness | 10 fixtures with known optimal play class: tool's primary recommendation matches | ≥ 80% |
| Cohort-aggregation correctness | Sum across N customer calls equals single aggregate-mode call within rounding | 100% |
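The cohort-aggregation criterion is an additivity invariant: per-customer GP summed across the cohort must equal GP computed over the pooled events. A sketch of how a fixture might assert this (helper names are illustrative):

```python
# Sketch of the cohort-aggregation eval check. GP is additive over events,
# so the per-customer sum and the pooled computation must agree within
# rounding tolerance. Names are illustrative.
import math

def gp_usd(events: list[dict]) -> float:
    return sum(
        e["units"] * (e["realized_price_per_unit_usd"] - e["backend_cost_per_unit_usd"])
        for e in events
    )

def cohort_sum_matches(per_customer_events: list[list[dict]], tol: float = 1e-6) -> bool:
    per_customer_total = sum(gp_usd(ev) for ev in per_customer_events)
    pooled = gp_usd([e for ev in per_customer_events for e in ev])
    return math.isclose(per_customer_total, pooled, rel_tol=0.0, abs_tol=tol)
```

Any nonlinear step in the deterministic core (e.g., a ratio taken before summing) would break this invariant, which is why the eval holds it to 100%.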
Failure modes
| Symptom | Likely cause | Action |
|---|---|---|
| Tool projects GP uplift on a tier migration that doesn't materialize | Switching-cost class underestimated; customer migrated but churned faster than projected; backend-cost benchmark stale | AGT-503 retrospective on accepted plays. Adjust the switching_cost_class rubric. Refresh benchmark_unit_cost_usd inputs. |
| Tool refuses on every call for a customer cohort | backend_cost feeds not piped through for that cohort (e.g., new region, new cloud provider) | RevOps escalates to FinOps / data engineering; the tool's refusal is the correct surfacing. |
| credible_alternative is generic / hand-waved | LLM portion not seeing enough context; system prompt insufficient | Tighten the system prompt with few-shot examples from approved AGT-503 plays. The eval rubric catches this. |
| workload_shaping_recommendations don't survive engineering review | Tool overweights play classes that aren't actually deployable on current infrastructure | The preconditions field is the gate: engineering reviews preconditions and marks unsupported play classes as gated; the tool reads the gate config on its next call. |