Cost Spike Runbook

Steps to diagnose and reduce LLM cost spikes quickly and safely.

  • Freeze routing and prompt changes
  • Identify the top callers and token drivers
  • Apply budgets, rate limits, or caching to contain spend
  • Verify that cost per task returns to baseline
  • Capture evidence and update controls

Related

  • Cost Spike Control (LLMOps)
  • Cost per Task
  • AI Rollback Runbook

FAQ

What is the immediate containment step?
Freeze routing/prompt changes and apply budgets or throttles to stop runaway spend.
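
As one way to picture the budget step, the sketch below rejects requests once a caller's spend would exceed a hard cap. The `BudgetGuard` class, its field names, and the dollar figures are hypothetical and only for illustration; a real deployment would likely enforce this in a gateway or in the provider's own spend controls.

```python
from dataclasses import dataclass


@dataclass
class BudgetGuard:
    """Tracks spend for one caller against a hard cap (hypothetical helper)."""

    limit_usd: float
    spent_usd: float = 0.0

    def allow(self, estimated_cost_usd: float) -> bool:
        """Return True and record the spend if the request fits the budget."""
        if self.spent_usd + estimated_cost_usd > self.limit_usd:
            return False  # over budget: reject or queue the request
        self.spent_usd += estimated_cost_usd
        return True


# Assumed numbers: a 1.00 USD cap and requests estimated at 0.40 USD each.
guard = BudgetGuard(limit_usd=1.00)
decisions = [guard.allow(0.40) for _ in range(4)]
print(decisions)
```

Rejected requests do not count against the budget, so the guard stays closed until the cap is raised or the spend counter is reset.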

How do we find the cause fast?
Identify top callers, token-heavy prompts, and retry patterns; compare to baseline.
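
A minimal sketch of that comparison, assuming request logs can be exported as rows of caller, prompt tokens, completion tokens, and retry count. The caller names, token counts, and baseline figures are invented sample data, not measurements.

```python
from collections import defaultdict

# Hypothetical log rows: (caller, prompt_tokens, completion_tokens, retries)
current_rows = [
    ("search-api", 1200, 300, 0),
    ("batch-summarizer", 9000, 2500, 3),
    ("batch-summarizer", 8800, 2400, 2),
    ("chat-ui", 700, 200, 0),
]

# Hypothetical baseline: total tokens per caller over a comparable window.
baseline_tokens = {"search-api": 1400, "batch-summarizer": 6000, "chat-ui": 900}


def rank_by_growth(rows, baseline):
    """Rank callers by token growth relative to their baseline."""
    totals = defaultdict(lambda: {"tokens": 0, "retries": 0})
    for caller, prompt_tokens, completion_tokens, retries in rows:
        totals[caller]["tokens"] += prompt_tokens + completion_tokens
        totals[caller]["retries"] += retries

    ranked = []
    for caller, stats in totals.items():
        base = baseline.get(caller, 0)
        # A caller with no baseline is treated as unbounded growth.
        ratio = stats["tokens"] / base if base else float("inf")
        ranked.append((caller, stats["tokens"], ratio, stats["retries"]))
    return sorted(ranked, key=lambda row: row[2], reverse=True)


ranked = rank_by_growth(current_rows, baseline_tokens)
for caller, tokens, ratio, retries in ranked:
    print(f"{caller}: {tokens} tokens, {ratio:.2f}x baseline, {retries} retries")
```

Sorting by growth ratio rather than raw volume surfaces the caller whose behavior changed, which is usually more useful during a spike than the caller that is always the largest.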

When do we roll back?
Roll back when cost signals breach thresholds and verification confirms a regression in routing or prompt behavior.
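
The rule can be read as two conditions that must both hold. The sketch below encodes it that way; the function name, the 1.5x threshold, and the sample costs are assumptions for illustration, and the actual thresholds should come from your own alerting policy.

```python
def should_roll_back(cost_per_task, baseline_cost_per_task,
                     threshold_ratio, regression_confirmed):
    """Return True only when the threshold is breached AND a regression is confirmed."""
    breached = cost_per_task > baseline_cost_per_task * threshold_ratio
    return breached and regression_confirmed


# Assumed values: baseline of 0.02 USD per task, alert threshold at 1.5x baseline.
spike_confirmed = should_roll_back(0.045, 0.02, 1.5, regression_confirmed=True)
spike_unverified = should_roll_back(0.045, 0.02, 1.5, regression_confirmed=False)
within_threshold = should_roll_back(0.025, 0.02, 1.5, regression_confirmed=True)
print(spike_confirmed, spike_unverified, within_threshold)
```

Requiring confirmation keeps a traffic-driven spike, where cost rises because legitimate volume rose, from triggering an unnecessary rollback.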

What evidence should we capture?
Routing versions, prompt versions, top endpoints, cost per task graphs, and actions taken.

What’s the prevention step?
Add budget gates, canary rollouts, and monitoring alerts on unit cost.
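
One possible shape for a canary gate on unit cost is sketched below: compute cost per task for the control and canary groups and block promotion if the canary is more than a set fraction above control. The function names, the 10% tolerance, and the sample totals are hypothetical.

```python
def cost_per_task(total_cost_usd, completed_tasks):
    """Unit cost: total spend divided by the number of completed tasks."""
    if completed_tasks == 0:
        raise ValueError("no completed tasks; unit cost is undefined")
    return total_cost_usd / completed_tasks


def canary_passes(control, canary, max_increase=0.10):
    """Pass if canary unit cost is at most (1 + max_increase) times control.

    `control` and `canary` are (total_cost_usd, completed_tasks) tuples.
    """
    control_unit = cost_per_task(*control)
    canary_unit = cost_per_task(*canary)
    return canary_unit <= control_unit * (1 + max_increase)


# Assumed sample totals for one rollout window.
control_group = (50.0, 2500)     # 0.02 USD per task
cheap_canary = (3.0, 150)        # 0.02 USD per task
expensive_canary = (4.5, 150)    # 0.03 USD per task

print(canary_passes(control_group, cheap_canary))
print(canary_passes(control_group, expensive_canary))
```

Gating on cost per task rather than total spend keeps the check meaningful even though the canary group handles far less traffic than control.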