Cost Spike Runbook
Steps to diagnose and reduce LLM cost spikes quickly and safely.
Published:
Admin User
- Freeze routing and prompt changes
- Identify top callers and token drivers
- Apply budgets, rate limits, or caching
- Verify cost per task returns to baseline
- Capture evidence and update controls
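As an illustration of the verification step, the sketch below computes cost per task for a window and checks it against a baseline. The `WindowUsage` schema, the 10% tolerance, and the sample figures are assumptions made for this example, not values from this runbook.

```python
from dataclasses import dataclass


@dataclass
class WindowUsage:
    """Aggregated usage for one time window (hypothetical schema)."""
    total_cost_usd: float
    completed_tasks: int


def cost_per_task(usage: WindowUsage) -> float:
    """Return spend divided by completed tasks for the window."""
    if usage.completed_tasks <= 0:
        raise ValueError("window has no completed tasks")
    return usage.total_cost_usd / usage.completed_tasks


def back_to_baseline(current: WindowUsage, baseline: WindowUsage,
                     tolerance: float = 0.10) -> bool:
    """True when current cost per task is within `tolerance` of baseline."""
    return cost_per_task(current) <= cost_per_task(baseline) * (1 + tolerance)


# Illustrative numbers only.
baseline = WindowUsage(total_cost_usd=120.0, completed_tasks=4000)
during_spike = WindowUsage(total_cost_usd=450.0, completed_tasks=5000)
after_fix = WindowUsage(total_cost_usd=128.0, completed_tasks=4000)

print(back_to_baseline(during_spike, baseline))
print(back_to_baseline(after_fix, baseline))
```

Dividing by completed tasks rather than by requests keeps retries and failed calls in the numerator, so wasted spend shows up as a higher unit cost.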
Related
- Cost Spike Control (LLMOps)
- Cost per Task
- AI Rollback Runbook
FAQ
What is the immediate containment step?
Freeze routing/prompt changes and apply budgets or throttles to stop runaway spend.
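One way to picture a budget throttle is a gate that refuses calls once the window's estimated spend would exceed a cap. This is a minimal in-memory sketch; `BudgetGate`, the cap, and the per-call cost estimates are hypothetical, and a real deployment would likely need shared state across workers.

```python
class BudgetGate:
    """Rejects calls once estimated spend in the current window would exceed a cap."""

    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def allow(self, estimated_cost_usd: float) -> bool:
        """Record and allow the call only if it fits the remaining budget."""
        if self.spent_usd + estimated_cost_usd > self.limit_usd:
            return False
        self.spent_usd += estimated_cost_usd
        return True

    def reset(self) -> None:
        """Start a new budget window."""
        self.spent_usd = 0.0


# Illustrative: a 1.00 USD cap with calls estimated at 0.40 USD each.
gate = BudgetGate(limit_usd=1.00)
decisions = [gate.allow(0.40) for _ in range(4)]
print(decisions)
```

Rejected calls are not counted against the budget, so the gate stays closed until `reset()` starts a new window.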
How do we find the cause fast?
Identify top callers, token-heavy prompts, and retry patterns; compare to baseline.
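The triage described above could be sketched as a small aggregation over usage records. The record layout `(caller, total_tokens, is_retry)`, the caller names, and the baseline figures are assumptions for illustration; real data would come from whatever usage logs are available.

```python
from collections import defaultdict

# Hypothetical usage records: (caller, total_tokens, is_retry)
records = [
    ("search-api", 1200, False),
    ("search-api", 1200, True),
    ("search-api", 1200, True),
    ("summarizer", 9000, False),
    ("chat-ui", 800, False),
]
# Hypothetical baseline tokens per caller over a comparable window.
baseline_tokens = {"search-api": 1500, "summarizer": 8500, "chat-ui": 900}


def summarize(records, baseline_tokens):
    """Rank callers by token growth over baseline and report retry rates."""
    tokens = defaultdict(int)
    calls = defaultdict(int)
    retries = defaultdict(int)
    for caller, n_tokens, is_retry in records:
        tokens[caller] += n_tokens
        calls[caller] += 1
        retries[caller] += int(is_retry)

    rows = []
    for caller in tokens:
        base = baseline_tokens.get(caller)
        rows.append({
            "caller": caller,
            "tokens": tokens[caller],
            "retry_rate": retries[caller] / calls[caller],
            "vs_baseline": tokens[caller] / base if base else None,
        })
    # Largest growth over baseline first; callers without a baseline sort last.
    return sorted(rows, key=lambda r: r["vs_baseline"] or 0.0, reverse=True)


report = summarize(records, baseline_tokens)
for row in report:
    print(row)
```

Sorting by growth over baseline rather than by raw tokens keeps a caller that is always large from hiding the one that actually changed.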
When do we roll back?
Roll back when cost signals breach their thresholds and verification confirms a regression in routing or prompt behavior.
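The two-part rollback condition can be written down as a small check. The signal names and threshold values here are hypothetical; the point of the sketch is that a breach alone does not trigger a rollback without a confirmed regression.

```python
from typing import Dict, List


def breached_signals(signals: Dict[str, float],
                     thresholds: Dict[str, float]) -> List[str]:
    """Return the names of signals whose value exceeds its threshold."""
    return sorted(
        name for name, value in signals.items()
        if name in thresholds and value > thresholds[name]
    )


def should_roll_back(signals: Dict[str, float],
                     thresholds: Dict[str, float],
                     regression_confirmed: bool) -> bool:
    """Roll back only when a threshold is breached AND a regression is verified."""
    return bool(breached_signals(signals, thresholds)) and regression_confirmed


# Illustrative values only.
thresholds = {"cost_per_task_usd": 0.05, "tokens_per_task": 6000}
signals = {"cost_per_task_usd": 0.09, "tokens_per_task": 5200}

print(breached_signals(signals, thresholds))
print(should_roll_back(signals, thresholds, regression_confirmed=False))
print(should_roll_back(signals, thresholds, regression_confirmed=True))
```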
What evidence should we capture?
Routing versions, prompt versions, top endpoints, cost per task graphs, and actions taken.
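A structured record makes the evidence list easier to capture consistently. The sketch below assumes a `SpikeEvidence` dataclass whose fields mirror the items named above; the class name, field names, and sample values are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class SpikeEvidence:
    """Evidence bundle for one cost spike (hypothetical schema)."""
    routing_versions: List[str]
    prompt_versions: List[str]
    top_endpoints: List[str]
    cost_per_task_graphs: List[str]  # file names or links to saved graphs
    actions_taken: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize with stable key order so records diff cleanly."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Illustrative values only.
evidence = SpikeEvidence(
    routing_versions=["routing-v12", "routing-v13"],
    prompt_versions=["prompt-v7"],
    top_endpoints=["/summarize", "/search"],
    cost_per_task_graphs=["cost-per-task-before.png", "cost-per-task-after.png"],
)
evidence.actions_taken.append("froze routing changes")
evidence.actions_taken.append("applied per-caller budget")

print(evidence.to_json())
```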
What’s the prevention step?
Add budget gates, canary rollouts, and monitoring alerts on unit cost.