OpenAI, Anthropic, and Google all offer prompt caching: the provider keeps the attention key/value (KV) states computed for a shared prompt prefix and reuses them on later requests instead of recomputing them. Time-to-first-token can drop by 50% or more on cache-friendly workloads. Anthropic bills cache reads on Claude at a fraction of the normal input-token rate, so large static prefixes (system prompts, tool definitions, few-shot examples) get steep discounts on repeat requests. Run the numbers across millions of requests. The savings are real.
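To make "run the numbers" concrete, here is a minimal back-of-envelope sketch. Everything in it is an assumption: the base per-token price is a placeholder, `monthly_cost` is a hypothetical helper rather than any provider's API, and the 1.25x cache-write / 0.10x cache-read multipliers follow Anthropic's published prompt-caching pricing at the time of writing. Swap in your own rate card.

```python
# Back-of-envelope savings from prompt caching across many requests.
# All prices are placeholder assumptions -- substitute your provider's
# actual per-token rates and cache pricing before trusting the output.

BASE_INPUT_PRICE = 3.00 / 1_000_000   # $ per input token (assumed)
CACHE_WRITE_MULT = 1.25               # cache writes cost extra (assumed)
CACHE_READ_MULT = 0.10                # cache hits billed at 10% (assumed)

def monthly_cost(requests: int, static_tokens: int, dynamic_tokens: int,
                 cache_hit_rate: float) -> float:
    """Input-token cost for one month of traffic with prompt caching.

    static_tokens:  shared prefix (system prompt, tool defs, few-shot examples)
    dynamic_tokens: per-request suffix that can never be cached
    cache_hit_rate: fraction of requests that reuse a warm cache entry
    """
    hits = requests * cache_hit_rate
    misses = requests - hits
    cost = 0.0
    # Cold requests pay the cache-write premium on the static prefix.
    cost += misses * static_tokens * BASE_INPUT_PRICE * CACHE_WRITE_MULT
    # Warm requests pay the discounted cache-read rate on the prefix.
    cost += hits * static_tokens * BASE_INPUT_PRICE * CACHE_READ_MULT
    # The per-request suffix is always billed at the base rate.
    cost += requests * dynamic_tokens * BASE_INPUT_PRICE
    return cost

if __name__ == "__main__":
    # 5M requests/month, 8,000-token static prefix, 500-token dynamic suffix.
    baseline = 5_000_000 * (8_000 + 500) * BASE_INPUT_PRICE
    cached = monthly_cost(5_000_000, 8_000, 500, cache_hit_rate=0.95)
    print(f"no caching:   ${baseline:,.0f}")
    print(f"with caching: ${cached:,.0f}  ({1 - cached / baseline:.0%} saved)")
```

Under these assumed numbers the input-token bill falls from about $127,500 to about $26,400 a month, roughly 79% off. The lever is the ratio of static prefix to dynamic suffix: the bigger and more stable your prefix, the more caching pays.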