The infrastructure team's quarterly review: someone pulls up the cluster utilization dashboard. 13% CPU, 31% memory. Someone asks why they're paying for capacity nobody uses. The service teams explain their safety margins. Everyone nods. Nothing changes.
When researchers analyzed over 4,000 Kubernetes clusters in 2024, they found the average cluster running at 13% CPU utilization and 31% memory utilization, leaving 87% of provisioned CPU and 69% of provisioned memory sitting idle.
This happens everywhere. Not because workloads are light. Every team made the same reasonable decision: request more capacity than you'll probably need.
What you're seeing is how safety thinking scales. Individual margins compound into infrastructure provisioned for simultaneous worst-case scenarios that probability suggests won't happen.
When Safety Margins Compound
A service typically uses 2 CPU cores under load. The team requests 4 cores: 100% safety margin against traffic spikes, deployment issues, unexpected behavior. Another team does the same. Then another.
Each decision is defensible. But resource requests in Kubernetes determine node allocation. When you request 4 cores, the scheduler reserves that capacity whether you use it or not. Ten services with 100% margins don't just mean ten comfortable services. They mean infrastructure sized for every service hitting peak load simultaneously.
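A back-of-the-envelope sketch makes the compounding concrete. The numbers mirror the example above; the 5% peak frequency and the independence between services are illustrative assumptions:

```python
# Illustrative numbers: 10 services, each typically using 2 cores
# but requesting 4 (a 100% safety margin).
services = 10
typical_cores = 2.0    # what each service actually uses under normal load
requested_cores = 4.0  # what each service reserves from the scheduler

total_requested = services * requested_cores  # capacity the scheduler sets aside
total_typical = services * typical_cores      # capacity actually in use

print(f"Reserved: {total_requested:.0f} cores, in use: {total_typical:.0f} cores")
print(f"Utilization of reserved capacity: {total_typical / total_requested:.0%}")

# The margins only pay off when services peak together. If each service
# spends 5% of its time at peak, independently of the others, the chance
# of all ten peaking in the same instant is vanishingly small:
p_peak = 0.05
print(f"P(all {services} at peak simultaneously): {p_peak ** services:.2e}")
```

Even this toy version sits at 50% utilization. Real clusters land far lower because margins often run 2-4x rather than 2x, and node-level fragmentation adds overhead of its own.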
The compounding happens silently. No single team sees the aggregate effect. Each sees their service running smoothly with comfortable headroom. The cluster-level view, where 87% of provisioned capacity sits unused, only becomes visible to whoever manages the infrastructure bill.
Web automation infrastructure follows the same logic. A team monitoring competitor prices requests capacity for 2,000 concurrent browser sessions: safety margin for rate limits, retries, authentication failures. Another team does the same for inventory checks. Another for compliance verification. Each margin is defensible. The aggregate infrastructure is provisioned for every team hitting its worst case simultaneously.
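A toy Monte Carlo, with invented demand distributions, shows how far summed worst cases overshoot what pooled demand ever reaches:

```python
# Hypothetical demand model: three teams each reserve 2,000 sessions,
# but typical load is far lower and spikes rarely coincide.
import random

random.seed(0)
TEAMS = 3
PROVISIONED_PER_TEAM = 2000

def team_demand() -> float:
    base = random.gauss(600, 150)   # typical load around 600 sessions
    spike = random.random() < 0.05  # 5% of intervals are spikes
    return max(0.0, base + (random.uniform(800, 1400) if spike else 0.0))

samples = sorted(sum(team_demand() for _ in range(TEAMS)) for _ in range(100_000))
p99 = samples[int(0.99 * len(samples))]

print(f"Reserved (sum of worst cases): {TEAMS * PROVISIONED_PER_TEAM}")
print(f"p99 of combined demand:        {p99:.0f}")
```

Even the 99th percentile of combined demand comes out well below the 6,000 sessions reserved, which is the compounding problem captured in a single number.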
Why Visibility Doesn't Fix It
Most organizations can see exactly how much CPU and memory their workloads consume. Monitoring dashboards display the gap between requested and used resources in real time. Yet 73% of teams review resource requests less than once per quarter.
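The gap isn't hidden; it's one query away. A minimal sketch against the Prometheus HTTP API, assuming kube-state-metrics and cAdvisor metrics are being scraped (the endpoint URL is hypothetical and the metric names are the common defaults; adjust for your environment):

```python
import requests

PROM = "http://prometheus.example.internal:9090"  # hypothetical endpoint

def prom_scalar(query: str) -> float:
    """Run an instant query and return the first result as a float."""
    resp = requests.get(f"{PROM}/api/v1/query", params={"query": query})
    resp.raise_for_status()
    return float(resp.json()["data"]["result"][0]["value"][1])

# Cluster-wide CPU requested vs. CPU actually used.
requested = prom_scalar('sum(kube_pod_container_resource_requests{resource="cpu"})')
used = prom_scalar('sum(rate(container_cpu_usage_seconds_total[5m]))')

print(f"CPU requested: {requested:.0f} cores")
print(f"CPU used:      {used:.0f} cores ({used / requested:.0%} of requests)")
```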
Visibility isn't the problem. Adjusting resource requests downward requires someone to take responsibility for the change, test it, monitor it, and accept the risk that they've cut too close. The organizational incentive structure makes this decision asymmetric.
The discussion in pull request comments:
"Should we reduce this to 2 cores?"
"Probably fine, but remember the incident last quarter?"
"Right. Leave it at 4."
The original estimate becomes permanent not through active decision but through risk aversion that compounds across every service.
Teams measured on uptime and performance, not infrastructure efficiency, rationally choose to leave safety margins untouched. Resource requests get defined during initial deployment: an educated guess plus safety margin. They rarely get revisited. Applications evolve, traffic patterns change, but those initial estimates become infrastructure commitments that persist for years.
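The mechanics of finding stale requests are almost trivial, which sharpens the point: the scarce resource is someone willing to own the change, not tooling. A sketch with invented numbers, flagging services whose requests dwarf anything observed in a month of traffic:

```python
# Flag services requesting more than 2x their observed 30-day p99.
# Threshold and data are illustrative; inputs would come from a metrics store.
OVERPROVISION_FACTOR = 2.0

services = [  # (name, requested cores, observed 30-day p99 in cores)
    ("checkout", 4.0, 1.9),
    ("search", 8.0, 2.2),
    ("recommendations", 6.0, 5.1),
]

for name, requested, p99 in services:
    if requested > OVERPROVISION_FACTOR * p99:
        print(f"{name}: requests {requested} cores, p99 usage {p99} cores "
              f"-> right-sizing candidate")
```

Generating that report takes an afternoon. Shipping the reduced requests means owning the next incident if the estimate was wrong.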
The Coordination Problem
A cluster running at 13% utilization shows you the cumulative result of hundreds of teams making individually rational choices. Each team optimized for their local context. The waste emerged from the interaction of many reasonable decisions, not from any single team's failure.
Overprovisioning persists despite visibility tools and cost pressure because right-sizing requires someone to accept downward risk in exchange for collective benefit. In organizations where reliability is paramount and infrastructure cost is abstracted, that trade-off doesn't make sense at the team level.
The pattern extends across any infrastructure where teams provision capacity independently: Kubernetes clusters, browser automation fleets, shared compute pools. Individual margins compound. The organizational structures that make sense for reliability create exactly the incentives that prevent optimization.
The platforms that eventually solve this won't just provide better monitoring. They'll change who owns the utilization question.
Things to follow up on...
- Vertical Pod Autoscaler adoption: Despite Kubernetes offering automated resource adjustment capabilities, adoption remains low due to stability concerns and the need for careful configuration. (A read-only sketch of inspecting VPA recommendations follows this list.)
- Stateful workload patterns: Databases and other stateful applications tend to be more heavily overprovisioned than stateless services, with teams adding larger safety margins due to the higher cost of performance issues.
- Cost optimization survey findings: StormForge's 2024 report found that 68% of organizations identified overprovisioning as their top cost challenge, with workloads typically requesting 2-4x more resources than needed.
- Production versus development patterns: Production environments show higher overprovisioning rates than development environments, as teams apply larger safety margins to workloads serving real users.
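For the VPA follow-up, a read-only sketch using the official kubernetes Python client: it lists VPA recommendations without applying them, a low-risk way to gauge how far current requests drift from what the autoscaler would set. Field paths follow the autoscaling.k8s.io/v1 VPA CRD; everything else is an assumption about the cluster setup.

```python
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside a pod
api = client.CustomObjectsApi()

# VPA objects are custom resources under the autoscaling.k8s.io group.
vpas = api.list_cluster_custom_object(
    group="autoscaling.k8s.io", version="v1", plural="verticalpodautoscalers")

for vpa in vpas["items"]:
    name = vpa["metadata"]["name"]
    recs = (vpa.get("status", {})
               .get("recommendation", {})
               .get("containerRecommendations", []))
    for rec in recs:
        print(f"{name}/{rec['containerName']}: target {rec['target']}")
```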

