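GPU instances bill the same whether they are busy or idle. The Kubernetes ecosystem has adapted: Kueue for job queueing and quota management, Dynamic Resource Allocation (DRA) for requesting GPUs and other devices, and topology-aware scheduling for placing the pods of one job close together. Many models now exceed the memory of a single GPU, so the work either splits across multiple accelerators or doesn't run.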
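To make the single-GPU limit concrete, here is a rough back-of-the-envelope sketch. It counts only the memory needed to hold model weights and ignores activations, KV cache, and optimizer state, so the result is a lower bound. The parameter counts, the 2 bytes per parameter (16-bit weights), and the 80 GiB GPU are illustrative assumptions, not figures from this article, and the function names are hypothetical.

```python
import math

GIB = 2**30  # bytes in one gibibyte


def weights_gib(num_params: int, bytes_per_param: int) -> float:
    """Memory needed to hold the raw weights, in GiB."""
    return num_params * bytes_per_param / GIB


def min_gpus_for_weights(num_params: int, bytes_per_param: int, gpu_mem_gib: float) -> int:
    """Lower bound on the GPU count: weights only, no activations or cache."""
    return math.ceil(weights_gib(num_params, bytes_per_param) / gpu_mem_gib)


# Assumed example: a 70-billion-parameter model with 16-bit weights
# on GPUs with 80 GiB of memory each.
needed = min_gpus_for_weights(70_000_000_000, 2, 80)
print(f"weights: {weights_gib(70_000_000_000, 2):.1f} GiB, GPUs needed: at least {needed}")
```

Under these assumptions the weights alone do not fit on one device, which is why the scheduler has to place a single job across several GPUs at once rather than treating each GPU as an independent slot.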