llm-d Brings Kubernetes-Native Orchestration
November 27, 2025

Red Hat, Google Cloud, IBM Research, NVIDIA, and CoreWeave launched llm-d in May 2025, a distributed serving stack built on vLLM. Its Kubernetes-native orchestration, combining disaggregated prefill/decode with system-wide KV-cache routing, enables deployment patterns that were previously impossible.
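The KV-cache routing mentioned above can be illustrated with a minimal sketch. This is not llm-d's actual API: the `Worker`, `block_hashes`, and `route` names, and the 16-token block size, are hypothetical. The idea is that a router scores each decode worker by how many block-aligned prefixes of the incoming prompt it already holds in KV cache, and sends the request to the best match so cached prefill work is reused.

```python
from dataclasses import dataclass, field

BLOCK = 16  # tokens per KV-cache block (illustrative, not llm-d's value)

@dataclass
class Worker:
    """A hypothetical decode worker tracking which prefix blocks it has cached."""
    name: str
    cached_prefixes: set = field(default_factory=set)

def block_hashes(tokens):
    """Hash each block-aligned prefix; equal hashes imply a shared KV-cache prefix."""
    return [hash(tuple(tokens[:i])) for i in range(BLOCK, len(tokens) + 1, BLOCK)]

def route(tokens, workers):
    """Pick the worker holding the longest contiguous cached prefix of this prompt."""
    def score(w):
        n = 0
        for h in block_hashes(tokens):
            if h not in w.cached_prefixes:
                break
            n += 1
        return n
    best = max(workers, key=score)
    # After serving, this worker will have all the prompt's blocks cached.
    best.cached_prefixes.update(block_hashes(tokens))
    return best
```

Under this scheme, a repeated or shared prompt prefix keeps landing on the same worker, which is the cache-affinity behavior that disaggregated serving stacks exploit; a production router would also weigh queue depth and load, not cache hits alone.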