Berkeley's PagedAttention applies virtual-memory paging concepts to managing the attention KV cache. Up to 24x the serving throughput of HuggingFace Transformers. Not a new model architecture. Just computer science fundamentals applied to serving at scale. Unglamorous and essential.
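The core idea is the same one operating systems use for virtual memory: carve storage into fixed-size blocks and keep a per-sequence block table mapping logical positions to wherever the blocks physically live, so the KV cache no longer needs one big contiguous allocation per request. Below is a minimal sketch of that bookkeeping, not vLLM's actual implementation; the `PagedKVCache` class, `BLOCK_SIZE` value, and method names are hypothetical, and real kernels would store the actual key/value tensors inside the blocks.

```python
from collections import defaultdict

BLOCK_SIZE = 16  # tokens per KV-cache block, analogous to an OS page size (illustrative value)


class PagedKVCache:
    """Toy block manager: maps each sequence's logical token positions
    onto fixed-size physical blocks drawn from a shared free pool."""

    def __init__(self, num_physical_blocks: int):
        self.free_blocks = list(range(num_physical_blocks))
        self.block_tables = defaultdict(list)  # seq_id -> list of physical block ids
        self.seq_lengths = defaultdict(int)    # seq_id -> tokens written so far

    def append_token(self, seq_id: int) -> tuple[int, int]:
        """Reserve a slot for one new token; allocate a fresh block only
        when the current block is full. Returns (physical_block, offset)."""
        pos = self.seq_lengths[seq_id]
        if pos % BLOCK_SIZE == 0:  # first token, or current block is full
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted; a real scheduler would preempt or swap")
            self.block_tables[seq_id].append(self.free_blocks.pop())
        self.seq_lengths[seq_id] = pos + 1
        block = self.block_tables[seq_id][pos // BLOCK_SIZE]
        return block, pos % BLOCK_SIZE

    def free_sequence(self, seq_id: int) -> None:
        """Return a finished sequence's blocks to the shared pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lengths.pop(seq_id, None)


# A 20-token sequence occupies two blocks, not 20 contiguous slots,
# so memory is committed on demand and fragmentation stays bounded.
cache = PagedKVCache(num_physical_blocks=4)
for _ in range(20):
    cache.append_token(seq_id=0)
print(cache.block_tables[0], "free:", len(cache.free_blocks))
```

The payoff is that sequences of unknown final length no longer force the server to over-reserve contiguous memory up front; blocks are handed out as generation proceeds and reclaimed the moment a request finishes, which is where the throughput gains come from.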