Standalone thread architecture separates engine loop from frontend processing. CPU overhead drops to 3% of batch decoding time. Users get 50-100+ tokens per second while maintaining request concurrency. Threading work nobody celebrates but everyone needs when agents have to feel responsive.