Offline evaluation scores a model on curated datasets in a controlled environment. Online evaluation monitors real user interactions in production. The gap between the two reveals model drift, unexpected queries, and edge cases your test data missed. Testing only offline means optimizing for conditions that may not hold in production.
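One way to make that gap concrete is to compute the same metric on both sides and compare. The sketch below is illustrative only: the function names, the pass/fail outcome lists, and the 0.1 review threshold are all assumptions, not part of any particular evaluation framework.

```python
from statistics import mean


def accuracy(outcomes):
    """Fraction of examples judged correct (True) in a list of booleans."""
    return mean(1.0 if ok else 0.0 for ok in outcomes)


def evaluation_gap(offline_outcomes, online_outcomes, threshold=0.1):
    """Compare one metric across offline and online results.

    `threshold` is a hypothetical cutoff: a gap above it suggests the
    curated test set no longer reflects production traffic.
    """
    offline = accuracy(offline_outcomes)
    online = accuracy(online_outcomes)
    gap = offline - online
    return {
        "offline": offline,
        "online": online,
        "gap": gap,
        "needs_review": gap > threshold,
    }


# Hypothetical data: pass/fail judgments per example.
offline_outcomes = [True, True, True, True, False]
online_outcomes = [True, False, True, False, True, True, False, True]

report = evaluation_gap(offline_outcomes, online_outcomes)
print(report)
```

In practice the online outcomes would come from logged interactions labeled by user feedback or a sampled review, and a persistent gap is a cue to pull the failing production cases back into the offline set.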