AI infrastructure doesn't fail with alarms and error messages. Quality erodes. Responses drift. Confidence scores slip. Nothing technically breaks, so nothing triggers alerts. By the time you notice, the damage has compounded across teams and decisions. What monitoring catches problems that announce themselves through gradual wrongness?