Purpose-Built Models Measure Agent Quality