Major upgrade hitting 48.4% on Humanity's Last Exam and 84.6% on ARC-AGI-2. Built for messy research problems without clear guardrails. Already catching subtle logical flaws in peer-reviewed mathematics papers that human reviewers missed. Google AI Ultra subscribers get access today.