Problems from the 2026 American Invitational Mathematics Examination. The freshest AIME data available, with near-zero training contamination risk.
Cutting-edge math evaluation with near-zero contamination risk. Only top-tier frontier models have scores so far.
Only 3 models have been evaluated so far, so comparisons between them have limited statistical power.
Higher is better. Percentage of problems solved correctly (0–100%).
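As a rough illustration of the metric, the score could be computed along these lines. This is a minimal sketch, not the benchmark's actual grading code: the function name `aime_accuracy`, the exact-match rule, and the sample answers are all assumptions made for the example. It relies only on the fact that AIME answers are integers from 0 to 999.

```python
from typing import Optional, Sequence


def aime_accuracy(predicted: Sequence[Optional[int]], reference: Sequence[int]) -> float:
    """Return the percentage (0-100) of problems answered correctly.

    Hypothetical helper: assumes exact integer match, and treats a missing
    answer (None) as incorrect.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("reference must not be empty")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * correct / len(reference)


# Made-up answers for illustration only, not real AIME problems or results.
reference = [70, 588, 16, 117, 279]
predicted = [70, 588, 15, 117, None]
print(aime_accuracy(predicted, reference))  # 60.0 (3 of 5 correct)
```

There is no partial credit in this sketch: a problem either matches the reference answer or counts as unsolved.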