AgMoDB by @mistakeknot

Benchmark Heatmap

Compare models across all benchmark scores at a glance. Colors are normalized per column (red = low, green = high).
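The per-column color normalization described above can be sketched as follows. This is a minimal illustration, not the site's actual implementation: it assumes plain min-max scaling within each column, a linear red-to-green ramp, and a gray fill for missing scores (shown as — in the table). The function names `normalize_column` and `heat_color` are hypothetical.

```python
from typing import Optional


def normalize_column(values: list[Optional[float]]) -> list[Optional[float]]:
    """Min-max scale one benchmark column to [0, 1]; missing scores stay None."""
    present = [v for v in values if v is not None]
    if not present:
        return [None] * len(values)
    lo, hi = min(present), max(present)
    if hi == lo:
        # Every score is identical: place all cells at the middle of the scale.
        return [None if v is None else 0.5 for v in values]
    return [None if v is None else (v - lo) / (hi - lo) for v in values]


def heat_color(t: Optional[float]) -> str:
    """Map a normalized score to a hex color: 0 -> red, 1 -> green."""
    if t is None:
        return "#cccccc"  # assumed neutral fill for a missing score
    red = round(255 * (1 - t))
    green = round(255 * t)
    return f"#{red:02x}{green:02x}00"


# One column with a missing entry, scaled independently of every other column.
column = [77.6, None, 53.2]
colors = [heat_color(t) for t in normalize_column(column)]
```

Because each column is scaled on its own, a green cell means "high relative to the other models in this column" rather than high in absolute terms, so colors are not comparable across benchmarks with different score ranges.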

Showing the top 30 models.
Model | AgMoBench | AgMo Trust | AgMo Pred. | Reasoning | Coding | Math | Agentic | Robust. | AA Intel. | AA Coding | AA Math | MMLU Pro | GPQA | HLE | LiveCode | SciCode | MATH-500 | AIME | AIME 25 | IFBench | LCR | TB Hard | τ²-Bench | GDP-Val AA | HMMT Nov 2… | IFEval | LiveCodeBe… | SimpleQA V… | BALROG | BIG-Bench … | BoolQ | CAD-Eval | Chatbot Ar… | CyBench | DeepResear… | FictionLiv… | GeoBench | SWE-Bench … | GDPval | Open LLM: … | SWE-Bench … | SEAL Tool … | Epoch Capa… | Hle | LongBench … | RE-Bench | GAIA | GSO | METR Time … | VPCT | WebDev Arena | LiveBench … | MMLU | SimpleQA | LAMBADA | tau-bench … | MT-Bench | tau-bench … | Aider Poly… | TruthfulQA | WebArena | Chess Puzz… | Parameter … | OTIS Mock … | NoLiMa (NI… | Training C… | OSWorld (E… | BullshitBe… | SWE-bench … | AA-Omnisci… | Open LLM: … | ARC-AGI-2 | Open LLM: … | Open LLM: … | MMMU | Video-MME | BrowseComp | Arena ELO:… | FORTRESS | Propensity… | MASK | PRBench Fi… | PRBench Le… | MCP Atlas | MultiChall… | EnigmaEval | IFBench (M… | Open LLM: … | Open LLM: … | AA-Omnisci… | OmniDocBen… | IDP Core | IDP OmniDoc | OCRBench v2 | ARC-AGI-1 | MMTU Table… | IDP Overall | IDP OlmOCR | MedQA (USM… | Video-MME … | Video-MME … | PIQA | BFCL (Berk… | TriviaQA | LiveBench … | WinoGrande | MLE-bench | WeirdML | HuggingFac… | EQ-Bench 3 | AA-Omnisci… | BRUMO 2025 | HuggingFac… | Vectara Fa… | Aider Poly… | WildBench | Vectara Ha… | Chatbot Ar… | Codeforces… | CritPt | GSM8K | HMMT Feb 2… | HumanEval | MathVision | MRCR v2 | OSWorld | SWE-bench … | Tau-Bench … | USAMO 2025 | Video-MMU | AA Intelli… | AA Long Co… | SimpleBench | MathVista | AIME 2024 | BigCodeBen… | BigCodeBen… | AIME 2025 … | AIME 2026 | Arena-Hard… | GPQA Diamo… | OpenRouter… | LiveBench … | LiveBench … | ARC-AGI-1 … | ARC-AGI-2 … | LiveBench … | LongBench v2 | LegalBench | FrontierMath | AlpacaEval… | AlpacaEval… | LiveBench … | HLE (Matrix) | IMO 2025 | MATH-500 (… | MathArena … | SciCode (M… | OpenRouter… | Terminal-B… | BigCodeBench | CMIMC 2025 | MMLU-Pro (… | MMMU-Pro | SMT 2025 | Terminal-B… | GSM8K (Epo… | HellaSwag | Lech Mazur… | The Agent … | Video-MME … | GAIA Level 1 | GAIA Level 2 | GAIA Level 3 | Apex Agents | Arc Agi 2 | LongBench … | Gdpval | Posttrainb… | SWE-bench … | Open LLM A… | AIME 2025 | MMLU Pro | τ²-Bench | Browsecomp | Simpleqa | LCR | AIME | Terminal-B… | IFBench | LiveCodeBe… | MATH-500 | GPQA Diamond | HLE | SciCode
Gemini 3.1 Pro Preview77.690.091.396.987.095.599.189.357.255.5——0.90.40.90.6—0.3—0.80.70.51.0—93.389.282.00.80.03.00.62.31493.01.10.50.70.046.1—53.9——156.70.557.0100.070.20.757.30.81446.579.992.672.1——7.890.50.2—4.30.6—1.093.3——37.080.632.924.677.149.227.487.5—85.91455.0—————73.971.419.873.439.833.396.8———86.098.073.8——96.4——0.856.927.684.01.667.172.1——55.399.8—89.685.634.510.41490.02700.017.784.397.995.069.884.972.054.299.069.594.057.076.879.661.298.037.429.2100.097.074.594.3293000000000.091.085.40.51.078.525.387.440.014.39.576.544.446.998.533.558.944.068.5—38.489.580.595.556.01.80.08.856.4—83.867.087.93.40.828.80.60.065.718.30.90.91.085.972.10.70.3—0.80.90.90.90.40.6
Claude Opus 4.6 (Non-reasoning, High Effort)75.084.587.984.697.792.576.795.446.547.6——0.80.20.70.5—0.1—0.40.60.50.81606.037.794.076.00.40.03.00.760.71500.00.90.50.90.051.9—49.347.1—155.00.259.1100.051.60.369.90.41542.778.290.872.0——7.991.90.2—7.10.2—0.992.0——87.075.613.526.368.846.727.476.5—84.01549.013.0—96.353.352.375.8——71.337.929.197.8———77.893.070.1——95.4——0.850.510.977.21.263.165.9——45.199.7—87.881.634.912.21502.02650.012.685.615.995.0—93.072.755.698.229.985.653.076.867.655.275.537.329.2100.0—59.791.31020000000000.091.782.41.93.657.636.985.340.027.317.075.840.043.693.010.3—1.062.9—50.982.077.394.339.02.40.08.62.5—69.039.926.43.40.740.30.70.049.515.10.80.80.884.072.00.60.1—0.40.70.80.80.20.5
Claude Sonnet 4.6 (Non-reasoning, High Effort)74.084.586.278.695.787.789.993.044.446.4——0.80.10.70.5—0.1—0.40.60.50.81633.093.592.074.00.40.03.00.72.9—0.80.50.90.0——60.3——152.90.349.7100.046.00.236.70.61510.671.590.063.8——7.889.00.1—11.70.3—0.691.9——91.079.612.428.060.447.227.774.2—78.61523.0—————69.4——56.539.529.397.5———85.786.569.3——92.1——0.952.815.976.01.162.766.1——40.099.5—89.479.136.510.6—2010.010.486.292.893.0—82.072.548.297.024.883.451.075.459.460.397.039.030.797.0—60.274.11070000000000.091.478.01.22.777.323.066.135.031.822.776.338.041.096.53.9—6.059.1—93.280.074.593.241.82.50.08.52.3—63.633.713.53.41.222.50.50.043.215.40.70.80.878.068.00.60.1—0.40.70.80.80.10.5
Claude Sonnet 4.6 (Non-reasoning, Low Effort)73.383.585.778.592.587.789.993.042.643.0——0.80.10.70.4—0.1—0.40.60.40.81633.093.392.074.00.30.03.00.72.9—0.80.50.90.0——60.4——153.00.249.8100.046.30.236.00.61510.670.690.068.0——7.889.00.1—11.70.1—0.992.0——91.079.612.428.460.447.527.874.2—78.01521.0————————56.339.529.697.5———85.886.564.4—————0.952.815.374.91.161.255.1——40.099.4—89.478.837.010.6—2010.09.286.292.493.0—82.072.548.297.024.183.451.075.358.560.396.939.831.397.0—60.474.1972000000000.091.277.31.22.777.023.167.435.041.932.676.038.040.496.53.7—22.059.1—92.880.074.592.940.82.50.08.52.3—63.834.514.03.30.622.60.50.042.815.60.70.80.878.068.00.60.1—0.40.70.80.80.10.4
Claude Opus 4.5 (Reasoning)72.086.387.786.293.972.176.496.349.747.891.30.90.90.30.90.5—0.40.90.60.70.50.9—95.690.068.00.40.03.00.72.71474.00.80.50.90.0——27.9——149.70.167.497.862.60.349.40.44468.076.090.872.0——8.288.90.1—12.60.1—0.893.7—66.390.074.4—24.237.639.627.380.7—67.81489.09.6—92.5—————67.938.421.696.3———82.580.069.5—————0.877.514.678.11.256.663.7——40.799.0—89.181.840.910.91468.02070.00.387.192.995.1—88.866.345.998.225.768.462.775.062.061.590.053.443.192.8—59.787.0157000000000.080.674.11.52.462.741.184.621.069.967.875.437.638.785.01.9—14.057.8—96.580.080.991.939.02.20.08.656.9—77.960.968.13.10.145.10.60.052.410.6——0.967.872.00.70.4—0.60.90.90.90.30.5
Grok 471.985.085.378.190.785.490.688.241.540.592.70.90.90.20.80.51.00.90.90.50.70.40.7—93.391.082.00.50.03.00.72.61491.00.40.51.00.0——8.9——147.40.260.172.530.90.00.70.6—39.794.048.0——8.685.659.6—13.60.33000000000000.00.891.95.0000000000001e+26—56.079.015.340.265.150.929.175.0—62.3—————————60.040.832.695.1———87.689.564.6——92.5——0.863.09.225.61.137.645.7—1496.041.497.5—88.879.646.211.21465.02650.01.196.890.095.0—73.952.046.596.861.987.673.068.060.572.494.056.145.894.0—89.791.0—45.542.00.30.943.547.090.938.045.034.658.544.411.998.55.2——23.1—84.485.381.084.639.01.70.38.11.2—34.947.938.42.70.251.70.20.071.517.6——0.758.755.00.70.9—0.50.8—0.90.20.5
Gemini 3 Pro Preview (high)71.186.887.291.781.871.889.389.348.446.595.70.90.90.40.90.6—0.41.00.70.70.40.9—93.390.079.70.70.03.00.62.61486.00.80.51.00.0——36.0——153.40.457.792.936.50.254.10.99824.073.491.872.1——8.288.50.2—12.20.3—0.993.9——48.069.6—27.354.049.627.887.5—85.91438.0————————69.740.032.796.590.2——83.075.070.5——96.0——0.872.521.377.41.364.471.2——55.998.3—86.484.838.313.61501.02512.09.187.597.593.0—77.055.043.399.335.187.666.676.476.461.797.060.861.995.0—90.591.9141000000000.081.884.60.530.674.441.987.019.016.010.774.637.543.097.323.456.120.056.9—90.089.881.093.446.41.90.08.832.8—48.134.538.53.00.345.90.50.063.117.6——0.985.972.10.70.4—0.70.90.90.90.40.6
GPT-5.2 (xhigh)70.488.186.483.992.497.173.174.551.348.799.00.90.90.40.90.5—0.91.00.80.70.50.8—95.895.080.00.50.03.00.62.11482.00.80.51.00.0——42.623.8—153.70.362.481.733.50.138.30.61641.048.988.058.0——8.388.00.1—4.80.4—0.995.3——38.072.8-1.027.772.953.128.686.7—77.91472.0—34.4——————69.541.036.395.8———87.194.573.5——94.1——0.855.916.742.81.356.772.2——43.898.3—89.285.444.510.81440.02800.011.697.599.495.0—70.038.255.698.732.290.570.075.745.862.9100.057.549.1100.0—91.393.2165000000000.058.350.011.439.047.746.887.040.331.622.276.535.238.799.413.554.69.054.0—91.386.786.592.047.22.10.08.71.7—36.550.039.63.50.051.40.40.068.320.1——0.877.958.00.70.9—0.80.91.00.90.40.5
Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort)69.780.077.891.594.687.789.952.351.750.9——0.90.30.70.5—0.2—0.60.70.50.81633.092.187.648.50.30.03.00.73.1—0.70.50.80.0——62.4——153.00.249.498.643.60.031.10.51510.662.490.041.0——7.889.00.1—11.90.1—0.992.5——30.079.612.432.160.450.428.674.4—75.91521.0————————42.340.332.296.9———85.186.561.2—————0.952.19.863.71.145.955.0—1533.040.091.4—89.474.741.810.6—2010.02.796.215.988.0—82.042.048.297.07.085.951.074.747.960.743.445.436.297.0—51.674.11040000000000.088.569.51.22.773.837.975.535.070.768.372.65.835.396.52.0—7.059.1—88.280.074.589.636.43.30.08.42.1—60.943.710.32.90.641.70.50.036.017.70.60.80.878.015.90.70.2—0.60.70.90.90.30.5
Gemini 2.5 Pro69.675.773.576.589.355.861.057.634.631.987.70.90.80.20.80.41.00.90.90.50.70.30.5—66.790.870.40.60.03.00.71.4—0.30.30.60.0——6.7—68.8146.70.263.339.955.10.023.60.43505.058.389.852.9——7.980.60.0—14.70.2—0.894.7——20.053.6—46.14.956.127.584.0—83.2—————————52.337.340.095.688.0——86.237.066.5——93.1——0.855.47.470.80.959.654.0—1546.039.090.0—93.083.150.57.01437.02001.02.096.082.594.1—83.139.838.596.524.084.870.072.251.670.292.064.562.986.7—90.884.0—68.375.50.50.851.645.684.310.065.926.175.718.831.697.30.5——32.6—58.185.682.184.925.32.80.18.64.2—71.750.038.51.90.050.30.30.051.725.4——0.557.152.90.70.9—0.50.8—0.80.20.4
Claude Opus 4.5 (Non-reasoning)68.981.182.878.589.761.381.896.343.142.962.70.90.80.10.70.5—0.30.60.40.70.40.9—95.690.068.00.40.03.00.72.81474.00.80.50.90.045.9—28.123.4—149.90.166.697.856.50.349.40.44468.076.090.872.0——8.288.90.1—12.60.1—0.893.2—66.390.079.2—25.137.639.827.380.7—67.81489.013.6———————67.138.421.796.3———81.280.069.0——93.2——0.877.513.978.01.255.963.7——40.799.0—89.180.340.310.91468.02070.00.386.892.995.1—88.866.345.998.223.968.462.075.062.061.190.052.942.892.8—59.787.0161000000000.080.674.11.52.462.639.784.621.069.867.775.437.638.585.02.0—29.057.8—96.580.080.691.839.02.40.08.656.9—73.051.544.93.10.143.50.60.047.910.7——0.967.872.00.70.3—0.40.70.90.80.10.5
Claude Opus 4.6 (Adaptive Reasoning, Max Effort)68.779.376.592.297.667.186.960.353.048.1——0.90.40.80.5—0.4—0.50.70.50.91606.038.889.747.40.40.03.00.660.71502.00.90.50.90.0——11.4——143.70.260.5100.060.80.369.90.41542.775.090.842.9——7.991.90.1—0.70.2—0.993.6——34.075.613.535.568.836.328.676.5—80.01548.0————————65.937.819.197.4———82.393.073.2—————0.853.413.973.11.263.178.0——45.198.9—87.881.141.212.21502.01886.00.396.915.995.0—93.038.235.898.224.586.253.075.067.657.975.545.336.0100.0—59.791.31020000000000.089.279.31.93.654.440.885.310.027.317.074.37.138.294.41.8—11.043.2—49.486.677.391.639.05.00.08.62.7—76.558.661.93.40.744.80.60.050.69.40.90.90.984.022.80.70.4—0.50.80.90.90.40.5
GPT-5.2 (medium)67.585.283.379.391.296.767.369.346.644.296.70.90.90.20.90.5—0.81.00.70.60.40.7—95.895.080.00.50.03.00.72.21482.00.70.51.00.0——42.7——153.80.360.981.732.30.138.30.61641.048.988.058.0——8.388.00.1—4.70.4—0.995.1——27.065.0-1.029.472.953.228.686.7—77.91472.0————————67.841.036.595.8———86.494.570.1—————0.855.915.942.81.255.860.7——43.898.3—89.284.844.010.81440.02800.011.697.299.495.0—70.038.255.698.730.490.570.075.745.862.6100.057.348.9100.0—89.893.2193000000000.058.350.011.439.047.745.786.540.331.622.176.535.238.499.413.554.616.054.0—91.386.786.592.046.72.40.08.71.6—35.848.539.13.50.050.20.40.065.020.3——0.777.958.00.60.8—0.70.91.00.90.20.5
GPT-5.1 (high)67.485.784.390.295.485.160.964.247.744.794.00.90.90.30.90.4—0.70.90.70.80.50.8—91.792.582.00.50.03.00.62.5—0.70.50.90.0——10.1——149.70.358.696.049.50.016.30.6—72.090.055.4——7.988.22.0—12.60.2—1.491.6——25.076.3—28.617.651.528.485.4—64.7————48.049.3———62.240.634.196.3———86.772.869.5——96.4——0.858.514.978.81.245.260.8——40.693.3—87.983.643.112.11464.0—6.289.889.794.4—77.152.045.098.030.388.564.475.053.272.799.148.138.494.0—87.188.1—86.979.30.71.269.628.685.731.050.041.372.532.038.199.01.0——47.6—91.987.585.491.041.62.30.08.633.7—64.648.445.32.70.028.80.40.074.418.5——0.860.055.00.80.7—0.70.91.00.90.30.4
Kimi K2.5 (Reasoning)64.680.081.188.076.883.864.570.746.839.5——0.90.30.80.5—0.4—0.70.70.31.0—89.290.085.00.50.03.00.62.7—0.60.50.90.0——2.7——148.20.255.898.034.00.040.80.6—69.192.045.0——7.888.00.2—0.70.3—1.090.0—63.352.070.8-8.122.511.849.727.784.0—74.91442.0————————67.439.830.096.389.3——86.365.356.6——94.4——0.855.231.976.01.460.145.6745208.0—43.498.334.085.882.341.314.2—2350.05.695.995.495.084.277.862.047.296.226.486.663.075.246.870.696.145.636.296.0—89.087.6636000000000.084.977.70.10.361.441.783.628.036.426.777.950.238.798.08.8—28.043.2—91.387.178.590.639.92.40.08.623.9—46.331.832.82.50.140.50.40.079.514.60.90.91.074.945.00.70.4—0.70.80.90.90.30.5
Grok 4.1 Fast (Reasoning)64.378.375.170.387.288.092.652.938.630.989.30.90.90.20.80.4—0.90.90.50.70.20.9—93.391.082.00.30.03.00.72.71464.00.40.40.80.0——7.2———0.259.679.640.30.11.60.53721.033.590.355.0——8.081.40.1—13.80.2—1.291.2——19.079.0-28.732.742.051.429.283.3—61.4—————————58.640.933.194.9———87.642.765.6—————0.869.68.523.40.952.757.2——42.597.5—80.880.550.319.21483.02650.05.084.392.795.0—75.652.039.096.743.987.672.872.556.070.098.456.746.494.0—92.191.0488000000000.038.950.00.20.340.650.587.638.033.125.654.344.416.498.55.2—2.038.6—84.485.381.084.638.63.40.08.41.2—52.741.041.22.60.055.50.30.078.118.0——0.957.655.00.70.9—0.50.81.00.90.20.4
GPT-5 (high)60.477.476.476.390.375.648.662.044.636.094.30.90.90.30.80.41.01.00.90.70.80.30.8—89.290.084.50.40.03.00.71.8—0.50.50.90.0——4.6——150.00.359.495.752.80.041.30.4—60.091.048.0——8.386.40.1—13.00.2—1.394.7——21.075.6—29.00.050.728.874.4—59.2————51.349.0—63.2—61.140.534.595.8———86.76.069.6——96.3——0.854.210.763.11.139.660.7—1586.038.991.7—84.981.347.615.11460.02537.05.788.588.394.0—77.440.041.897.022.475.968.075.656.771.394.635.327.194.6—88.688.4—72.470.00.00.148.045.986.025.255.246.878.635.238.199.41.0——35.2—90.087.082.092.041.30.70.08.63.6—67.953.147.92.60.250.50.30.075.420.3——0.854.955.00.81.0—0.70.8—0.90.30.4
GPT-5 (medium)60.278.275.983.987.374.448.656.442.039.091.70.90.80.20.70.41.00.90.90.70.70.40.9—89.290.084.50.40.03.00.61.7—0.50.50.90.0——18.0——150.00.359.495.753.00.042.60.4—60.091.049.6——8.387.00.1—12.80.2—1.194.6——18.075.6—27.918.349.028.574.4—60.2—————————59.640.330.795.9———86.870.269.1—————0.854.312.463.11.239.660.4—1586.038.991.7—84.981.345.915.11460.02537.05.788.688.394.0—77.740.041.897.024.975.968.075.656.772.094.634.526.394.6—88.588.4—72.470.04.87.148.045.386.025.254.546.378.635.238.199.41.0——35.2—90.087.081.992.041.30.40.08.73.6—68.052.948.02.60.249.90.40.076.216.0——0.954.955.00.70.9—0.70.7—0.80.20.4
GPT-5.2 (Non-reasoning)59.975.674.463.279.780.967.369.333.634.751.00.80.70.10.70.4—0.60.50.50.40.30.5—95.895.080.00.50.03.00.72.51482.00.60.51.00.0——43.0——153.80.357.981.729.20.138.30.61641.048.988.058.0——8.388.00.1—4.70.4—0.994.4——27.065.0-1.032.772.953.428.686.7—75.51472.0————————64.640.936.795.7———84.794.569.3—————0.855.914.542.81.254.560.7——43.898.3—89.283.243.010.81440.02800.011.696.599.495.0—70.038.255.698.727.090.570.075.745.862.1100.056.648.4100.0—86.293.2180000000000.058.350.011.439.047.743.585.340.331.422.176.535.238.099.413.554.69.054.0—91.386.786.592.045.82.70.08.71.4—33.844.536.13.50.047.80.40.058.020.6——0.577.958.00.40.6—0.50.70.90.70.10.4
Kimi K2.5 (Non-reasoning)59.874.076.577.264.383.864.570.737.325.8——0.80.10.70.4—0.3—0.40.60.20.8—89.290.085.00.50.03.00.73.0—0.60.50.90.0——2.7——148.00.253.397.930.80.040.00.6—69.192.045.0——7.888.10.1—0.60.3—0.789.7—63.352.070.8-8.122.511.849.727.784.0—74.91442.0————————66.339.830.096.2———85.665.355.9—————0.852.031.676.01.459.845.6482783.0—43.598.366.085.880.440.314.2—2350.05.595.195.495.084.277.862.047.796.224.086.662.175.246.866.996.144.135.096.0—87.287.6558000000000.084.977.70.10.361.439.882.228.036.326.677.950.238.598.08.8—3.043.2—91.387.178.590.639.23.00.08.623.8—43.726.821.02.50.139.70.40.077.114.60.80.80.874.945.00.60.3—0.40.70.90.80.10.4
GPT-5.3 Codex (xhigh)59.587.387.196.599.292.8—65.354.053.1——0.90.40.80.5—0.3—0.80.70.50.9—96.092.085.00.60.03.00.62.4—0.80.51.00.0——3.7———0.477.979.468.20.254.70.7—72.894.067.7——8.390.40.2—12.30.3—1.388.5——24.056.89.926.882.138.726.984.0—78.5—————————63.438.523.596.5———84.691.775.0—————0.957.020.780.21.462.779.3——51.899.6—86.189.137.813.9——14.499.096.493.0—81.564.756.898.733.886.166.476.767.572.999.341.532.694.0—83.481.0—87.880.11.53.562.741.175.235.321.917.178.220.042.996.08.3——64.7—37.686.780.897.648.52.50.08.737.2—82.266.684.13.30.172.50.60.060.512.30.90.90.978.259.80.70.3—0.80.80.90.90.40.5
DeepSeek V3.2 Speciale59.381.482.489.194.094.375.3—29.437.996.70.90.90.30.90.4—0.31.00.60.60.30.0—93.388.088.70.50.03.00.71.3—0.50.50.90.0——5.0———0.354.697.329.20.116.40.3—67.990.355.7——7.988.30.1—12.10.2—0.888.9——40.776.0—34.037.352.928.482.1—75.0—————————68.540.736.297.5———82.179.765.0—————0.753.714.971.80.956.446.7——43.299.2—88.081.040.412.1—2701.07.689.999.291.5—76.438.040.598.364.584.262.381.559.270.396.026.920.496.0—80.885.7—88.177.80.71.464.340.683.529.224.519.166.030.683.398.09.4——54.4—94.487.580.889.243.03.00.08.42.7—39.726.626.43.00.044.60.50.070.120.4——0.074.352.00.60.3—0.60.91.00.90.30.4
Gemini 3 Flash Preview (Non-reasoning)58.675.773.077.375.956.075.262.335.037.855.70.90.80.10.80.5—0.30.60.60.50.30.4—93.388.290.80.70.03.00.72.71474.00.70.50.90.0——3.7——150.90.355.596.025.70.117.00.73053.072.491.860.0——7.982.00.2—12.80.4—0.993.6——10.075.811.627.133.647.727.987.6—75.01437.0————————85.539.829.896.390.4——85.484.767.8——95.8——0.822.316.574.51.362.765.2——45.5100.0—86.582.739.813.51473.02100.07.788.897.894.8—75.250.034.698.458.486.962.175.661.161.593.060.861.390.4—88.890.4985000000000.084.284.60.20.274.828.886.930.027.919.273.933.740.198.315.6—2.051.7—90.688.681.292.943.92.40.08.731.6—31.725.129.53.30.038.60.40.055.015.5——0.475.060.00.50.3—0.60.80.90.80.10.5
o358.376.573.186.587.060.338.061.038.438.488.30.90.80.20.80.41.00.90.90.70.70.40.8—94.292.175.80.50.03.00.74.0—0.40.50.80.0——1.0——147.20.257.172.321.90.130.00.5—50.293.348.6——8.386.913.8—1.50.2—0.894.4—9.126.058.4—15.16.529.425.882.9—49.7——10.5—————13.169.336.211.697.1———85.060.869.1——96.1——0.863.048.162.90.831.652.4—1665.040.995.8—89.676.945.610.4—2706.01.496.177.588.4—77.148.743.597.621.986.565.369.053.159.796.759.955.988.9—85.987.7—62.068.40.50.850.243.785.510.076.969.176.019.816.799.22.7——40.3—79.485.975.087.730.20.30.08.42.9—36.618.26.12.70.048.00.30.068.64.3——0.849.749.40.70.9—0.70.8—0.80.20.4
Qwen3.5 397B A17B (Reasoning)57.885.382.196.691.896.0—60.845.041.3——0.90.30.80.4—0.5—0.80.70.41.0—92.792.683.60.50.03.00.62.6—0.60.50.90.0——5.9———0.356.999.768.40.117.40.6—67.588.643.6——7.988.00.2—12.30.2—1.191.1——78.076.4-29.832.042.643.928.485.0—73.8—————————76.539.828.097.5———88.181.368.0—————0.956.615.471.01.557.356.8——44.197.9—88.085.741.612.0—2200.06.189.794.292.090.380.154.847.198.227.786.363.375.260.871.794.046.136.695.091.384.788.4—80.674.10.71.662.742.581.128.325.119.674.432.040.298.04.7——51.4—91.487.881.292.041.52.50.08.63.0—82.237.185.12.90.046.60.40.073.315.50.90.91.072.635.00.70.5—0.80.80.90.90.30.4
GPT-5 (low)57.675.273.078.383.169.648.656.439.230.783.00.90.80.20.80.41.00.80.80.70.60.30.8—89.290.084.50.40.03.00.71.8—0.50.50.90.0——18.0——150.00.358.195.751.50.042.50.4—60.091.049.5——8.387.10.1—12.70.2—1.194.5——18.075.6—28.318.348.928.474.4—60.6—————————59.240.230.695.9———86.570.268.8—————0.853.712.363.11.139.660.4—1586.038.991.7—84.981.345.115.11460.02537.05.788.488.394.0—77.740.041.897.024.075.868.075.656.769.894.634.125.994.6—87.888.4—72.470.04.87.148.044.286.025.254.346.278.635.238.199.41.0——35.2—90.087.081.792.041.30.30.08.65.2—67.151.046.22.70.248.70.40.075.116.0——0.854.955.00.60.8—0.70.8—0.80.20.4
GPT-4.155.553.353.051.066.345.544.344.126.321.834.70.80.70.00.50.40.90.40.30.40.60.10.5—92.987.444.70.40.03.00.87.3—0.20.30.50.0——42.5——137.50.155.00.069.70.00.40.4—46.090.242.3——8.974.80.0—18.50.1—0.497.0——14.02.8—63.90.426.124.274.8—72.6—————————43.229.919.995.2—74.779.983.45.551.468.049.491.2——0.854.03.635.10.830.239.0—1519.038.673.3—94.478.253.65.6—1807.00.292.819.492.0—80.032.432.394.89.386.768.968.727.066.046.541.432.246.4—61.566.3—56.548.50.00.143.242.490.110.935.330.173.83.74.892.40.4——23.3—69.181.881.677.930.316.77.98.21.9—86.870.930.81.80.046.90.10.02.714.6——0.550.741.60.60.4—0.40.5—0.70.00.4
Gemini 3 Pro Preview (low)55.486.184.787.994.354.986.4—41.339.486.70.90.90.30.90.5—0.80.90.50.70.30.7—93.393.087.80.70.03.00.72.3—0.80.51.00.0——18.117.9——0.458.3100.072.80.254.10.99824.073.491.363.5——7.989.80.3—0.40.3—0.994.1——48.069.6—23.831.145.027.485.6—76.9——————70.865.718.275.739.327.096.5———83.675.072.7—————0.872.520.377.41.064.469.9——47.199.5—86.490.239.213.6——12.396.896.594.7—81.258.649.898.553.087.171.176.276.470.799.461.362.399.9—86.192.4139000000000.081.884.60.50.874.442.683.819.068.566.574.643.648.498.87.1—17.059.3—91.487.881.293.548.13.20.08.632.9—85.474.292.73.00.346.60.50.071.613.4——0.776.359.80.70.8—0.50.91.00.90.30.5
Qwen3.5 397B A17B (Non-reasoning)55.381.678.886.189.496.0—60.840.137.4——0.90.20.80.4—0.5—0.50.60.40.8—92.292.683.60.50.03.00.72.9—0.50.40.80.0——8.0———0.255.698.764.00.010.60.6—63.988.635.0——7.986.50.1—12.40.2—0.991.1——78.076.4-29.836.917.147.129.385.0—71.4—————————76.540.530.997.2———88.169.566.0—————0.954.512.466.11.550.553.6——43.497.6—89.080.943.211.0—2200.03.689.491.592.090.379.352.145.297.923.286.364.074.555.470.994.047.938.395.091.383.188.4—77.770.80.40.959.342.282.324.926.920.973.032.034.598.03.5——35.7—89.387.881.290.636.73.00.08.42.7—79.035.173.62.20.046.30.40.071.118.00.90.90.869.935.00.60.5—0.50.80.90.90.20.4
Kimi K2 Thinking53.278.779.084.389.891.545.7—40.934.894.70.80.80.20.90.4—0.80.90.70.70.30.9—89.291.582.60.30.03.00.72.8—0.50.50.80.0——2.7——145.50.257.991.548.30.00.60.57336.061.690.335.0——8.683.40.1—2.50.2—0.891.6——43.063.4—22.54.549.727.782.7—60.2—————————58.339.830.095.8———85.112.056.8——92.6——0.856.429.863.50.940.442.8——24.293.3—90.480.847.79.6—2150.04.096.289.492.0—77.344.441.297.28.186.766.773.239.670.894.053.443.394.5—86.985.7—81.166.50.20.452.344.987.024.659.952.967.444.950.697.00.0——35.7—91.984.679.891.036.43.00.08.64.0—63.748.043.61.60.149.40.30.072.714.6——0.960.235.00.70.8—0.70.91.00.80.20.4
Legend: Low, High (normalized per column), Predicted