Qwen: Qwen3 235B A22B

Last synced May 28, 2026, 9:09 AM

131K context

Human Frontier

80.8

Human-calibrated frontier signal, backed by Arena-style preference evidence and separate from raw AgMoBench benchmark composite scores.

Blended Price

$0.39/M

Input Price

$0.22/M

Output Price

$0.88/M
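The three prices above are consistent with a 3:1 input-to-output token weighting. The page does not state how the blended figure is derived, so the 3:1 ratio below is an assumption, and the function name `blended_price` is hypothetical. A minimal sketch of the arithmetic:

```python
from decimal import Decimal, ROUND_HALF_UP


def blended_price(input_price: str, output_price: str,
                  input_weight: int = 3, output_weight: int = 1) -> Decimal:
    """Weighted average price per million tokens, rounded to cents.

    The 3:1 default weighting is an assumption, not something the page states.
    """
    inp = Decimal(input_price)
    out = Decimal(output_price)
    total_weight = input_weight + output_weight
    blended = (inp * input_weight + out * output_weight) / total_weight
    # Decimal avoids binary floating-point surprises when rounding 0.385.
    return blended.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


price = blended_price("0.22", "0.88")
print(f"${price}/M")  # prints "$0.39/M"
```

Under that assumption, (3 × 0.22 + 1 × 0.88) / 4 = 0.385, which rounds to the listed $0.39/M.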

Speed

—

TTFT

—

Benchmark Scores


How Qwen: Qwen3 235B A22B Compares

[Interactive bubble chart comparing models by provider. Filters: Blended Price (USD) $0.00 to $37.5; AgMoBench Overall ≥ 2.1. Bubble size = Context Window, ranging from 8,191 to 2,000,000.]

Providers shown: mistral, meta, openai, google, anthropic, azure, nvidia, alibaba, aws, ibm, nous-research, kimi, xai, zai, deepcogito, kwaikat, xiaomi, cohere, ai21-labs, inclusionai, minimax, baidu, deepseek, prime-intellect, liquidai, stepfun, bytedance_seed, upstage, reka-ai, inception, tencent, arcee, ai2, swiss-ai-initiative, perplexity.

GPQA Diamond

— / 100

LiveCodeBench

— / 100

Terminal-Bench Hard

— / 100

τ²-Bench

— / 100

ARC-AGI-2 [benchmark_matrix]: 1.3 / 100
AA Long Context Reasoning (Matrix) [benchmark_matrix]: 67.0
AIME 2024 [benchmark_matrix]: 85.7
AIME 2025 (Matrix) [benchmark_matrix]: 81.5
Arena-Hard Auto [benchmark_matrix]: 95.6
Chatbot Arena ELO (Matrix) [benchmark_matrix]: 1410.0
Codeforces Rating [benchmark_matrix]: 2056.0
GPQA Diamond (Matrix) [benchmark_matrix]: 71.1
GSM8K [benchmark_matrix]: 94.4
HLE (Matrix) [benchmark_matrix]: 15.4
HMMT Feb 2025 [benchmark_matrix]: 62.5
HumanEval [benchmark_matrix]: 90.0
IFEval [benchmark_matrix]: 87.8
LiveCodeBench (Matrix) [benchmark_matrix]: 70.7
MATH-500 (Matrix) [benchmark_matrix]: 98.2
MathArena Apex 2025 [benchmark_matrix]: 5.2
MMLU [benchmark_matrix]: 87.8
MMLU-Pro (Matrix) [benchmark_matrix]: 79.8
Terminal-Bench 1.0 [benchmark_matrix]: 6.6
simpleqa [benchmark_matrix]: 13.2
SWE-bench Verified [benchmark_matrix]: 69.6 / 100

AA-Omniscience Accuracy [Predicted]: 41.5 / 100
AA-Omniscience Hallucination Rate [Predicted]: 95.3 / 100
Aider Polyglot [Predicted]: 55.9 / 100
AIME [Predicted]: 0.9 / 30
AIME 2025 [Predicted]: 0.8 / 30
AlpacaEval 2.0 LC [Predicted]: 73.2 / 100
AlpacaEval 2.0 Raw [Predicted]: 68.8 / 100
ARC-AGI-1 Cost per Task [Predicted]: 0.0
ARC-AGI-2 Cost per Task [Predicted]: 0.0
BFCL (Berkeley Function Calling) [Predicted]: 50.8
BigCodeBench Complete [Predicted]: 63.3 / 100
BigCodeBench Instruct [Predicted]: 53.0 / 100
AA Intelligence Index (Matrix) [Predicted]: 70.9
BrowseComp [Predicted]: 44.9
BRUMO 2025 [Predicted]: 72.3
CMIMC 2025 [Predicted]: 66.6
HMMT Nov 2025 [Predicted]: 87.7
IFBench (Matrix) [Predicted]: 44.5
IMO 2025 [Predicted]: 11.2
MMMU-Pro [Predicted]: 82.4
MRCR v2 [Predicted]: 87.8
OSWorld [Predicted]: 30.6
SimpleQA [Predicted]: 17.6
SMT 2025 [Predicted]: 81.5
SWE-bench Pro [Predicted]: 32.6
Tau-Bench Telecom (Matrix) [Predicted]: 90.6
Terminal-Bench 2.0 [Predicted]: 13.4
USAMO 2025 [Predicted]: 27.5
Video-MMU [Predicted]: 88.3
browsecomp [Predicted]: 42.8
Aider Polyglot [Predicted]: 0.0
Apex Agents [Predicted]: 2.4
Arc Agi 2 [Predicted]: 0.0
BIG-Bench Hard [Predicted]: 3.0
CAD-Eval [Predicted]: 3.9
Chess Puzzles [Predicted]: 0.1
CyBench [Predicted]: 0.2
DeepResearchBench [Predicted]: 0.3
FictionLiveBench [Predicted]: 0.5
GeoBench [Predicted]: 0.0
GSM8K (Epoch) [Predicted]: 16.3
HellaSwag [Predicted]: 2.4
METR Time Horizons [Predicted]: 0.5
OTIS Mock AIME 2024–2025 [Predicted]: 0.5
Posttrainbench [Predicted]: 11.7
SimpleQA Verified (Epoch) [Predicted]: 0.2
The Agent Company [Predicted]: 2.2
TriviaQA [Predicted]: 3.7
WinoGrande [Predicted]: 0.9

FrontierMath [Predicted]: 9.6 / 100
GAIA Level 1 [Predicted]: 74.7
GAIA Level 2 [Predicted]: 67.8
GAIA Level 3 [Predicted]: 59.9
GAIA [Predicted]: 59.3 / 100
GPQA Diamond [Predicted]: 0.7 / 100
HLE [Predicted]: 0.1 / 100
IFBench [Predicted]: 0.5 / 100
LCR [Predicted]: 0.2 / 100
LegalBench [Predicted]: 93.5 / 100
LiveBench Coding [Predicted]: 58.6 / 100
LiveBench Data Analysis [Predicted]: 37.7 / 100
LiveBench Language [Predicted]: 46.1 / 100
LiveBench Math [Predicted]: 54.2 / 100
LiveBench Overall [Predicted]: 40.1 / 100
LiveBench Reasoning [Predicted]: 33.0 / 100
LiveCodeBench [Predicted]: 0.7 / 100
LongBench v2 Easy [Predicted]: 51.8
LongBench v2 Hard [Predicted]: 48.7
LongBench v2 [Predicted]: 45.8 / 100
MATH-500 [Predicted]: 1.0 / 100
MathVista [Predicted]: 61.3 / 100
MedQA (USMLE) [Predicted]: 90.8
MLE-bench [Predicted]: 24.5 / 100
MMLU Pro [Predicted]: 0.8 / 100
MMMU [Predicted]: 73.7 / 100
MMTU Table Understanding [Predicted]: 58.7 / 100
MT-Bench [Predicted]: 8.0 / 10
NoLiMa (NIAH) [Predicted]: 94.5 / 100
OCRBench v2 [Predicted]: 84.9 / 100
Open LLM Average [Predicted]: 36.9 / 100
Open LLM: BBH [Predicted]: 65.7 / 100
Open LLM: GPQA [Predicted]: 34.9 / 100
Open LLM: IFEval [Predicted]: 61.2 / 100
Open LLM: MATH Level 5 [Predicted]: 34.5 / 100
Open LLM: MMLU-PRO [Predicted]: 51.1 / 100
Open LLM: MUSR [Predicted]: 45.3 / 100
RE-Bench [Predicted]: 0.3
SciCode [Predicted]: 0.5 / 100
SWE-bench Lite [Predicted]: 39.0 / 100
τ²-Bench [Predicted]: 0.4 / 100
tau-bench Retail [Predicted]: 86.2 / 100
Terminal-Bench Hard [Predicted]: 0.3 / 100
WebArena [Predicted]: 19.0 / 100
WildBench [Predicted]: 56.5

BullshitBench [bullshitbench]: 6.0 / 100
Arena ELO: Creative Writing [chatbot_arena]: 1324.0
Chatbot Arena ELO [chatbot_arena]: 1375.0
Epoch Capabilities Index [epoch_ai]: 139.7
Lech Mazur Writing [epoch_ai]: 8.3
EQ-Bench 3 [eqbench]: 1541.0
HuggingFace Downloads (30d) [hf-downloads]: 436332.0
HuggingFace Likes [hf-downloads]: 1093.0
SimpleBench [simplebench]: 31.0 / 100
Vectara Factual Consistency [vectara_hallucination]: 90.7 / 100
Vectara Hallucination Rate [vectara_hallucination]: 9.3 / 100
WeirdML [weirdml]: 37.3 / 100