AgMoDB by @mistakeknot

Model picks

Current defaults by use case.

Product

Production assistants and internal tools.

Default

Claude Sonnet 4.6 (Non-reasoning, High Effort)

Anthropic

AgMoBench 84.7 | $6.00/M | 47 tok/s

Reliable product default.

Value

GPT-5.4 mini (xhigh)

OpenAI

AgMoBench 52.1 | $1.69/M | 152 tok/s

Lower-cost product lane.

Ceiling

GPT-5.5 (xhigh)

OpenAI

AgMoBench 64.2 | $11.25/M | 63 tok/s

Higher ceiling, higher spend.

Human frontier

1. GPT-5.6 Sol (xhigh) | OpenAI | Human Frontier 95.6 | $11.25/M | 71 tok/s
2. Anthropic: Claude Opus 4.7 | Anthropic | Human Frontier 95.5 | $10.00/M | n/a
3. Claude Opus 4.6 (Non-reasoning, High Effort) | Anthropic | Human Frontier 95.2 | $10.00/M | 42 tok/s
4. Grok 4.5 (high) | SpaceXAI | Human Frontier 94.9 | $3.00/M | 111 tok/s
5. GLM-5.2 (max) | Z AI | Human Frontier 94.4 | $2.15/M | 183 tok/s
6. Claude Sonnet 4.6 (Non-reasoning, High Effort) | Anthropic | Human Frontier 93.4 | $6.00/M | 47 tok/s

Worth discovering

Kimi K2.6

Kimi

Strong frontier/value ratio.

Cheap reasoning

DeepSeek V4 Flash (Reasoning, Max Effort)

DeepSeek

Aggressive reasoning price/performance.

Fast batch work

Gemini 3.1 Flash-Lite

Google

Fast, cheap high-throughput lane.

Qwen3.6 35B A3B (Reasoning)

Alibaba

Open-ish frontier compression.