The Arena.
17 models.
1 DGX Spark.
Models are ranked by a composite score of quality, throughput (tokens/sec) and efficiency, with the weights set per use-case. There are no latency gates: a slow model stays on the list, just near the bottom. A 0.8B model wins its own size class, not the main ring.
A Which use-case do you want to optimise for?
B Filter by search term or maker, or by size class
#  | Model                      | Maker · Arch · Precision  | Size | Context | VRAM  | Quality | MMLU | GPQA | HumanEval | Throughput | Score
01 | Qwen-3.6 35B-A3B           | alibaba · MoE · FP8       | 35B  | 256K    | 38 GB | 83.9    | 85.2 | 86.0 | 80.4      | 87 t/s     | 73.1
02 | Qwen-3.6 27B               | alibaba · Hybrid · FP8    | 27B  | 256K    | 31 GB | 86.0    | 86.2 | 87.8 | 83.9      | 35 t/s     | 71.9
03 | Gemma-4 26B-A4B            | google · MoE · NVFP4      | 26B  | 256K    | 24 GB | 81.5    | 84.8 | 79.9 | 79.8      | 97 t/s     | 71.8
04 | Qwen-3.6 35B-A3B           | alibaba · MoE · BF16      | 35B  | 256K    | 70 GB | 83.9    | 85.2 | 86.0 | 80.4      | 44 t/s     | 70.7
05 | Gemma-4 26B-A4B            | google · MoE · BF16 + MTP | 26B  | 256K    | 52 GB | 80.7    | 82.6 | 82.3 | 77.1      | 81 t/s     | 70.1
06 | Gemma-4 26B-A4B            | google · MoE · BF16       | 26B  | 256K    | 52 GB | 80.7    | 82.6 | 82.3 | 77.1      | 59 t/s     | 68.9
07 | Nemotron-3-Super 120B-A12B | nvidia · MoE · NVFP4      | 120B | 256K    | 60 GB | 81.4    | 83.7 | 79.2 | 81.2      | 35 t/s     | 68.2
08 | Gemma-4 31B                | google · Dense · BF16     | 31B  | 256K    | 62 GB | 80.7    | 82.6 | 82.3 | 77.1      | 12 t/s     | 66.4
09 | Nemotron-3-Nano 30B-A3B    | nvidia · MoE · NVFP4      | 30B  | 256K    | 21 GB | 70.9    | 77.3 | 72.2 | 63.2      | 126 t/s    | 64.9
10 | Nemotron-3-Nano 30B-A3B    | nvidia · MoE · FP8        | 30B  | 256K    | 33 GB | 70.9    | 77.3 | 72.2 | 63.2      | 87 t/s     | 62.5
C Quality vs throughput: who's on the Pareto frontier? Bubble size = VRAM.
Models on the Pareto frontier (joined by the line) are non-dominated: no other model is both faster and higher-quality. Every model below the frontier is dominated: some other model matches or beats it on both axes and is strictly better on at least one.
[Chart: quality vs throughput bubble plot. Legend: on the Pareto frontier · dominated · bubble size = VRAM (small → large).]
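A minimal Python sketch of the dominance test described above, applied only to the ten rows shown in the table. The chart itself covers all 17 models, so its frontier may differ. The `Entry` type, the `dominates` and `pareto_frontier` helpers, and the model labels are illustrative assumptions, not the site's own code.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    name: str
    quality: float     # Quality column (higher is better)
    throughput: float  # tokens/sec (higher is better)


# The ten rows shown in the table (hypothetical labels; the full chart has 17 models).
ENTRIES = [
    Entry("Qwen-3.6 35B-A3B FP8", 83.9, 87),
    Entry("Qwen-3.6 27B FP8", 86.0, 35),
    Entry("Gemma-4 26B-A4B NVFP4", 81.5, 97),
    Entry("Qwen-3.6 35B-A3B BF16", 83.9, 44),
    Entry("Gemma-4 26B-A4B BF16 + MTP", 80.7, 81),
    Entry("Gemma-4 26B-A4B BF16", 80.7, 59),
    Entry("Nemotron-3-Super 120B-A12B NVFP4", 81.4, 35),
    Entry("Gemma-4 31B BF16", 80.7, 12),
    Entry("Nemotron-3-Nano 30B-A3B NVFP4", 70.9, 126),
    Entry("Nemotron-3-Nano 30B-A3B FP8", 70.9, 87),
]


def dominates(a: Entry, b: Entry) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    at_least_as_good = a.quality >= b.quality and a.throughput >= b.throughput
    strictly_better = a.quality > b.quality or a.throughput > b.throughput
    return at_least_as_good and strictly_better


def pareto_frontier(entries: list[Entry]) -> list[Entry]:
    """Return the non-dominated entries sorted by throughput (O(n^2), fine for a few dozen models)."""
    frontier = [e for e in entries if not any(dominates(o, e) for o in entries)]
    # Sorting by throughput gives the order in which the frontier line is drawn.
    return sorted(frontier, key=lambda e: e.throughput)


if __name__ == "__main__":
    for e in pareto_frontier(ENTRIES):
        print(f"{e.name}: quality {e.quality}, {e.throughput} t/s")
```

On these ten rows the frontier is Qwen-3.6 27B, Qwen-3.6 35B-A3B (FP8), Gemma-4 26B-A4B (NVFP4) and Nemotron-3-Nano 30B-A3B (NVFP4). Ties matter: the BF16 build of Qwen-3.6 35B-A3B has the same quality as the FP8 build but lower throughput, so it counts as dominated.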
D How the score is built up (aggregate)
70 % Quality: MMLU · GPQA · HumanEval
20 % Throughput: tokens/sec, as % of cohort max
10 % Efficiency: throughput ÷ VRAM, as % of cohort max
score = 70%·Quality + 20%·Throughput + 10%·Efficiency
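A minimal Python sketch of the formula above. It assumes that Quality is the unweighted mean of MMLU, GPQA and HumanEval, which matches the Quality column to one decimal for every row shown. It also assumes that the cohort is just the ten rows in the table. Because the real cohort is 17 models and the weights vary per use-case, the output is not expected to reproduce the Score column. `WEIGHTS`, `ROWS` and `scores` are hypothetical names.

```python
from statistics import mean

# Default weights from the breakdown above; per-use-case presets may differ.
WEIGHTS = {"quality": 0.70, "throughput": 0.20, "efficiency": 0.10}

# (name, MMLU, GPQA, HumanEval, tokens/sec, VRAM in GB), copied from the table.
ROWS = [
    ("Qwen-3.6 35B-A3B FP8", 85.2, 86.0, 80.4, 87, 38),
    ("Qwen-3.6 27B FP8", 86.2, 87.8, 83.9, 35, 31),
    ("Gemma-4 26B-A4B NVFP4", 84.8, 79.9, 79.8, 97, 24),
    ("Qwen-3.6 35B-A3B BF16", 85.2, 86.0, 80.4, 44, 70),
    ("Gemma-4 26B-A4B BF16 + MTP", 82.6, 82.3, 77.1, 81, 52),
    ("Gemma-4 26B-A4B BF16", 82.6, 82.3, 77.1, 59, 52),
    ("Nemotron-3-Super 120B-A12B NVFP4", 83.7, 79.2, 81.2, 35, 60),
    ("Gemma-4 31B BF16", 82.6, 82.3, 77.1, 12, 62),
    ("Nemotron-3-Nano 30B-A3B NVFP4", 77.3, 72.2, 63.2, 126, 21),
    ("Nemotron-3-Nano 30B-A3B FP8", 77.3, 72.2, 63.2, 87, 33),
]


def scores(rows):
    """Return {name: score}, normalising throughput and efficiency to the cohort max."""
    max_tps = max(r[4] for r in rows)
    max_eff = max(r[4] / r[5] for r in rows)
    out = {}
    for name, mmlu, gpqa, he, tps, vram in rows:
        quality = mean([mmlu, gpqa, he])        # assumption: unweighted mean of the three benchmarks
        tps_pct = 100 * tps / max_tps           # throughput as % of cohort max
        eff_pct = 100 * (tps / vram) / max_eff  # (t/s per GB) as % of cohort max
        out[name] = (WEIGHTS["quality"] * quality
                     + WEIGHTS["throughput"] * tps_pct
                     + WEIGHTS["efficiency"] * eff_pct)
    return out


if __name__ == "__main__":
    for name, s in sorted(scores(ROWS).items(), key=lambda kv: -kv[1]):
        print(f"{s:5.1f}  {name}")
```

Normalising to the cohort max means a model's throughput and efficiency points depend on which other models are in the cohort. Adding or filtering out the fastest model shifts every score.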