Compare side by side.

2-4 models · all metrics · per benchmark.

01 No models selected

Go back to the Arena and tick 2 to 4 models to compare. More than 4 gets too crowded on one screen.

A The selected models

B Aggregate metrics: best = blue · worst = dimmed

C Throughput per benchmark: tokens/sec · 9 benches

D Quality breakdown: MMLU-Pro · GPQA-Diamond · HumanEval

E The short version

Production AI, on-prem on a DGX Spark, and the notes along the way. Co-founder of Kamoo, based in Hoorn.

Built with Astro · hosted on TransIP