What does the
DGX Spark cost?

A DGX Spark doesn't sit there for free. Hardware amortisation plus power under load, together, spread across the month. Move the assumptions and see where you land.

Hardware / month €0 amortisation of the Spark
Power / month €0 170W under load
Total / month €0 hardware plus power

The formula

cost/month = (hardware ÷ amortisation) + (W ÷ 1000 × hours × kWh price)

Two line items, that's it. Hardware amortised linearly over the chosen period, power only during the on-load hours you give. We don't count idle hours.

Assumptions

  • Power assumes 170W under vLLM load. Max TDP sits around 240W, in practice it runs lower.
  • Hardware is the Founders Edition NL excl. VAT. Adjust if you bought it second-hand, get a tax benefit or sourced it another way.
  • Utilisation assumption: power costs are only counted during the hours you give. 24/7 at 100% load usually isn't realistic.
  • Not included: internet, cooling, space, your own time to manage it. This is a floor, not a full business case.
What this number doesn't tell you: whether local is cheaper than a cloud API. For that you also need tokens-per-second, and those differ per model. We work that out below, with the arena throughput.

What a token really costs

The Spark costs the same per month whether you run it flat out or not (see the calculator above, at 8 hours per workday). So the price per token depends on how many you push through it. Below, the € per 1M output tokens for an office of 10 to 25 people, next to what the same tokens cost in the cloud.

The honest conclusion: on-prem doesn't just win on price. Mistral Small on your own Spark costs more per token than Mistral's own EU API, literally the same model. Against GPT-5 mini you do win. But the real reason for local isn't in this table: your data stays in-house and you're not subject to the CLOUD Act. Pick on-prem for the jurisdiction, not for a few cents.
Model Precision €/1M, 10 users €/1M, 25 users (peak)
Mistral Small EU €0,50 any volume
GPT-5 mini US, CLOUD Act €1,76 any volume
Qwen-3.5 0.8B BF16 €0,18 €0,18
Qwen-3.5 2B BF16 €0,35 €0,22
Ministral-3 3B BF16 €0,33 €0,34
Nemotron-3-Nano 4B BF16 €0,59 €0,42
Nemotron-3-Nano 30B-A3B NVFP4 €0,68 €0,45
Ministral-3 8B BF16 €0,67 €0,54
Gemma-4 26B-A4B NVFP4 €0,72 €0,60
Nemotron-3-Nano 30B-A3B FP8 €0,98 €0,61
Qwen-3.6 35B-A3B FP8 €0,94 €0,74
Gemma-4 26B-A4B BF16 + MTP €0,91 €0,76
Gemma-4 26B-A4B BF16 €1,33 €0,99
Nemotron-3-Nano 30B-A3B BF16 €1,81 €1,06
Qwen-3.6 35B-A3B BF16 €1,64 €1,24
Mistral-Small 3.2 24B NVFP4 €1,39 €1,44
Nemotron-3-Super 120B-A12B NVFP4 €1,90 €1,63
Qwen-3.6 27B FP8 €2,08 €1,65
Gemma-4 31B BF16 €5,95 €2,91
  • Assumption: an office of 10 to 25 people keeping the Spark actually under load 8 hours per workday. Bursty use means fewer tokens per month, so a higher price per token. This table is the most favourable reading for on-prem.
  • We compare on output tokens, because that is what we measure (decode). The cloud charges input tokens (the prompts) on top, which don't count here. So the cloud is in reality more expensive than this table shows. The 10-user figures are measured directly; the 25-user peak is derived from the peak run's total throughput times the scenario's output share.
  • Only comparable models make a fair comparison. A small model is cheap per token but does different work. The cleanest point is Mistral Small local versus Mistral Small in the cloud: same model, different place.
  • The runs ran with prefix caching off. On would improve local throughput, and so lower the price.
  • Cloud prices are output tokens. Mistral publishes in euros, OpenAI's GPT-5 mini is converted at $1 = €0,88 (2026-06-26). Source: Mistral and OpenAI. GPT-5 mini as the current generation, the older GPT-4o mini is cheaper. Mistral Small via the API is now Small 4, we benchmark 3.2 locally.
  • Energy: at 170W, electricity is only ~9% of the monthly cost, the rest is hardware amortisation. So on-prem is mostly a hardware story, not an energy one. One million output tokens of Gemma-4 NVFP4 takes roughly 258 Wh, a few cents. Estimated at that flat 170W, since we don't measure power per model.
Esc