Jin Daily AI Trivia: DGX Spark's Last Bragging Right Just Got Taken Away
Remember when I said DGX Spark was overpriced and underperforming? Nvidia fanboys had one comeback: “But the prefill speed though.”
Fair. That one was real.
Here’s how LLM inference actually works. Two phases - prefill (processing your prompt) and decode (generating tokens one by one). Prefill is compute-bound, so raw GPU FLOPS matter. Decode is memory-bandwidth-bound, so fast memory wins.
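If you want to see why the two phases behave so differently, here is a rough roofline-style sketch. It assumes about 2 FLOPs per parameter per token and that the weights are read from memory once per forward pass (KV cache and activations ignored). The hardware figures are made-up placeholders for illustration, not the specs of any real machine.

```python
# Roofline-style sketch: is a forward pass compute-bound or bandwidth-bound?
# All hardware numbers below are illustrative placeholders, not real specs.


def arithmetic_intensity(tokens_per_pass: int, bytes_per_param: float) -> float:
    """FLOPs performed per byte of weights read in one forward pass.

    Assumes ~2 FLOPs per parameter per token and that the weights are
    read from memory once per pass (KV cache and activations ignored).
    """
    return 2.0 * tokens_per_pass / bytes_per_param


def bound_by(intensity: float, peak_flops: float, bandwidth_bytes: float) -> str:
    """Compare intensity against the hardware balance point (FLOPs per byte)."""
    balance = peak_flops / bandwidth_bytes
    return "compute" if intensity >= balance else "bandwidth"


# Hypothetical machine: 100 TFLOPS of matmul throughput, 500 GB/s of bandwidth.
PEAK_FLOPS = 100e12
BANDWIDTH = 500e9
Q4_BYTES_PER_PARAM = 0.5  # 4-bit weights, ignoring quantization scales

# Decode pushes one token through the model per pass; prefill pushes a whole chunk.
decode = arithmetic_intensity(tokens_per_pass=1, bytes_per_param=Q4_BYTES_PER_PARAM)
prefill = arithmetic_intensity(tokens_per_pass=2048, bytes_per_param=Q4_BYTES_PER_PARAM)

print(f"decode:  {decode:.0f} FLOPs/byte -> {bound_by(decode, PEAK_FLOPS, BANDWIDTH)}-bound")
print(f"prefill: {prefill:.0f} FLOPs/byte -> {bound_by(prefill, PEAK_FLOPS, BANDWIDTH)}-bound")
```

Decode does very little math per byte of weights it reads, so it waits on memory. Prefill reuses each weight across thousands of prompt tokens, so it waits on the matrix units.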
DGX Spark has more raw GPU compute, thanks to its Tensor Cores, so its prefill was always 3-4x faster than the GPU-only Metal path on Apple Silicon.
Then M5 Pro and M5 Max happened.
Apple quietly added better matrix multiply acceleration in M5, specifically targeting 4-bit quantized models - exactly what most local LLM users actually run. With the new Metal tensor API, the equation changes. Early community benchmarks show M5 Max hitting around 4,500 t/s prefill - roughly 4x what M4 Max could do.
So where does DGX Spark stand now?
What this means if you’re already on Apple Silicon:
Token generation on M5 Max scales with bandwidth, and the jump from M4 Max’s 546 GB/s to M5 Max’s 614 GB/s is only about 12%. So if you’re already on an M4 Max 128GB, the upgrade is mostly about faster prompt processing. Decode gains alone don’t justify it.
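Here is the bandwidth math as a quick sketch. It uses the common rule of thumb that every generated token has to read all the weights once, so the decode ceiling is roughly bandwidth divided by model size. The 30B parameter count and the flat 0.5 bytes per parameter for Q4 are my assumptions for illustration; real throughput lands below this ceiling.

```python
# Back-of-the-envelope decode ceiling: tokens/s <= bandwidth / bytes of weights.
# Bandwidth figures come from the post; the model size is a hypothetical example.


def decode_ceiling_tps(
    bandwidth_gb_s: float,
    params_billion: float,
    bytes_per_param: float = 0.5,  # assumed: 4-bit weights, scales ignored
) -> float:
    """Upper bound on decode speed if each token reads every weight once."""
    model_gb = params_billion * bytes_per_param
    return bandwidth_gb_s / model_gb


M4_MAX_GB_S = 546.0
M5_MAX_GB_S = 614.0
PARAMS_B = 30.0  # hypothetical 30B model at Q4, about 15 GB of weights

m4 = decode_ceiling_tps(M4_MAX_GB_S, PARAMS_B)
m5 = decode_ceiling_tps(M5_MAX_GB_S, PARAMS_B)
gain_pct = (m5 / m4 - 1.0) * 100.0

print(f"M4 Max ceiling: {m4:.1f} t/s")
print(f"M5 Max ceiling: {m5:.1f} t/s")
print(f"Decode gain:    {gain_pct:.1f}%")
```

The model size cancels out of the ratio, so the gain is about 12% whatever you run.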
If you’re on an M3 Ultra and struggling with long prefills on big contexts, M5-class silicon may be a meaningful jump. Long 16K to 128K prompts that used to take minutes should feel much faster.
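To put rough numbers on long prompts, here is a minimal sketch that turns prefill throughput into wait time. It uses the early community figure of around 4,500 t/s for M5 Max, plus a 4x-slower baseline implied by the "roughly 4x" claim (derived, not measured). It also assumes throughput stays flat as context grows, which is optimistic because attention cost rises with prompt length.

```python
# Rough time-to-first-token estimate: prompt length / prefill throughput.
# Assumes constant throughput across context lengths (optimistic).


def prefill_seconds(prompt_tokens: int, tokens_per_second: float) -> float:
    """Seconds spent processing the prompt before the first token appears."""
    return prompt_tokens / tokens_per_second


M5_MAX_TPS = 4500.0              # early community figure quoted in the post
BASELINE_TPS = M5_MAX_TPS / 4.0  # implied by "roughly 4x"; not a measurement

for prompt_tokens in (16 * 1024, 128 * 1024):
    old = prefill_seconds(prompt_tokens, BASELINE_TPS)
    new = prefill_seconds(prompt_tokens, M5_MAX_TPS)
    print(f"{prompt_tokens:>7} tokens: {old:6.1f} s -> {new:5.1f} s")
```

Under these assumptions a 128K prompt drops from about two minutes to about half a minute.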
The bottom line: M5 Pro and M5 Max are not a bandwidth revolution. The bus class stays the same and decode gains are modest. The real story is prompt processing - and if real benchmarks confirm up to 4x faster prefill on 4-bit models, this fixes the main weakness Apple Silicon had for large-context inference.
For the 90% of local LLM users running Q4 models between 4B and 30B, M5 Max just killed Spark’s last talking point.
And the pricing situation just got even funnier.
Nvidia just raised the DGX Spark price to $4,699. Meanwhile you can get a 14” MacBook Pro M5 Max with 128GB for $4,649 (edu). That includes the Retina display, the battery, and everything else - the entire computer.
Did I mention it runs at the same speed on battery, too? And at 1/3 the power draw?
The only reason left to buy a DGX Spark is that your boss or university says they only accept CUDA machines :P
