Hacker News

The GPU memory allocation refers to how capacity is allotted, not bandwidth. Sounds like the same 256-bit (quad-channel) 8000 MT/s LPDDR5X you can get today with Strix Halo.


384GB is 75% of 512GB. The M3 Ultra's bandwidth is over 800GB/s, though real-world throughput is typically lower.
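A rough sanity check on what that bandwidth implies for token generation: decode on a dense model is largely memory-bandwidth-bound, since every generated token streams the full weight set from memory. The figures below (800GB/s, a hypothetical ~40GB quantized model) are illustrative assumptions, not benchmarks.

```python
# Back-of-envelope ceiling for decode speed on a bandwidth-bound LLM.
# Assumption: each token requires reading all weights once from memory.

def decode_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on tokens/s: memory bandwidth divided by weight size."""
    return bandwidth_gb_s / model_size_gb

# M3 Ultra at ~800 GB/s; a ~70B-parameter model quantized to 4-bit
# is roughly 40 GB of weights (hypothetical example values).
print(decode_tokens_per_sec(800, 40))  # ~20 tokens/s ceiling
```

Real throughput lands below this ceiling (KV-cache reads, compute overhead, imperfect bandwidth utilization), but it explains why high-bandwidth unified memory does well at decode while prompt processing, which is compute-bound, is a separate question.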

Using an M3 Ultra, I think the performance is pretty remarkable for inference, and concerns about prompt processing being slow in particular are greatly exaggerated.

Maybe the advantage of the DGX Spark will be for training or fine-tuning.


I very consistently see people say prompt processing is slow for larger context sizes ("notoriously slow"), something that is much less of an issue with, e.g., CUDA setups.


Depends on the model. gpt-oss-120b will easily crunch large prompts in a few seconds. It's remarkable. It's gpt-4-mini at home.



