Hacker News

The GPU memory allocation refers to how capacity is allotted, not bandwidth. Sounds like the same 256-bit (quad-channel) 8000 MT/s LPDDR5X you can get today with Strix Halo.


384GB is 75% of 512GB. The M3 Ultra's bandwidth is over 800GB/s, though real-world throughput is typically lower.
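A rough sanity check on what that bandwidth implies for token generation: decode on a dense model is largely memory-bandwidth-bound, since every generated token streams the full weight set from memory. The figures below (800GB/s, a hypothetical ~40GB quantized model) are illustrative assumptions, not benchmarks.

```python
# Back-of-envelope ceiling for decode speed on a bandwidth-bound LLM.
# Assumption: each token requires reading all weights once from memory.

def decode_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on tokens/s: memory bandwidth divided by weight size."""
    return bandwidth_gb_s / model_size_gb

# M3 Ultra at ~800 GB/s; a ~70B-parameter model quantized to 4-bit
# is roughly 40 GB of weights (hypothetical example values).
print(decode_tokens_per_sec(800, 40))  # ~20 tokens/s ceiling
```

Real throughput lands below this ceiling (KV-cache reads, compute overhead, imperfect bandwidth utilization), but it explains why high-bandwidth unified memory does well at decode while prompt processing, which is compute-bound, is a separate question.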

Using an M3 Ultra, I think the performance is pretty remarkable for inference, and concerns about prompt processing being slow in particular are greatly exaggerated.

Maybe the advantage of the DGX Spark will be for training or fine-tuning.


I very consistently see people say prompt processing is slow for larger context sizes ("notoriously slow"), something that is much less of an issue with, e.g., CUDA setups.


Depends on the model. gpt-oss-120b will easily crunch large prompts in a few seconds. It's remarkable. It's gpt-4-mini at home.



