
The datasheet isn't telling you the quantization (intentionally). Model weights at FP16 are roughly 2GB per billion params (2 bytes per param). A 200B model at FP16 would take 400GB just to load the weights; a single DGX Spark has 128GB. Even two networked together couldn't do it at FP16.

You can do it if you quantize to FP4 — and Nvidia's FP4 variant, NVFP4, isn't too bad (and it's hardware-accelerated on Blackwell). Some models are even trained at FP4 these days, like the gpt-oss models. But gigabytes are gigabytes, and you can't squeeze 400GB of FP16 weights into only 128GB (or 256GB) of space.
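
A quick back-of-envelope sketch in Python (not from the datasheet; it just restates the arithmetic above, counting weight bytes only and treating 1GB as 10^9 bytes):

    # Rough weight-memory math for a 200B-parameter model at common precisions.
    # Ignores KV cache, activations, and runtime overhead, which only make the fit harder.
    PARAMS = 200e9
    BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "FP4/NVFP4": 0.5}

    for precision, nbytes in BYTES_PER_PARAM.items():
        gb = PARAMS * nbytes / 1e9
        fits_one = gb <= 128   # single DGX Spark (128GB unified memory)
        fits_two = gb <= 256   # two Sparks networked together
        print(f"{precision:10s} {gb:6.0f} GB   1x Spark: {fits_one}   2x: {fits_two}")

    # FP16: 400 GB -> doesn't fit even on two
    # FP8 : 200 GB -> only fits across two
    # FP4 : 100 GB -> fits on one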

The datasheet is telling you the truth: you can fit a 200B model. But it's not saying you can do that at FP16 — because you can't. You can only do it at FP4.



I never claimed the 200B model was FP16.

If the 200B model were at FP16, marketing could've turned around and claimed the DGX Spark could handle a 400B model (with an 8-bit quant) or an 800B model with a 4-bit quant.

Why would marketing leave such low-hanging fruit on the tree?

They wouldn't.
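
(The same arithmetic, sketched in Python, shows why those hypothetical claims would all have an identical footprint — the 400B and 800B figures are just the marketing numbers from above:)

    # Same footprint three ways: bytes = params * bytes_per_param.
    for params_b, bits in [(200, 16), (400, 8), (800, 4)]:
        gb = params_b * 1e9 * (bits / 8) / 1e9
        print(f"{params_b}B params @ {bits}-bit = {gb:.0f} GB of weights")
    # All three come out to 400 GB -- so if the 200B figure meant FP16,
    # the bigger numbers would have been free marketing wins.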



