
I imagine that this in quantized form would fit pretty well and still be decent. (DeepSeek R1 Distill Qwen 32B[1] or Qwen 3 32B[2])

Specifically, the `Q6_K` quant looks solid at ~27GB. That leaves enough headroom on your 64GB MacBook to actually load a decent amount of context. (Every token of context costs extra memory on top of the weights.)

Rough math, based on this[0] calculator, is around ~10GB per 32k tokens of context. That doesn't change with the weight quant size -- the KV cache is allocated separately from the weights -- so you just need enough headroom.
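If you want the back-of-the-envelope version of what that calculator is doing: the KV cache stores a key and a value vector per layer per token, so it grows linearly with context. Here's a minimal sketch, assuming hypothetical Qwen2.5-32B-class numbers (64 layers, 8 KV heads via GQA, head dim 128) -- the exact per-token cost varies by model and by KV-cache precision, which is roughly why estimates land anywhere from ~9GB to ~17GB per 32k tokens:

    # Rough KV-cache estimate: two tensors (K and V) per layer, each of
    # shape [n_kv_heads, head_dim] per token. Architecture numbers below
    # are assumptions, not pulled from the actual model config.
    def kv_cache_gb(n_tokens,
                    n_layers=64,        # assumed: Qwen2.5-32B-class
                    n_kv_heads=8,       # assumed: GQA with 8 KV heads
                    head_dim=128,       # assumed
                    bytes_per_elem=2):  # fp16 cache; use 1 for an 8-bit cache
        per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
        return n_tokens * per_token / 1e9

    print(kv_cache_gb(32_768))                    # ~17.2 GB at fp16
    print(kv_cache_gb(32_768, bytes_per_elem=1))  # ~8.6 GB with an 8-bit cache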

So with 64gb:

- ~27GB for the Q6_K quant

- ~10-20GB for 32-64k tokens of context

That leaves you around 20GB for application memory and _probably_ enough context to actually be useful for larger coding tasks! (It just might be slow, but you can drop to a smaller quant for more speed.)
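If you want to sanity-check the whole budget before downloading anything, here's a tiny sketch putting the pieces together. The param count and bits-per-weight are assumptions (Q6_K works out to roughly 6.56 bits/weight), and real usage adds runtime overhead on top:

    # Back-of-the-envelope budget: weights + context vs. total unified memory.
    def weights_gb(n_params_billion, bits_per_weight):
        return n_params_billion * bits_per_weight / 8

    def headroom_gb(total_gb, n_params_billion, bits_per_weight, context_gb):
        return total_gb - weights_gb(n_params_billion, bits_per_weight) - context_gb

    # 64GB machine, ~32.8B params at Q6_K (~6.56 bpw), ~10GB for 32k context:
    print(weights_gb(32.8, 6.56))            # ~26.9 GB of weights
    print(headroom_gb(64, 32.8, 6.56, 10))   # ~27 GB left for the OS and apps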

I hope that helps!

0: https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calcul...

1: https://huggingface.co/bartowski/DeepSeek-R1-Distill-Qwen-32...

2: https://huggingface.co/Qwen/Qwen3-32B-GGUF


