Any idea if there's a way to run this on 256GB RAM + 16GB VRAM with usable performance, even if barely?


Yes! A 3-bit (and maybe even 4-bit) quant can also fit! llama.cpp has MoE offloading, so your GPU holds the active experts and non-MoE layers; that way you only need 16GB to 24GB of VRAM. I wrote about how to do it in this section: https://docs.unsloth.ai/basics/qwen3-coder#improving-generat...
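
For concreteness, here's a minimal sketch of that setup using llama.cpp's tensor-override flag; the GGUF path and context size below are illustrative, so substitute your own:

    # Offload all layers to the GPU first, then override so the MoE
    # expert tensors stay in system RAM (non-MoE layers remain on GPU).
    ./llama-cli -m qwen3-coder-q3_k_xl.gguf \
        -ngl 99 \
        -ot ".ffn_.*_exps.=CPU" \
        -c 16384

The -ot regex matches the per-expert feed-forward tensors, which make up the bulk of the model's weight but are only sparsely activated, so keeping them in RAM costs far less throughput than offloading whole layers.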


Awesome documentation, I'll try this. Thank you!



