Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

You always want a bit of headroom for context. It's a problem I keep bumping into with 32B models on a 24GB card: the decent quants fit, but the context you have available on the card isn't quite as much as I'd like.



Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: