
Good point -- hopefully the quality impact is still worth it; that remains to be seen. Agree on the size -- hopefully something they'll keep in mind for future models.


If it's better than the equivalent 30B model, that's still a huge achievement.

Llama.cpp's Q2_K quant is 2.5625 bpw with perplexity just barely better than the next step down: https://github.com/ggerganov/llama.cpp/pull/1684
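For a rough sense of what that bpw figure means on disk, here's a back-of-the-envelope sketch in Python. It assumes a hypothetical 30B-parameter model, treats the bpw as already including the quantization block scales, and ignores tokenizer/metadata overhead:

    # Rough file-size estimate from bits-per-weight (bpw).
    # Assumes: 30B parameters (hypothetical), bpw already folds in block
    # scales, and tokenizer/metadata overhead is ignored.
    def approx_size_gb(n_params: float, bpw: float) -> float:
        """Approximate model file size in gigabytes (10^9 bytes)."""
        return n_params * bpw / 8 / 1e9

    n_params = 30e9
    print(f"fp16:  {approx_size_gb(n_params, 16.0):.1f} GB")    # ~60.0 GB
    print(f"Q2_K:  {approx_size_gb(n_params, 2.5625):.1f} GB")  # ~9.6 GB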

But subjectively, the Q2 quant "feels" worse than its high wikitext perplexity would suggest.
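For anyone unfamiliar with what the wikitext perplexity numbers measure, here's a minimal sketch of the standard definition: perplexity is the exponentiated mean negative log-likelihood over the evaluation tokens. The log-probabilities below are placeholder values, not real model output:

    import math

    # Perplexity = exp(mean negative log-likelihood over evaluation tokens).
    # Placeholder token log-probabilities, not real model output.
    token_logprobs = [-2.1, -0.4, -3.7, -1.2, -0.9]

    nll = -sum(token_logprobs) / len(token_logprobs)
    ppl = math.exp(nll)
    print(f"perplexity = {ppl:.2f}")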

That's apples to oranges, as this quantization scheme is different from Q2_K, but I just hope the quality hit in practice isn't too bad.



