
Beg to differ, I'm living fine with 1.5 tok/sec


Speculative decoding with a small draft model could help increase it by, say, 30 to 50%!


i'm not willing to trade any more quality for performance. no draft, no cache for kv either. i'll take the performance cost, it just makes me think carefully about my prompt. i rarely ever need more than one prompt to get my answers. :D


Speculative decoding doesn't change output tokens.
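
A minimal sketch of why, assuming greedy decoding. The two toy "models" here (`target_next`, `draft_next`) are hypothetical stand-ins, not any real library's API: each maps a token context to a next token. The draft guesses a few tokens ahead, the target checks every guess, and only tokens the target itself would have produced are kept.

```python
from typing import Callable, List, Sequence

NextToken = Callable[[Sequence[int]], int]

def target_next(ctx: Sequence[int]) -> int:
    # Hypothetical "large" model: a deterministic function of the context.
    return (sum(ctx) * 31 + len(ctx) * 7) % 50

def draft_next(ctx: Sequence[int]) -> int:
    # Hypothetical "small" model: agrees with the target except when the
    # context length is a multiple of 3, where it guesses wrong.
    t = target_next(ctx)
    return t if len(ctx) % 3 else (t + 1) % 50

def greedy_decode(next_token: NextToken, prompt: List[int], n_new: int) -> List[int]:
    # Plain one-token-at-a-time greedy decoding.
    out = list(prompt)
    for _ in range(n_new):
        out.append(next_token(out))
    return out[len(prompt):]

def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], n_new: int, k: int = 4) -> List[int]:
    out = list(prompt)
    goal = len(prompt) + n_new
    while len(out) < goal:
        # 1. The draft proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target checks each proposed position. A real implementation
        #    would score all k positions in one batched forward pass.
        for i, tok in enumerate(proposal):
            expected = target(out + proposal[:i])
            if tok != expected:
                # Keep the matching prefix, then take the target's own token.
                out.extend(proposal[:i])
                out.append(expected)
                break
        else:
            # Every guess matched: keep them all plus one extra target token.
            out.extend(proposal)
            out.append(target(out))
    return out[len(prompt):goal]

prompt = [1, 2, 3]
baseline = greedy_decode(target_next, prompt, 20)
speculative = speculative_decode(target_next, draft_next, prompt, 20)
draft_only = greedy_decode(draft_next, prompt, 20)
print(speculative == baseline)  # same tokens as the target alone
print(draft_only == baseline)   # the draft on its own diverges
```

Every token that lands in `out` is either a draft guess the target agreed with or the target's own pick, so the result matches target-only greedy decoding. With sampling, the usual scheme is an accept/reject step that is designed to preserve the target's output distribution rather than the exact token sequence; that case, and any floating-point differences between batched and sequential evaluation, are outside this sketch.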


Draft model doesn’t degrade quality!


I beg to differ, especially when it comes to code.



