Beg to differ, I'm living fine with 1.5 tok/sec

danielhanchen · 2025-07-23T03:08:47 1753240127

Speculative decoding with a small draft model could increase it by, say, 30 to 50%!

segmondy · 2025-07-23T03:52:22 1753242742

i'm not willing to trade any more quality for performance. no draft, no cache for kv either. i'll take the performance cost, it just makes me think carefully about my prompt. i rarely ever need more than one prompt to get my answers. :D

jychang · 2025-07-23T10:32:30 1753266750

Speculative decoding doesn't change output tokens.
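
To make that concrete, here is a minimal toy sketch of the greedy case. Everything in it is an assumption for illustration: `target_model` and `draft_model` are hypothetical stand-ins (plain deterministic functions, not real LLMs), and real engines verify the draft in one batched forward pass rather than a Python loop. The point is only the accept/reject logic: a drafted token is kept only if it equals what the target would have picked anyway, so the final sequence is the target's own greedy output.

```python
from typing import Callable, List

# A "model" here is just a function: token context -> greedy next token.
Model = Callable[[List[int]], int]

VOCAB = 50  # toy vocabulary size (assumption)


def target_model(ctx: List[int]) -> int:
    # Hypothetical stand-in for the large model: deterministic in the context.
    return (sum(ctx) * 31 + len(ctx) * 7) % VOCAB


def draft_model(ctx: List[int]) -> int:
    # Hypothetical small model: agrees with the target except when
    # the context length is a multiple of 3, where it guesses wrong.
    guess = target_model(ctx)
    return (guess + 1) % VOCAB if len(ctx) % 3 == 0 else guess


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    # Plain one-token-at-a-time greedy decoding.
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> List[int]:
    out = list(prompt)
    goal = len(prompt) + n_new
    while len(out) < goal:
        # 1. The draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target checks each proposed position in order.
        for i, tok in enumerate(proposal):
            expected = target(out + proposal[:i])
            if tok != expected:
                # First mismatch: keep the agreed prefix, then take the
                # target's own token and discard the rest of the draft.
                out.extend(proposal[:i])
                out.append(expected)
                break
        else:
            # Every drafted token matched; the target adds one bonus token.
            out.extend(proposal)
            out.append(target(out))
    return out[:goal]


prompt = [1, 2, 3]
baseline = greedy_decode(target_model, prompt, 20)
speculative = speculative_decode(target_model, draft_model, prompt, 20)
print(baseline == speculative)
```

In this sketch the two lists match no matter how bad the draft model is; a worse draft only means fewer accepted tokens and less speedup. With sampling instead of greedy decoding, the usual accept/reject rule is designed to preserve the target model's output distribution rather than the exact tokens.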

zackangelo · 2025-07-23T05:56:40 1753250200

Draft model doesn’t degrade quality!

segmondy · 2025-07-23T13:15:41 1753276541

I beg to differ, especially when it comes to code.