37 billion bytes per token? Edit: Oh assuming this is an estimate based on the m...

GaggiX · 2025-08-28T15:14:51 1756394091

> (And actually during generation you can use speculative decoding to do better than this roofline anyways).

And more importantly, batching: taking the example from the blog post, it would be 32 tokens per forward pass in the decoding phase.
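
To put rough numbers on that, here is a minimal sketch of the memory-bound roofline with batching. It takes the 37 billion bytes per forward pass from the top of the thread and the batch of 32 from the blog post's example; the bandwidth figure and the function name are my own placeholders, and it assumes the bytes read per pass stay fixed as the batch grows (it ignores KV cache reads and compute limits).

```python
def decode_tokens_per_second(bytes_read_per_pass: float,
                             memory_bandwidth: float,
                             batch_size: int = 1) -> float:
    """Memory-bound upper limit on decode throughput, in tokens per second.

    Each forward pass streams `bytes_read_per_pass` from memory once and
    emits one token per sequence in the batch.
    """
    forward_passes_per_second = memory_bandwidth / bytes_read_per_pass
    return forward_passes_per_second * batch_size


WEIGHT_BYTES = 37e9   # bytes read per forward pass, figure from the thread
BANDWIDTH = 3.7e12    # bytes/s, HYPOTHETICAL value picked for round numbers
BATCH = 32            # batch size from the blog post's example

single = decode_tokens_per_second(WEIGHT_BYTES, BANDWIDTH)
batched = decode_tokens_per_second(WEIGHT_BYTES, BANDWIDTH, BATCH)

print(f"batch 1:  {single:.0f} tokens/s")
print(f"batch {BATCH}: {batched:.0f} tokens/s")
```

The point is only that the same weight read is amortized over every sequence in the batch, so the bound scales linearly with batch size until something else (compute, KV cache traffic) takes over.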

mutkach · 2025-08-28T15:24:49 1756394689

There's also an estimate of how much the KV cache grows with each subsequent token. That would be on the order of MBs per token. I think that would be the bottleneck.
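
A back-of-the-envelope sketch of that growth, assuming a plain multi-head or grouped-query attention cache (one key and one value vector per layer per KV head). The config below is hypothetical and not the model from the blog post, which may use a different attention scheme with a smaller cache.

```python
def kv_cache_bytes_per_token(n_layers: int,
                             n_kv_heads: int,
                             head_dim: int,
                             bytes_per_value: int = 2) -> int:
    """Bytes appended to the KV cache for each new token of one sequence.

    The factor of 2 covers the key and the value stored at every layer.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


def kv_cache_total_bytes(per_token_bytes: int,
                         context_len: int,
                         batch_size: int) -> int:
    """Total cache size for a batch of sequences at a given context length."""
    return per_token_bytes * context_len * batch_size


# HYPOTHETICAL config, chosen only to illustrate the order of magnitude.
per_token = kv_cache_bytes_per_token(n_layers=60, n_kv_heads=32,
                                     head_dim=128, bytes_per_value=2)
total = kv_cache_total_bytes(per_token, context_len=4096, batch_size=32)

print(f"per token: {per_token / 2**20:.2f} MiB")
print(f"batch of 32 at 4096 tokens: {total / 2**30:.0f} GiB")
```

With numbers like these the cache for a full batch can rival or exceed the weight read per pass, which is why it can become the limiting term at long contexts.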