I wrote about it in the blog and linked some papers! I also wrote about it here - https://unsloth.ai/blog/dynamic-4bit - one has to inspect the activation and weight quantization errors!
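Inspecting the weight quantization error can be sketched roughly like this - a minimal, hypothetical illustration using plain symmetric round-to-nearest 4-bit quantization in PyTorch (the actual dynamic quants use more sophisticated schemes like NF4, and `weight_quant_error` is just a made-up helper name):

```python
import torch

def weight_quant_error(W: torch.Tensor, bits: int = 4) -> float:
    # Symmetric round-to-nearest quantization, purely illustrative.
    qmax = 2 ** (bits - 1) - 1                      # 7 for 4-bit
    scale = W.abs().max() / qmax
    Wq = torch.clamp(torch.round(W / scale), -qmax - 1, qmax) * scale
    # Relative Frobenius-norm error between original and dequantized weights.
    return (W - Wq).norm().item() / W.norm().item()

torch.manual_seed(0)
W = torch.randn(256, 256)  # stand-in for one layer's weight matrix
err = weight_quant_error(W)
print(f"relative quantization error: {err:.4f}")
```

The idea is that layers with unusually high error like this are the ones you'd keep in higher precision.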
Oh apologies I got confused - it's because when we calculate our dynamic quants, we have to do it on the fixed model!
For example, in Phi 3 the end-of-sentence token was wrong - if we calibrated on the broken version, our quants would be calibrated incorrectly, since chatting with the model uses the actual correct token.
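The mismatch can be sketched like this - the token ids below are made up for illustration, not Phi 3's real ids:

```python
# Hypothetical illustration: a wrong end-of-sentence token changes the
# calibration inputs, so the activation statistics (and hence the chosen
# quantization scales) are computed for inputs the model never sees in use.
WRONG_EOS = 11111    # the broken end-of-sentence id (made up)
CORRECT_EOS = 22222  # the id real chats actually append (made up)

prompt_ids = [101, 2054, 2003, 102]      # some tokenized prompt
calib_input = prompt_ids + [WRONG_EOS]   # what calibration on the broken model sees
chat_input = prompt_ids + [CORRECT_EOS]  # what inference actually sees

# Different inputs -> different activations -> mis-calibrated quant scales.
print(calib_input != chat_input)
```

That's why the fixes have to be applied before the quants are computed.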
Ok, so this means your approach doesn't work without first applying the fixes to the vanilla models. What I'm trying to understand is the approach itself. Why, and how, does it work?