> they added another trillion tokens and shrank the model from 18 GB to 9 GB through quantization, reducing its bit width from Mamba2’s 16-bit floating-point precision to 8-bits.
This sounds like what they call "Bamba-9B" is actually an 18B model quantised to 8 bits.
I thought generally we were naming models "nB" by their number of params and treating quantisation as a separate concern. Are there any other models that instead treat the name as an indicative memory requirement?
Is this an attempt to hide that it fares poorly vs other ~18B parameter models?
EDIT: no, I just misunderstood
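For anyone else who trips over the same thing, the arithmetic that resolves it: checkpoint size is roughly parameter count × bytes per parameter, so a 9B-parameter model is ~18 GB at 16-bit precision and ~9 GB at 8-bit. The name stayed consistent with the parameter count; only the storage shrank. A minimal sketch (ignoring embedding-table and optimizer overhead):

```python
def model_size_gb(num_params_billion: float, bits_per_param: int) -> float:
    """Approximate checkpoint size in GB: params * bytes per param."""
    bytes_per_param = bits_per_param / 8
    return num_params_billion * bytes_per_param

# A 9B-parameter model:
print(model_size_gb(9, 16))  # 16-bit (bf16/fp16) -> 18.0 GB
print(model_size_gb(9, 8))   # 8-bit quantised    -> 9.0 GB
```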