No, the OP is mistaken. All of the model weights have to be accessed for a forward pass. What happened is that using mmap changes where the memory is accounted (kernel page cache vs. the process itself), so the reported numbers were being misinterpreted. There are still 30B parameters, and to run the model you still need roughly that many bytes times the size of your floating point representation.
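For a rough sense of scale, here's a back-of-envelope sketch of what "parameter count times representation size" works out to. The byte-per-parameter figures are for common floating point formats and are illustrative, not measurements of any specific checkpoint:

```python
# Back-of-envelope: memory needed just to hold the weights of a 30B-parameter model.
# Ignores activations, KV cache, and any runtime overhead.

N_PARAMS = 30e9  # 30 billion parameters

def weight_bytes(n_params: float, bytes_per_param: float) -> float:
    """Total bytes occupied by the weights alone."""
    return n_params * bytes_per_param

for fmt, nbytes in [("fp32", 4), ("fp16", 2)]:
    gb = weight_bytes(N_PARAMS, nbytes) / 1e9
    print(f"{fmt}: ~{gb:.0f} GB")  # fp32: ~120 GB, fp16: ~60 GB
```

Whether those bytes show up as process memory or as kernel page cache depends on how the file is loaded, but the total that has to be touched for a forward pass doesn't change.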