Hacker News
renewiltord on March 31, 2023 | on: Llama.cpp 30B runs with only 6GB of RAM now
That's the size on disk, my man. When you quantize it to a smaller float size you lose precision on the weights and so the model is smaller. Then here they `mmap` the file and it only needs 6 GiB of RAM!
gliptic on April 1, 2023
The size mentioned is already quantized (and to integers, not floats). mmap obviously doesn't do any quantization.
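To make the `mmap` point concrete, here is a minimal Python sketch (not llama.cpp's actual loader or file format) of mapping a weights file read-only. With a file-backed mapping, the OS pages bytes in on demand and can evict them under memory pressure, so resident memory grows only with the pages actually touched, not with the file's size on disk. The stand-in file of raw bytes is an assumption for illustration.

```python
import mmap
import os
import tempfile

# Stand-in "weights" file: 1024 raw bytes (a real model file would hold
# quantized integer weight blocks; the format here is purely illustrative).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(bytes(i % 256 for i in range(1024)))

with open(path, "rb") as f:
    # ACCESS_READ gives a read-only, file-backed mapping: no upfront copy of
    # the file into RAM, just page-ins as offsets are touched.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    first, mid, size = mm[0], mm[500], len(mm)  # random access, no full read
    print(first, mid, size)  # -> 0 244 1024
    mm.close()

os.remove(path)
```

Note that mapping changes only *how* the bytes reach memory; the 6 GiB figure comes from the file already being quantized on disk, exactly as the reply says.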