No, the OP is mistaken. All of the model weights have to be accessed for a forward pass. What happened is that using mmap changes where the memory is accounted (kernel page cache vs. the process itself), so the reported numbers were being misinterpreted. There are still 30B parameters, and to run the model you still need roughly that many bytes times the size of your floating point representation.
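For a rough sense of scale, here's a back-of-envelope sketch of what "parameter count times representation size" works out to. The byte-per-parameter figures are for common floating point formats and are illustrative, not measurements of any specific checkpoint:

```python
# Back-of-envelope: memory needed just to hold the weights of a 30B-parameter model.
# Ignores activations, KV cache, and any runtime overhead.

N_PARAMS = 30e9  # 30 billion parameters

def weight_bytes(n_params: float, bytes_per_param: float) -> float:
    """Total bytes occupied by the weights alone."""
    return n_params * bytes_per_param

for fmt, nbytes in [("fp32", 4), ("fp16", 2)]:
    gb = weight_bytes(N_PARAMS, nbytes) / 1e9
    print(f"{fmt}: ~{gb:.0f} GB")  # fp32: ~120 GB, fp16: ~60 GB
```

Whether those bytes show up as process memory or as kernel page cache depends on how the file is loaded, but the total that has to be touched for a forward pass doesn't change.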