Hacker News
renewiltord on March 31, 2023 | on: Llama.cpp 30B runs with only 6GB of RAM now
That's the size on disk, my man. When you quantize it to a smaller float size you lose precision on the weights and so the model is smaller. Then here they `mmap` the file and it only needs 6 GiB of RAM!
gliptic on April 1, 2023
The size mentioned is already quantized (and to integers, not floats). mmap obviously doesn't do any quantization.
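To make the `mmap` point concrete, here is a minimal Python sketch (not llama.cpp's actual loader or file format) of mapping a weights file read-only. With a file-backed mapping, the OS pages bytes in on demand and can evict them under memory pressure, so resident memory grows only with the pages actually touched, not with the file's size on disk. The stand-in file of raw bytes is an assumption for illustration.

```python
import mmap
import os
import tempfile

# Stand-in "weights" file: 1024 raw bytes (a real model file would hold
# quantized integer weight blocks; the format here is purely illustrative).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(bytes(i % 256 for i in range(1024)))

with open(path, "rb") as f:
    # ACCESS_READ gives a read-only, file-backed mapping: no upfront copy of
    # the file into RAM, just page-ins as offsets are touched.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    first, mid, size = mm[0], mm[500], len(mm)  # random access, no full read
    print(first, mid, size)  # -> 0 244 1024
    mm.close()

os.remove(path)
```

Note that mapping changes only *how* the bytes reach memory; the 6 GiB figure comes from the file already being quantized on disk, exactly as the reply says.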