The model (weights, plus activations and the KV cache) can fill all the memory you have and more, and to a first (very rough) approximation every byte of the weights has to be read for each token generated. You can see how that adds up.
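To make that concrete, here's a back-of-envelope sketch of why decoding is memory-bandwidth bound. The numbers (7B params, fp16, 100 GB/s) are illustrative assumptions, not measurements of any particular system:

```python
# Rough decoding-speed estimate for a memory-bandwidth-bound LLM.
# All numbers below are hypothetical, chosen just for illustration.
model_params = 7e9        # a 7B-parameter model
bytes_per_param = 2       # fp16 weights
weight_bytes = model_params * bytes_per_param  # ~14 GB read per token

mem_bandwidth = 100e9     # 100 GB/s, an assumed hardware figure

# If every weight byte must be read once per token, throughput is
# bounded by bandwidth / bytes-per-token:
tokens_per_sec = mem_bandwidth / weight_bytes
print(f"~{tokens_per_sec:.1f} tokens/sec")  # ~7.1 tokens/sec
```

Quantizing to 4 bits cuts `weight_bytes` by 4x and, in this crude model, multiplies throughput by the same factor.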
I highly recommend Andrej Karpathy's videos if you want to learn details.
A very simplified version is: to compute a matrix × vector product you need to read the entire matrix, even if the vector is mostly zeroes.
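A toy numpy sketch of that point. The dense routine reads every entry of `W` no matter how sparse `x` is, even though (for a one-hot `x`) the result is just one column of `W`:

```python
import numpy as np

# Dense mat-vec: y[i] = sum_j W[i, j] * x[j].
# The routine reads all of W regardless of how many x[j] are zero.
W = np.arange(16.0).reshape(4, 4)   # stand-in for a layer's weight matrix
x = np.array([0.0, 0.0, 1.0, 0.0])  # mostly-zero input vector

y = W @ x  # touches all 16 entries of W

# With a one-hot x, the answer is just column 2 of W,
# yet a dense kernel still streamed the whole matrix from memory:
assert np.allclose(y, W[:, 2])
```

Sparse-aware kernels can skip that work, but the dense case is what the rough every-byte-per-token approximation describes.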
Edit: obviously my simplification is wrong in the details, but once you factor in compression etc. you get the idea.