
LLMs are a bad deal when you look at how much power you need to run inference. A device that could barely run one instance of QwQ-32B at glacial speeds could serve multiple concurrent users of Kiwix.
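Rough numbers to back that up; every figure below is an assumption, not a measurement. Decoding a dense LLM is memory-bandwidth bound (each generated token streams roughly the whole model through RAM), while a Kiwix article is a few hundred KB of static content:

    # Back-of-envelope sketch in Python; all figures assumed
    MODEL_PARAMS = 32e9      # QwQ-32B parameter count
    BYTES_PER_PARAM = 0.56   # ~4.5-bit quantization (Q4_K-ish)
    MEM_BANDWIDTH = 80e9     # bytes/s, typical dual-channel DDR4 desktop

    model_bytes = MODEL_PARAMS * BYTES_PER_PARAM  # ~18 GB
    print(MEM_BANDWIDTH / model_bytes)            # ~4.5 tokens/s: glacial

    PAGE_BYTES = 300e3                            # one Kiwix article, generous
    print(MEM_BANDWIDTH / PAGE_BYTES)             # ~267,000 pages/s, bandwidth-only

The real Kiwix ceiling is CPU and disk rather than memory bandwidth, but the gap is several orders of magnitude either way.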


Quick question: which car companies are working on self-driving cars? All of them, plus two that aren't car companies (Apple and Google).

Which militaries are working on battlefield AI? All of them.

Could a 64 GB dual Xeon run, say, 50 to 100 users of Kiwix?


To serve multiple users, probably not.

But if you haven't thought to ask Hacker News every single thing you need to know beforehand, I think you still want the LLM to answer questions and help you bootstrap it.
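For scale, here's a rough sketch of why multiple LLM users won't fit (assumed numbers; a real dual Xeon running llama.cpp would differ, and batching shifts the bottleneck to compute rather than removing it):

    # Can one dual-Xeon box give 50-100 people a usable LLM? All figures assumed.
    MEM_BANDWIDTH = 120e9    # aggregate bytes/s for an older dual-socket Xeon
    MODEL_BYTES = 18e9       # QwQ-32B at ~4-bit quantization, fits in 64 GB RAM
    USERS = 50
    TOKENS_PER_REPLY = 300

    total_tok_s = MEM_BANDWIDTH / MODEL_BYTES        # ~6.7 tokens/s, shared by everyone
    secs_per_reply = TOKENS_PER_REPLY / (total_tok_s / USERS)
    print(secs_per_reply / 60)                       # ~37 minutes per 300-token reply each

Kiwix itself (kiwix-serve over a ZIM file) at 50-100 readers is a trivial load for the same box; it's the LLM side that doesn't scale.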



