Hacker News

We developed a novel optimization pipeline for LLMs that lets large models run on a standard laptop.

Our first prototype ran an optimized 80B model at its full 256k context at 40 tokens/s while using only 14 GB of RAM.
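For scale, a rough back-of-envelope on those figures (a sketch only, assuming "14 GB" means 14 GiB and that all of it holds weights, which it won't, since the 256k-context KV cache also needs memory):

```python
# Implied compression ratio for an 80B-parameter model in 14 GiB.
# Upper bound on bits/parameter: real weight footprint is lower once
# KV cache and runtime overhead are subtracted.
PARAMS = 80e9            # 80B parameters
RAM_BYTES = 14 * 2**30   # 14 GiB

bytes_per_param = RAM_BYTES / PARAMS
bits_per_param = bytes_per_param * 8
print(f"~{bits_per_param:.2f} bits/parameter")
```

That works out to roughly 1.5 bits per parameter, versus 16 bits for unquantized FP16/BF16 weights.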

We are currently leveraging this tech to build https://cortex.build, a terminal AI coding assistant.


