
> The GPU is significantly faster and it has cuda,

But (non-batched) LLM inference is usually limited by memory bandwidth, isn't it? Any extra compute the GPU offers goes unused by current-day single-stream LLM inference.
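Rough back-of-envelope sketch of that claim (all numbers here are illustrative assumptions, not measurements from the thread): each decoded token has to stream every weight from memory once, so memory bandwidth divided by model size gives a ceiling on tokens/second regardless of available FLOPs.

    # Minimal sketch, assuming a 70B fp16 model and ~1 TB/s memory bandwidth.
    model_params = 70e9          # assumed parameter count
    bytes_per_param = 2          # fp16/bf16 weights
    mem_bandwidth = 1.0e12       # assumed bytes/s of memory bandwidth

    weight_bytes = model_params * bytes_per_param

    # Each decoded token reads all weights once, so the bandwidth-bound
    # ceiling on single-stream decode speed is bandwidth / model bytes.
    max_tokens_per_s = mem_bandwidth / weight_bytes
    print(f"bandwidth-bound ceiling: ~{max_tokens_per_s:.1f} tokens/s")

With these assumed numbers the ceiling is about 7 tokens/s, and no amount of extra compute raises it.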





I believe only token generation is bandwidth-limited; prompt processing and other tasks, on the other hand, need the compute. As I understand it, the workstation as a whole is also aimed at the local development process before readying things for the datacenter, not just at running LLMs. A rough arithmetic-intensity comparison is sketched below.
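A hedged sketch of why prefill and decode land on opposite sides of the roofline (all numbers are illustrative assumptions): prefill amortizes each pass over the weights across the whole prompt via batched matmuls, while decode spends a full pass on a single token, so prefill's FLOPs-per-byte is roughly prompt-length times higher.

    # Minimal sketch, assuming a 70B fp16 model and a 2048-token prompt.
    model_params = 70e9
    bytes_per_param = 2
    prompt_tokens = 2048

    flops_per_token = 2 * model_params   # ~2 FLOPs per parameter per token
    weight_bytes = model_params * bytes_per_param

    # Decode: one token per full read of the weights -> low intensity.
    decode_intensity = flops_per_token / weight_bytes

    # Prefill: the whole prompt shares one read of the weights -> high intensity.
    prefill_intensity = (prompt_tokens * flops_per_token) / weight_bytes

    print(f"decode:  ~{decode_intensity:.0f} FLOPs/byte (bandwidth-bound)")
    print(f"prefill: ~{prefill_intensity:.0f} FLOPs/byte (compute-bound)")

Under these assumptions decode sits at ~1 FLOP/byte while prefill sits at ~2048 FLOPs/byte, which is why prompt processing actually exercises the GPU's compute.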




