> The GPU is significantly faster and it has cuda,
But (non-batched) LLM inference is usually limited by memory bandwidth, isn't it? Any extra compute the GPU has goes unused in current-day single-stream inference.
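A rough back-of-envelope sketch of the bandwidth argument: at batch size 1, every weight must be streamed from memory once per generated token, so bandwidth caps throughput regardless of FLOPs. All figures below (model size, precision, bandwidth) are illustrative assumptions, not measurements of any particular GPU.

```python
# Batch-1 decode streams every weight once per token, so
# tokens/sec is roughly memory_bandwidth / model_size_in_bytes.
# Numbers are assumed for illustration only.

model_params = 7e9        # assumed 7B-parameter model
bytes_per_param = 2       # fp16/bf16 weights
bandwidth_bytes_s = 1e12  # assumed ~1 TB/s memory bandwidth

model_bytes = model_params * bytes_per_param
tokens_per_sec = bandwidth_bytes_s / model_bytes
print(f"~{tokens_per_sec:.0f} tokens/s upper bound from bandwidth alone")
# ~71 tokens/s -- extra compute beyond this point sits idle at batch size 1.
```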
I believe only token generation (decode) is bandwidth-limited; prompt processing and other tasks, on the other hand, need the compute. As I understand it, the workstation as a whole is also aimed at the local development process before readying things for the datacenters, not just at running LLMs.
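A minimal sketch of why those two phases behave differently, in terms of arithmetic intensity (FLOPs per byte of weights moved). The function and constants here are hypothetical simplifications that ignore activations and the KV cache:

```python
# Decode processes one token at a time (a matrix-vector product),
# while prefill processes the whole prompt at once (a matrix-matrix
# product), reusing each weight across many tokens.

def arithmetic_intensity(n_tokens: int, bytes_per_param: int = 2) -> float:
    """FLOPs per weight byte for a matmul over n_tokens rows.

    Each weight is read once and used in 2 * n_tokens FLOPs
    (one multiply + one add per token).
    """
    return 2 * n_tokens / bytes_per_param

# Decode: 1 token -> 1 FLOP/byte, far below any modern GPU's
# compute-to-bandwidth ratio, so the memory bus is the bottleneck.
print(arithmetic_intensity(1))      # 1.0

# Prefill: a 2048-token prompt -> 2048 FLOPs/byte, enough to keep
# the ALUs busy, so raw compute matters there.
print(arithmetic_intensity(2048))   # 2048.0
```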