
I tried the local small models. They are slow, much less capable, and ironically much more expensive to run than the frontier cloud models.


Phi-4-mini runs at 20 T/s on a basic laptop CPU, without any optimization… how is that slow?


I was running Qwen3-32B locally even faster, at 70 T/s, and it was still way too slow for me. I'm generating thousands of tokens of output per request (not coding). Running locally, I could get about 6 million tokens per day and pay for the electricity, or I could get more tokens per day from Google Gemini 2.5 Flash for free.
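The "6 mil tokens per day" figure follows directly from the 70 T/s decode speed, assuming the machine generates continuously around the clock (an idealized assumption; real utilization would be lower):

```python
# Upper bound on daily token output from a sustained decode speed.
# Assumes 24/7 generation with no idle time or prompt-processing overhead.
TOKENS_PER_SEC = 70
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

tokens_per_day = TOKENS_PER_SEC * SECONDS_PER_DAY
print(f"{tokens_per_day:,} tokens/day")  # → 6,048,000 tokens/day, ~6M
```

So 70 T/s caps out at roughly 6 million tokens per day, matching the number above.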

Running models locally is a privilege for the rich and those with too much disposable time.


Try Qwen3-30B-A3B. It's a MoE model with only ~3B parameters active per token, so its memory-bandwidth usage looks more like a 3B model's, and it typically runs much faster.
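The intuition can be sketched with a back-of-the-envelope model: local decode is usually memory-bandwidth-bound, since every generated token requires streaming all *active* weights from memory. The bandwidth and quantization numbers below are illustrative assumptions, not benchmarks of any specific machine or model:

```python
# Rough decode-speed estimate for bandwidth-bound inference:
# tokens/s ≈ memory bandwidth / bytes of active weights per token.
# All constants here are illustrative assumptions.

def est_tokens_per_sec(active_params_b: float, bytes_per_param: float,
                       bandwidth_gbs: float) -> float:
    """Estimate decode speed from active parameter count (billions),
    bytes per parameter, and memory bandwidth (GB/s)."""
    active_gb = active_params_b * bytes_per_param  # GB streamed per token
    return bandwidth_gbs / active_gb

BW = 100.0  # assumed memory bandwidth in GB/s
Q4 = 0.5    # ~4-bit quantization: 0.5 bytes per parameter

dense_32b = est_tokens_per_sec(32, Q4, BW)  # dense: all 32B params read
moe_a3b = est_tokens_per_sec(3, Q4, BW)     # MoE: only ~3B active params

print(f"dense 32B: ~{dense_32b:.0f} T/s, MoE A3B: ~{moe_a3b:.0f} T/s")
```

Under these assumptions, the dense 32B model manages only single-digit T/s while the 30B-A3B MoE lands around 10x faster, which is why its speed tracks a 3B model rather than a 30B one.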



