
Inference is not fast by any metric. It is many, MANY orders of magnitude slower than alternatives.


Honestly, Gemini Flash Lite and models running on Cerebras are extremely fast. I know what you are saying: if the goal is to get a lot of results that may or may not be relevant, then yes, it is an order of magnitude slower.

But if you take the post-analysis process into consideration, which is what inference is trying to solve, is it still an order of magnitude slower?


More like 6-8 orders of magnitude slower. That’s a very nontrivial difference in performance!
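
A back-of-envelope sketch makes that gap concrete. This is illustrative only, not a benchmark of any particular service: the one-second LLM latency is an assumption, and the exact exponent depends on document size and which "alternative" you compare against.

    import math
    import re
    import time

    # Illustrative micro-benchmark: time a compiled-regex scan over a
    # document, then compare against an ASSUMED one-second round trip
    # for a hosted LLM. The 1 s figure is an assumption, not a measurement.
    doc = "the quick brown fox jumps over the lazy dog " * 1000  # ~44 KB
    pattern = re.compile(r"purple elephant")  # never matches: forces a full scan

    runs = 1000
    start = time.perf_counter()
    for _ in range(runs):
        pattern.search(doc)
    regex_seconds = (time.perf_counter() - start) / runs

    llm_seconds = 1.0  # assumed latency for one hosted-model API call

    print(f"regex scan:       {regex_seconds:.2e} s per document")
    print(f"assumed LLM call: {llm_seconds:.2e} s per document")
    print(f"gap: roughly 10^{math.log10(llm_seconds / regex_seconds):.0f}")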


How are you quantifying the speed at which results are reviewed?


It’s not speed, it’s the cost of the computation.


It has become fast enough that another call isn't going to overwhelm your pipeline. If you needed this kind of functionality for high-performance computing, perhaps it wouldn't be feasible, but here it is being used to feed back into an LLM. The user will never notice.
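
For instance, if the extra call runs alongside work the pipeline was already doing, the user-visible latency barely changes. A minimal sketch, assuming a hypothetical async call_llm helper (the name, latency, and return value are stand-ins, not any specific client library):

    import asyncio

    # Hypothetical helper standing in for a real LLM client call; the
    # ~1 s sleep simulates an assumed hosted-model round trip.
    async def call_llm(prompt: str) -> str:
        await asyncio.sleep(1.0)
        return f"response to: {prompt}"

    async def pipeline(query: str) -> str:
        # Fire the main call and the extra enrichment call concurrently;
        # total wall time stays ~1 s instead of ~2 s.
        main, extra = await asyncio.gather(
            call_llm(query),
            call_llm(f"classify relevance of: {query}"),
        )
        return f"{main} [{extra}]"

    print(asyncio.run(pipeline("example user query")))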



