Honestly, Gemini Flash Lite and models hosted on Cerebras are extremely fast. I see what you're saying: if the goal is just to pull back a lot of results that may or may not be relevant, then yes, it's an order of magnitude slower.
But if you factor in the post-analysis step, which is the problem the inference call is there to solve, is it still an order of magnitude slower?
Inference has become fast enough that one more call isn't going to overwhelm your pipeline. If you needed this kind of functionality for high-performance computing it probably wouldn't be feasible, but here the output is just being fed back into an LLM. The user will never notice.
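To make the latency argument concrete, here's a minimal sketch of the kind of pipeline I mean: a fast model pre-filters candidate results before the main LLM call. The names `fast_relevance_check` and `main_llm_answer` are hypothetical stand-ins for whatever client you use (a Gemini Flash Lite or Cerebras-hosted model for the former, your main model for the latter); the stubs just make the sketch runnable.

```python
import time

def fast_relevance_check(query: str, snippet: str) -> bool:
    # One short "is this relevant, yes/no?" completion against the fast model.
    # Stubbed here so the sketch runs; swap in your real client call.
    return query.lower().split()[0] in snippet.lower()

def main_llm_answer(query: str, context: list[str]) -> str:
    # The expensive generation call the user is actually waiting on.
    return f"answer to {query!r} using {len(context)} snippets"

def answer(query: str, candidates: list[str]) -> str:
    t0 = time.perf_counter()
    # Pre-filter candidates with the fast model; these calls are short and
    # can be issued concurrently, so they add little wall-clock time.
    relevant = [c for c in candidates if fast_relevance_check(query, c)]
    t1 = time.perf_counter()

    # The main generation step usually dominates end-to-end latency,
    # so from the user's point of view the filtering pass is lost in the noise.
    result = main_llm_answer(query, relevant)
    t2 = time.perf_counter()

    print(f"filtering: {t1 - t0:.3f}s, generation: {t2 - t1:.3f}s")
    return result

if __name__ == "__main__":
    print(answer("latency tradeoffs", ["latency notes", "unrelated text"]))
```

The point is only about where the time goes: the extra filtering pass adds a small, roughly fixed cost, while the final generation call sets the latency the user actually perceives.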