Funny, yes. But... a web forum is a good place for these back-and-forths.
I'm on the other side of this fence from you. I agree with the conclusion that "it is flaky." I disagree about what that means.
As LLMs progress, 10% accuracy becomes 50% accuracy. That becomes 80% accuracy, and from there, usable accuracy... whatever that is per use case. Not every "better than random" seed grows into a high-accuracy feature, but many do. It's never clear where the accuracy ceilings are, and high-reliability applications may be distant, but... "sufficient accuracy" is not necessarily very high for many applications.
Meanwhile, the "accurate-for-me" fix is usually to use the appropriate model, prompt, or the like. Well... these are exactly the kind of optimizations that can be implemented in a UI like "LLM search."
I'm expecting "LLMs eat search." They don't have to "solve truth." They just have to be better and faster than search, with fewer ads.