> LLMs are a decent search engine a la Google circa 2005.
Statistical text (token) generation made from an unknown (to the user) training data set is not the same as a keyword/faceted search of arbitrary content acquired from web crawlers.
> The problem is that the conversational interface, for some reason, seems to turn off the natural skepticism that people have when they use a search engine.
For me, my skepticism of using a statistical text generation algorithm as if it were a search engine is because a statistical text generation algorithm is not a search engine.
Search engines can still be really good if you have a good idea of what you're looking for in the domain you're searching.
Search engines can suck when you don't know exactly what you're looking for and the phrases you're using have invited spammers to fill up the first 10 pages.
They also suck if you want to find something that's almost exactly like a very common thing, but different in some key aspect.
For example, I wanted to find some texts on solving a partial differential equation numerically using 6th-order or higher finite differences, as I wanted to know how to handle boundary conditions (the interior is simple enough).
Searching only turned up the usual low-order methods that I already knew.
Asking some LLMs, I got some decent answers and could proceed.
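(To be concrete about why the interior is the easy part: a standard example is the 6th-order central stencil for a first derivative on a grid with spacing h,

    f'(x) ~ [ -f(x-3h) + 9 f(x-2h) - 45 f(x-h) + 45 f(x+h) - 9 f(x+2h) + f(x+3h) ] / (60 h),

which needs three neighbours on each side. Within three points of the boundary it runs off the grid, so you need one-sided closures or ghost points, and that's exactly the part the usual references gloss over.)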
Back in the day you could force the search engines to restrict their search scope, but these days they all seem so eager to return results at all costs that they're useless for niche topics.
I agree completely. Personally, I actually like the list of links because I like to compare different takes on a topic. It's also fascinating to see how a scientific study propagates through the media, or how the same news story is treated over time as trends change. I don't want a single mashed-up answer to a question, and maybe that makes me weird.
More worrying: whenever I've asked an LLM a question on a topic I happen to know a LOT about, the response has been either incorrect or inadequate ("there is currently no information collected on that topic").
I do like Perplexity for questions like "without any preamble whatsoever, what is the fastest way to remove a <whatever> stain from X material?"
I almost never bother using Google anymore. When I search for something, I'm usually looking for an answer to a question. Now I can just ask the question and get the answer without all the other stuff.
I will often ask the LLM to give me web pages to look at when I want to do further reading.
As LLMs get better, I can't see myself going back to Google as it is or even as it was.
By that logic, it's barely worth reading a newspaper or a book. You don't know if they're giving you accurate information without doing all the research you're trying to avoid.
Recognised newspapers will curate by hiring smart, knowledgeable reporters and funding them to get reliable information. Recognised books will be written by a reliably informed author, and reviewed by other reliably informed people. There are no recognised LLMs, and their method of working precludes reliability.
Malcolm Gladwell, Jonah Lehrer, Daniel Kahneman, Matthew Walker, Stephen Glass? The New York Times, featuring Judith Miller on the existence of WMD, or their award winning podcast "Caliphate"? (Award returned when it became known the whole thing was made up, in case you haven't heard of that one).
Not anymore, not for a long time. There are very few truly reliable and trustworthy sources these days. More and more "recognized" publications are using LLMs. If a "recognized" authority gives you LLM slop, that doesn't make it any more trustworthy.
> Statistical text (token) generation made from an unknown (to the user) training data set is not the same as a keyword/faceted search of arbitrary content acquired from web crawlers.
Well, it's roughly the same under the hood, mathematically.
All of the current models have access to Google and will do a search (or multiple searches), filter and analyze the results, then present a summary of results with links.
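Roughly, the loop looks something like this (an illustrative sketch, not any vendor's actual pipeline; web_search() and llm() here are hypothetical stand-ins for a real search API and a real model call):

    from dataclasses import dataclass

    @dataclass
    class Result:
        title: str
        url: str
        snippet: str

    def web_search(query: str) -> list[Result]:
        # Hypothetical stand-in for a real search API call.
        return [Result("Example page", "https://example.com", "relevant snippet")]

    def llm(prompt: str) -> str:
        # Hypothetical stand-in for a real model call.
        return "generated text"

    def answer_with_sources(question: str) -> str:
        results = web_search(question)
        # Keep only the results the model judges relevant to the question.
        kept = [r for r in results
                if "no" not in llm(f"Relevant to '{question}'? yes/no: {r.snippet}").lower()]
        # Summarize the kept results and append their links.
        context = "\n".join(f"- {r.title} ({r.url}): {r.snippet}" for r in kept)
        summary = llm(f"Answer '{question}' using only these sources:\n{context}")
        sources = "\n".join(r.url for r in kept)
        return f"{summary}\n\nSources:\n{sources}"

So the generation step is still statistical, but it's grounded in whatever the search step just retrieved, which is why you get links back with the summary.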