
All of the big 3 LLM providers now offer web search grounding in their APIs, but how do they compare on ranking quality of the retrieved results? Anyone run benchmarks here?

Clearly web search ranking is hard after decades of SEO-optimized content spam (and we get to look forward to increasing AI spam dominating the web). The winning LLM provider may end up being the one with the best web search ranking, much like what allowed Google to win search in the first place.
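Haven't seen a public benchmark, but a rough harness is easy to sketch: hand-label relevance for a set of queries, pull the citation lists each provider's web-search tool returns, and score the rankings with NDCG. Minimal sketch in Python -- the URLs, grades, and provider names below are made up; plug in the real citation lists from each API:

    import math

    def ndcg(ranked_urls, relevance, k=10):
        """NDCG@k with graded labels (0-3) keyed by URL; 1.0 means the
        labeled-best pages were ranked at the top."""
        gains = [relevance.get(u, 0) for u in ranked_urls[:k]]
        dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
        ideal = sorted(relevance.values(), reverse=True)[:k]
        idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
        return dcg / idcg if idcg else 0.0

    # Hand-labeled judgments for one toy query (URLs and grades invented).
    relevance = {
        "https://example.org/primary-source": 3,
        "https://example.org/decent-blog-post": 2,
        "https://example.org/seo-spam-listicle": 0,
    }

    # Ordered citation lists as returned by each provider's web-search tool
    # (fill these in from the actual API responses).
    citations = {
        "provider_a": ["https://example.org/primary-source",
                       "https://example.org/decent-blog-post"],
        "provider_b": ["https://example.org/seo-spam-listicle",
                       "https://example.org/decent-blog-post"],
    }

    for name, urls in citations.items():
        print(name, round(ndcg(urls, relevance), 3))

The labeling is the expensive part, but even a few dozen queries would separate "ranks primary sources first" from "surfaces listicles".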



Right now, most LLMs with web search grounding are still in Stage 1: they can retrieve content, but their ability to assess quality, trustworthiness, and semantic ranking is still very limited.

The LLMs can access the web, but they can't yet understand it in a structured, evaluative way.

What's missing is a layer of engineered relevance modeling, capable of filtering not just on keywords or citations but on factual reliability and actual usefulness to the reader.

And yes, as you mentioned, we may even see the rise of LLM-targeted SEO -- content optimized not for human readers, but to game LLM attention and summarization heuristics. That's a whole new arms race.

The next leap won't just be about accessing more data, but about curating and interpreting it meaningfully.
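FWIW, that relevance layer can be prototyped today as a rerank pass: retrieve broadly, then have the model score each result for relevance and trustworthiness before anything goes into the answer prompt. A minimal sketch, where the rubric, the score threshold, and the llm() helper are all placeholders for whatever provider SDK you actually use:

    import json

    RUBRIC = (
        "Rate this search result for the query on two axes, 0-10 each: "
        "relevance (does it answer the query?) and trust (credible source, "
        "not SEO spam?). Reply as JSON: {\"relevance\": int, \"trust\": int}"
    )

    # Placeholder for any chat-style completion call; swap in a real client.
    def llm(prompt: str) -> str:
        raise NotImplementedError

    def rerank(query: str, results: list[dict]) -> list[dict]:
        """results: [{"url": ..., "title": ..., "snippet": ...}, ...]"""
        scored = []
        for r in results:
            reply = llm(f"{RUBRIC}\n\nQuery: {query}\nResult: {json.dumps(r)}")
            s = json.loads(reply)
            # Weight trust a bit higher than relevance; drop obvious junk.
            score = 0.4 * s["relevance"] + 0.6 * s["trust"]
            if score >= 4:
                scored.append((score, r))
        return [r for _, r in sorted(scored, key=lambda t: -t[0])]

Not a solution to the upstream ranking problem, but it keeps the worst of the retrieved spam out of the context window.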


>Right now, most LLMs with web search grounding are still in Stage 1: they can retrieve content, but their ability to assess quality, trustworthiness, and semantic ranking is still very limited.

Why do you think it is limited? Imagine you show an LLM a link with details and ask whether it is trustworthy or high quality w.r.t. the query; why can't it answer that?


I don't think the limit is in what LLMs can evaluate - given the right context, they're good at assessing quality. The problem is what actually gets retrieved and surfaced in the first place. If the upstream search doesn't rank high-quality or relevant material well, the LLM never sees it. It's not a judgment problem so much as a selection problem.


What I mean is that more engineering is needed around how search results are processed and presented before they reach the LLM.


Not sure I understand -- LLMs are pretty good at assessing the quality of search results. If an LLM can bulk-assess a bunch of results it can get pretty far, probably more efficiently than a human hand-checking all the results.
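Bulk assessment is also cheap to wire up: put the whole result list in one prompt and ask for scores in a single call instead of one call per result. Rough sketch, with the prompt format and the llm() helper invented for illustration:

    import json

    # Placeholder for a single chat-style completion call.
    def llm(prompt: str) -> str:
        raise NotImplementedError

    def bulk_assess(query: str, results: list[dict]) -> list[int]:
        """Score every result in one call; returns one 0-10 score per result."""
        numbered = "\n".join(
            f"{i}. {r['title']} ({r['url']}): {r['snippet']}"
            for i, r in enumerate(results)
        )
        prompt = (
            f"Query: {query}\n\n"
            "Score each result 0-10 for how well it answers the query from a "
            "credible source. Reply with a JSON array of integers, one per "
            f"result, in order.\n\n{numbered}"
        )
        return json.loads(llm(prompt))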



