Yup, deep paging is a huge problem for distributed search systems. It's not just a Google thing, it's every search engine. Here is a section from ElasticSearch's documentation [0]:
"Avoid using from and size to page too deeply or request too many results at once. Search requests usually span multiple shards. Each shard must load its requested hits and the hits for any previous pages into memory. For deep pages or large sets of results, these operations can significantly increase memory and CPU usage, resulting in degraded performance or node failures."
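To make that cost concrete, here is a minimal Python sketch of from/size pagination across several shards. It assumes a toy in-memory model where each shard is just a list of (score, doc_id) pairs; it is not how Elasticsearch is actually implemented, and `shard_top_k` and `search_page` are hypothetical names. What it shows is the mechanism described in the quote: to serve one page, every shard has to return its own top from + size hits, so the coordinating node ends up holding up to shards × (from + size) hits in order to return only size of them.

```python
import heapq
from typing import List, Tuple

# A hit is (score, doc_id); higher score ranks first. Toy model, not a real engine.
Hit = Tuple[float, str]


def shard_top_k(shard: List[Hit], k: int) -> List[Hit]:
    """A shard can't know which of its hits will survive the global merge,
    so it must return its own top k = from + size hits."""
    return heapq.nlargest(k, shard)


def search_page(shards: List[List[Hit]], from_: int, size: int) -> Tuple[List[Hit], int]:
    """Scatter the query to every shard, merge, then slice out one page.
    Returns the page plus the number of hits the coordinator had to hold."""
    k = from_ + size
    per_shard = [shard_top_k(s, k) for s in shards]
    loaded = sum(len(hits) for hits in per_shard)  # grows with from_, not just size
    merged = heapq.nlargest(k, (h for hits in per_shard for h in hits))
    return merged[from_:from_ + size], loaded


# Hypothetical data: 5 shards with 1,000 docs each and deterministic scores.
shards = [
    [(float((s * 1000 + i) % 997), f"s{s}-d{i}") for i in range(1000)]
    for s in range(5)
]

page, loaded = search_page(shards, from_=0, size=10)
print(len(page), loaded)  # 10 50
page, loaded = search_page(shards, from_=900, size=10)
print(len(page), loaded)  # 10 4550
```

With 5 shards, page 1 (from=0, size=10) makes the coordinator hold 50 hits, while page 91 (from=900, size=10) makes it hold 4,550 hits to return the same 10. That growth with from, multiplied by the shard count, is why deep pages get expensive.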
"It's not just a Google thing, it's every search engine."
OK, I see now. I tried it on Bing and got similar results, with two small caveats. First, Bing gave me 861 accessible results, roughly twice as many as Google did (one order of magnitude in base 2). Second, Bing's total isn't nearly as astronomical: it claims only 191K total results, compared with Google's 22M.
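Taking the two reported totals at face value, the gap between them is about two decimal orders of magnitude, which is where a "100x" figure comes from:

$$\frac{22\,000\,000}{191\,000} \approx 115 \approx 10^{2}$$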
Could it be that Google has simply indexed 100x more terms than Bing? Maybe, but my anecdotal use of both doesn't suggest that Bing is that deficient. For example, I tried a phrase that should return only a few results: "bioavailable turmeric extract formulation" (in quotes) yielded 24 results on Google (plus 4 ads on top) and 33 results on Bing (plus 2 ads on top). In fact, Bing looks more like "old Google" than the new Google does.
The number of results (the "match set") can differ even with the same document corpus, because of differences in tokenization, n-grams, language analysis, stop words, synonyms, etc.
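As a rough illustration of how text analysis alone changes the match set, here is a minimal Python sketch over a made-up four-document corpus. Both analyzers, the crude plural stripper, and the `SYNONYMS` table are hypothetical stand-ins, not any engine's real analysis chain. Queries use AND semantics: a document matches only if it contains every analyzed query term.

```python
import re
from typing import Callable, Dict, List, Set

Analyzer = Callable[[str], Set[str]]


def simple_analyzer(text: str) -> Set[str]:
    """Lowercase and split on anything that isn't a letter or digit."""
    return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t}


# Hypothetical synonym table, applied after plural stripping.
SYNONYMS: Dict[str, str] = {"capsule": "supplement"}


def stemming_synonym_analyzer(text: str) -> Set[str]:
    """Same tokenization, plus a crude plural stripper and synonym mapping."""
    out: Set[str] = set()
    for tok in simple_analyzer(text):
        if len(tok) > 3 and tok.endswith("s"):
            tok = tok[:-1]  # naive: "supplements" -> "supplement"
        out.add(SYNONYMS.get(tok, tok))
    return out


def match_set(corpus: List[str], query: str, analyze: Analyzer) -> List[int]:
    """Indices of documents containing every analyzed query term."""
    q = analyze(query)
    return [i for i, doc in enumerate(corpus) if q <= analyze(doc)]


# Made-up corpus for illustration only.
corpus = [
    "Turmeric supplement with black pepper",
    "Best turmeric supplements reviewed",
    "Curcumin capsules: a turmeric extract",
    "Ginger tea recipes",
]

print(match_set(corpus, "turmeric supplements", simple_analyzer))            # [1]
print(match_set(corpus, "turmeric supplements", stemming_synonym_analyzer))  # [0, 1, 2]
```

Same corpus, same query, and the match set triples just by changing how text is analyzed, before any ranking or paging happens.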
The ElasticSearch documentation is actually a pretty good guide to how search engines work in general.
[0] https://www.elastic.co/guide/en/elasticsearch/reference/curr...