> Google has two interlocked monopolies, one is the search index The index is th...

ChuckMcM · 2025-05-11T05:29:47 1746941387

I see it a bit differently: many (most?) web sites explicitly deny scraping except by Google. Further, Google has the infrastructure to crawl several trillion web pages and create a relevant index out of the most authoritative 1.5 trillion. To re-create that on your own, you would need both the web to allow it and the infrastructure to do it. I would agree that this isn't an insurmountable moat, but it is a good one.

mike_d · 2025-05-12T04:55:33 1747025733

Most websites only explicitly deny scraping by bad bots (robots.txt). Things like Cloudflare are a completely different matter, and I have a whole batch of opinions about how they are destroying the web.
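To make the two robots.txt policies being argued about concrete, here is a rough sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `BadBot` and `ExampleCrawler` user agents, the `allowed` helper, and the example.com URL are all made up for illustration; real sites vary a lot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy 1: only Googlebot may crawl, everyone else is denied.
ALLOW_ONLY_GOOGLEBOT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

# Hypothetical policy 2: one named "bad bot" is denied, everyone else is allowed.
BLOCK_NAMED_BOTS = """\
User-agent: BadBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_txt, user_agent, url="https://example.com/page"):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for name, policy in [("allow-only-Googlebot", ALLOW_ONLY_GOOGLEBOT),
                         ("block-named-bots", BLOCK_NAMED_BOTS)]:
        for agent in ("Googlebot", "BadBot", "ExampleCrawler"):
            print(name, agent, allowed(policy, agent))
```

Under the first policy a new crawler is locked out by default; under the second it is fine unless it is named. Either way robots.txt is only advisory: it works when the crawler chooses to honor it, which is part of why network-level blocking is a different matter.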

I'd love to compete directly with OpenAI, but the cost of a half million GPUs is a me problem - not a them problem. Google can't be faulted for figuring out how to crawl the web in an economically viable way.