OpenAI scraping Reddit through redlib instances

gkbrk · 2025-06-08T15:48:48 1749397728

The author used to run a public Arch mirror under mirror.ext4.xyz, so it's not exactly an unknown domain.

Combined with the fact that a lot of their self-hosted stuff, including the Reddit front-ends, are in the Certificate Transparency logs [1], it's not hugely surprising that web crawlers would run into them.

[1]: https://crt.sh/?q=ext4.xyz
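For anyone curious how a crawler could get from CT logs to a list of hostnames, here is a rough Python sketch. It assumes crt.sh returns a JSON array when `output=json` is appended to the query, and that each entry carries a `name_value` field with one or more newline-separated names. The function names `hostnames_from_ct` and `fetch_ct_entries` are made up for illustration. This is not a claim about how any particular crawler actually does it.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def hostnames_from_ct(entries):
    """Collect the unique hostnames named in crt.sh-style JSON entries."""
    names = set()
    for entry in entries:
        # Assumption: name_value may hold several names separated by newlines.
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name:
                names.add(name)
    return sorted(names)


def fetch_ct_entries(domain):
    """Fetch certificate entries for a domain (hypothetical helper, needs network)."""
    # Assumption: appending output=json makes crt.sh answer with a JSON array.
    url = "https://crt.sh/?q=" + quote(domain) + "&output=json"
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)
```

Calling `hostnames_from_ct(fetch_ct_entries("ext4.xyz"))` would then list every name that has appeared on a logged certificate for that domain, which is all a crawler needs to start requesting pages.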

udev4096 · 2025-06-08T17:36:34 1749404194

I am definitely not surprised. It's quite normal, as I stated. What I was trying to say was that they are abusing the private frontends to get around legal restrictions on aggressively scraping the web.

unstablediffusi · 2025-06-08T15:29:35 1749396575

that's not scraping, that's web search.

their scrapers wouldn't identify themselves