OpenAI scraping Reddit through redlib instances (hcrypt.net)
4 points by udev4096 27 days ago | 3 comments



The author used to run a public Arch mirror under mirror.ext4.xyz, so it's not exactly an unknown domain.

Combined with the fact that a lot of their self-hosted stuff, including the Reddit front-ends, is in the Certificate Transparency logs [1], it's not hugely surprising that web crawlers would run into them.

[1]: https://crt.sh/?q=ext4.xyz
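
To illustrate how a crawler could turn those logs into a list of hosts to visit, here is a minimal sketch. It assumes crt.sh's JSON output (`&output=json`) and its `name_value` field; the function names and the sample entries are made up for illustration, and nothing here is claimed to be what any particular crawler actually does.

```python
import json
import urllib.request


def hostnames_from_ct(entries):
    """Collect unique hostnames from crt.sh-style JSON entries."""
    names = set()
    for entry in entries:
        # name_value may hold several newline-separated names per certificate.
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            # Skip wildcard entries, which do not name a concrete host.
            if name and not name.startswith("*."):
                names.add(name)
    return sorted(names)


def fetch_ct_entries(domain):
    """Fetch certificate entries for a domain from crt.sh (needs network access)."""
    url = f"https://crt.sh/?q={domain}&output=json"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


# Made-up sample data so the parsing step can be shown without a network call.
sample = [
    {"name_value": "mirror.example.org"},
    {"name_value": "redlib.example.org\n*.example.org"},
    {"name_value": "mirror.example.org"},
]
print(hostnames_from_ct(sample))
```

Against the real service, `hostnames_from_ct(fetch_ct_entries("ext4.xyz"))` would be the equivalent call. Any hostname that has ever had a publicly trusted certificate issued for it shows up this way, whether or not it is linked from anywhere.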


I am definitely not surprised. It's quite normal, as I stated. What I was trying to say was that they are abusing the private frontends to get around legal restrictions on aggressively scraping the web.


that's not scraping, that's web search.

their scrapers wouldn't identify themselves



