
Please remember that an LLM accessing a website isn't the problem here. The problem is the scraping bots that saturate server bandwidth (a DoS attack of sorts) to collect data to train LLMs on. An LLM solving a captcha or an Anubis-style proof-of-work challenge isn't a big concern, because the worst it will do with the fetched content is cache it for later analysis and reporting. Unlike the crawlers, LLMs have no incentive to suck up huge amounts of data like a giant vacuum cleaner.
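
(For anyone unfamiliar with the Anubis reference: it gates requests behind a small proof-of-work challenge, so one human page view is cheap but crawling at scale gets expensive. Here's a minimal hashcash-style sketch of the general idea in Python -- the names and parameters are illustrative, not Anubis's actual protocol:)

    import hashlib
    import itertools

    # Hashcash-style proof of work, illustrative only. The server issues a
    # random challenge; the client must find a nonce whose SHA-256 hash has a
    # given number of leading zero bits. Solving is cheap for one visit but
    # costly for a crawler hitting millions of URLs; verifying is one hash.

    DIFFICULTY_BITS = 16  # illustrative; real deployments tune this

    def leading_zero_bits(digest: bytes) -> int:
        bits = 0
        for byte in digest:
            if byte == 0:
                bits += 8
            else:
                bits += 8 - byte.bit_length()
                break
        return bits

    def solve(challenge: str, difficulty: int = DIFFICULTY_BITS) -> int:
        """Client side: brute-force a nonce that meets the difficulty."""
        for nonce in itertools.count():
            digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
            if leading_zero_bits(digest) >= difficulty:
                return nonce

    def verify(challenge: str, nonce: int, difficulty: int = DIFFICULTY_BITS) -> bool:
        """Server side: a single hash, so checking stays cheap."""
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        return leading_zero_bits(digest) >= difficulty

    if __name__ == "__main__":
        challenge = "example-challenge"  # a real server would issue a random value
        nonce = solve(challenge)
        print(nonce, verify(challenge, nonce))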


Scraping was a thing before LLMs; there's a whole separate arms race around it for ordinary competition and "industrial espionage" reasons. I'm not really sure why model training would become a noticeable fraction of scraping activity - there are only a few players on the planet that can afford to train decent LLMs in the first place, and they're not going to re-scrape the content they already have ad infinitum.


> they're not going to re-scrape the content they already have

That's true for static content, but much of the web is forums and other places where the main value is that new content is constantly generated - and that new content does need to be re-scraped.


If only sites agreed to put a machine-readable URL somewhere that lists all items by date. Like a site summary or a syndication stream. And maybe a "map" of a static site. It would be so easy to share their updates with other interested systems.
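
(That is, RSS/Atom feeds and sitemaps already exist for exactly this. A rough sketch of polling such a feed with only the Python standard library -- the feed URL is a hypothetical placeholder, not any particular site's endpoint:)

    import urllib.request
    import xml.etree.ElementTree as ET
    from datetime import datetime, timedelta, timezone
    from email.utils import parsedate_to_datetime

    # Rough sketch of polling an RSS 2.0 feed for new items instead of
    # re-crawling every page. The URL is a placeholder; real sites advertise
    # theirs via <link rel="alternate"> or list pages in sitemap.xml.
    FEED_URL = "https://example.com/feed.xml"

    def fetch_new_items(since):
        """Return (title, link, published) for items newer than `since` (tz-aware)."""
        with urllib.request.urlopen(FEED_URL, timeout=10) as resp:
            root = ET.parse(resp).getroot()
        items = []
        for item in root.iter("item"):  # RSS 2.0; Atom uses <entry> instead
            pub_raw = item.findtext("pubDate")
            if not pub_raw:
                continue
            published = parsedate_to_datetime(pub_raw)
            if published > since:
                items.append((item.findtext("title", ""),
                              item.findtext("link", ""),
                              published))
        return items

    if __name__ == "__main__":
        cutoff = datetime.now(timezone.utc) - timedelta(days=1)
        for title, link, published in fetch_new_items(cutoff):
            print(published.isoformat(), title, link)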

Why should they agree to make life even easier for people doing something they don't want them to do?

You've summarized the main problem with the Internet for the past 2+ decades.

Everyone thinks they're a master fisherman and gets up in arms about those pesky users not acting like dumb fish - eating the content bait without biting the monetization hook.



