
It is a bit ironic that a paywalled article like this will have a top-level comment with the archive link, which can then be easily scraped by AI (along with the comments).


Also interesting how sites like this are mainstream whereas a link to a site hosting an mp3 of pirated music wouldn’t be tolerated in discussion forums like this.

I think a big difference is that there are no microtransactions or compulsory licensing for content, so it always feels patently unfair to buy a subscription to read one article.


I'd argue it's more that the RIAA has historically been much more aggressive about suing than newspapers or magazines have.


True. I think it has ended up a net good. People make a living on music, and licensed music is everywhere.


Kinda hard to discuss the news when your members can't read the news.


In this case, it also seems like the paywall doesn't show up if you have JavaScript disabled, which I find strange, but I think lots of news sites are like that.


It's not ironic at all. The only reason the anti-paywall sites work is that the news companies in fact want some scrapers reading the full article.


Actually, the team behind archive dot today has premium accounts on at least spiegel.de, I presume bought with anonymous credit cards.

You can see artifacts when their servers are under queue load and the URLs become visible: a few resources have the JWT with the account details in the URL. IIRC the real name on the account in the token is Masha Rabinovich, with an email account [email protected], an identity that has cropped up in various investigations [1][2].

[1] https://gyrovague.com/2023/08/05/archive-today-on-the-trail-...

[2] https://webapps.stackexchange.com/questions/145817/who-owns-...


Related: Has anyone trained an LLM strictly on HN comments and linked-to articles? I for one would get a kick out of interrogating it.



