You have LinkedIn and Twitter examples, where you're very likely violating their TOS as they prohibit any scraping.
I also assume you don't check the robots.txt of websites?
I'm all for automating tedious work, but with all this (mostly AI-related) scraping, things are getting out of hand and creating a lot of headaches for developers maintaining heavily scraped sites.
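For what it's worth, honoring robots.txt takes only a few lines with Python's standard library. A minimal sketch (the policy and user agent string here are made up for illustration; a real client would fetch the site's actual robots.txt with `set_url()` and `read()`):

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt policy, parsed directly so the snippet needs no network
# access. A real scraper would instead do:
#   rp.set_url("https://example.com/robots.txt"); rp.read()
robots_txt = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Check a URL against the policy before fetching it.
print(rp.can_fetch("MyScraper/1.0", "https://example.com/public/page"))   # True
print(rp.can_fetch("MyScraper/1.0", "https://example.com/private/page"))  # False
print(rp.crawl_delay("MyScraper/1.0"))                                    # 10
```

A polite scraper would also sleep for the advertised crawl delay between requests.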
Scraping is semi-controversial, but in this case it's just a user with a Chrome extension visiting the site. LinkedIn has lots and lots of shady patterns around showing different results to Google Bot vs. regular users to encourage logged-in sessions. Many other sites like Pinterest and Twitter/X employ similar annoying patterns.
Imo, users should be allowed to use automation tools to access websites and collect data. Most of these sites thrive off of user-generated content anyway; Reddit, for example, is built on UGC. Why shouldn't people be able to scrape it?
Say I built an extension that lets people scrape things on demand, and the extension also sends that data to my servers, removing PII in the process. Would that be allowed?
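The "removing PII" step is doing a lot of work in that question. Even a basic scrub is non-trivial; here's a deliberately naive sketch (the regexes are illustrative and nowhere near exhaustive — they catch emails and US-style phone numbers but miss names, addresses, handles, etc.):

```python
import re

# Naive, illustrative PII scrubbing: redact email addresses and
# US-style phone numbers before storing scraped text. Real PII removal
# is much harder, and these patterns are NOT exhaustive.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\b(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b")

def scrub_pii(text: str) -> str:
    text = EMAIL_RE.sub("[email]", text)
    text = PHONE_RE.sub("[phone]", text)
    return text

print(scrub_pii("Contact jane.doe@example.com or 555-123-4567"))
# -> Contact [email] or [phone]
```

Whether that would satisfy a site's TOS (or GDPR-style regulators) is a separate question from whether it works technically.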
Technically it's acting on behalf of a proactive user in Chrome, so IMHO it's non-"robotic". But, to be fair, this was also Perplexity's excuse: they argued they are a legitimate non-robotic user agent (and thus don't need to respect robots.txt) because they only make requests at the time of a user query. We need a new way of understanding what it even means to be a legitimate human user agent. The presence of AIs as client-side catalysts will only grow.
The parent didn't say the scraping was "illegal", but that it violated ToS.
These are entirely different things. The upshot of the proceedings is that while the courts ruled there weren't sufficient grounds for an injunction to stop the scraping, the scraping was nonetheless injurious to the plaintiff and had breached their User Agreement, which allowed LinkedIn to compel hiQ toward a settlement.
From Wikipedia:
The 9th Circuit ruled that hiQ had the right to do web scraping.[1][2][3] However, the Supreme Court, based on its Van Buren v. United States decision,[4] vacated the decision and remanded the case for further review in June 2021. In a second ruling in April 2022 the Ninth Circuit affirmed its decision.[5][6] In November 2022 the U.S. District Court for the Northern District of California ruled that hiQ had breached LinkedIn's User Agreement and a settlement agreement was reached between the two parties.[7]
I see scraping as equivalent to a cherry-tree-shaking machine :-) If you're authorized to pick cherries from a tree, why not use a tree shaker and do the job in seconds? Just make sure you don't kill the tree in the process. Also, the tree owner must have the right to deny you use of the tree shaker.
related:
- "Dear AI Companies, instead of scraping OpenStreetMap, how about a $10k donation?" - https://news.ycombinator.com/item?id=41109926
- "Multiple AI companies bypassing web standard to scrape publisher sites" - https://news.ycombinator.com/item?id=40750182