
Any recommendations for a JS lib that does all the "easy" scraping (microdata, Open Graph tags, JSON-LD, etc.)?



The article recommended one.

> There are libraries like https://github.com/digitalbazaar/jsonld.js/ to parse JSON-LD + Microdata for you.


I thought that's what this blog post was going to be about, but it's just an ad for their app. I just need the scraping functionality.


While they do end by pushing their product, I think they did a good job of outlining how they scrape the recipes. They tell the reader about JSON-LD and microdata, and how to scrape the sites that don't use either. They even link to a JS lib that handles the parsing for you. I think calling it "just an ad" is inaccurate.

> There are libraries like https://github.com/digitalbazaar/jsonld.js/ to parse JSON-LD + Microdata for you.


In Python I use the "extruct" package from the Scrapy people. It doesn't cope well with syntax errors in the markup, though.
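
For anyone who only needs the "easy" part, here is a minimal stdlib-only sketch of pulling JSON-LD blocks and Open Graph tags out of a page. To be clear, this is not how extruct works internally, and it doesn't touch microdata at all. The names `MetadataParser` and `extract_metadata` are made up for illustration. The one deliberate choice is that a malformed JSON-LD block is skipped instead of aborting the whole page.

```python
import json
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collects JSON-LD blocks and Open Graph <meta> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.jsonld = []        # successfully parsed JSON-LD documents
        self.opengraph = {}     # "og:*" property -> content
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        script_type = (attrs.get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            # Start buffering the script body until the closing tag.
            self._in_jsonld = True
            self._buffer = []
        elif tag == "meta":
            prop = attrs.get("property") or ""
            if prop.startswith("og:") and attrs.get("content") is not None:
                self.opengraph[prop] = attrs["content"]

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.jsonld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                # Skip a broken block rather than failing the whole page.
                pass


def extract_metadata(html):
    """Return the JSON-LD documents and Open Graph tags found in an HTML string."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    return {"json-ld": parser.jsonld, "opengraph": parser.opengraph}
```

Call `extract_metadata(html_text)` on the fetched page source and look for entries whose `@type` is `Recipe` in the `"json-ld"` list. Real pages often wrap items in `@graph` or a top-level array, so that lookup needs a bit more care than shown here.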



