
There's a project [0] that parses Common Crawl data for various schemas; it contains some interesting datasets.

[0] http://webdatacommons.org/



That's a really useful link, thanks for sharing. We're building a scraping service, and our parsing currently relies only on native HTML tags and Open Graph metadata. Based on this link, we should definitely take a step forward and parse JSON-LD as well.
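
For anyone curious what the JSON-LD step might look like, here is a rough sketch using only the Python standard library. It assumes the structured data sits in `<script type="application/ld+json">` elements, which is the usual embedding; the names `JsonLdExtractor` and `extract_jsonld` are made up for illustration and are not taken from Web Data Commons or any particular scraper.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False  # True while inside a JSON-LD script element
        self._buffer = []        # text chunks of the current script element
        self.items = []          # decoded JSON-LD objects found so far

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            type_attr = dict(attrs).get("type") or ""
            if type_attr.strip().lower() == "application/ld+json":
                self._in_jsonld = True
                self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                parsed = json.loads("".join(self._buffer))
            except json.JSONDecodeError:
                return  # skip malformed blocks instead of failing the whole page
            # A block may hold a single object or a list of objects.
            if isinstance(parsed, list):
                self.items.extend(parsed)
            else:
                self.items.append(parsed)


def extract_jsonld(html):
    """Return every JSON-LD object embedded in the given HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical page fragment, for illustration only.
sample = """<html><head>
<meta property="og:title" content="Example">
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article", "headline": "Example"}
</script>
</head><body></body></html>"""

for item in extract_jsonld(sample):
    print(item.get("@type"), item.get("headline"))
```

This only decodes the JSON; it does not expand `@context` or resolve `@graph` the way a full JSON-LD processor would, so treat it as a starting point alongside the existing HTML tag and Open Graph parsing.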



