Hacker News

Could you give a ballpark figure for what you mean by large-scale scraping? I've only worked on a couple of projects. One was broad (100K to 500K domains) and shallow (root plus one level of page depth, with a low cap on the number of child pages). The other was just a single domain, but involved scraping around 50K pages from it.


I would say millions of domains, regularly. That's also where the pricing of most 'scraping services' falls down compared to just doing it yourself.


My experience was with e-commerce scraping. Not many domains, but a massive catalogue.



