
For these sites, I crawl using a JS-powered engine (a headless browser that executes the page's JavaScript) and just save the relevant rendered page content to disk.

Then I can craft my regex/selectors/etc., once I have the data stored locally.

This also helps if you get detected and blocked: it won't stall your development effort, since you can keep iterating against the saved pages, and you can handle proxying requests as a separate task.
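A minimal sketch of the save-first, parse-later workflow described above. It assumes pages were already fetched by a JS-capable headless browser and written to a local directory; here a sample saved page is simulated, and the `price` selector and file name are hypothetical.

```python
import pathlib
import re
import tempfile

# Stand-in for a page the crawler already rendered and saved to disk.
SAVED_HTML = """<html><body>
<div class="price">$19.99</div>
<div class="price">$4.50</div>
</body></html>"""

def extract_prices(html: str) -> list[str]:
    # Crafting the regex is cheap now: the page is on disk, so no repeated
    # live requests are needed while iterating on the pattern.
    return re.findall(r'class="price">\$([\d.]+)<', html)

cache = pathlib.Path(tempfile.mkdtemp()) / "page_0001.html"
cache.write_text(SAVED_HTML)  # what the crawl step would have produced
prices = extract_prices(cache.read_text())
print(prices)  # → ['19.99', '4.50']
```

Because parsing runs entirely against local files, a broken selector or regex costs nothing against the target site.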
