
It's not quite as trivial as that; one could start the page with a <script> tag that contains "<!--" without a matching "-->", and that would hide all the content from your scraper but not from real browsers.

But I think it's moot: parsing HTML is not very expensive if you don't have to actually render it.
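
To make the "<!--" trick concrete, here is a rough sketch in Python. The page string, naive_text and TextExtractor are all made up for illustration. naive_text is only my guess at what a regex-based scraper might do (treat an unterminated comment as running to the end of the input); the second version uses the standard library's html.parser, which tokenizes without rendering anything.

```python
import re
from html.parser import HTMLParser

# Hypothetical page: the script body contains "<!--" with no matching "-->".
PAGE = '<script>var x = "<!--";</script><p>Visible content</p>'


def naive_text(html: str) -> str:
    """Assumed naive scraper: strip comments with a regex, then strip tags.

    An unterminated "<!--" is treated as a comment that runs to the end
    of the input, so everything after it is thrown away.
    """
    without_comments = re.sub(r"<!--.*?(-->|$)", "", html, flags=re.DOTALL)
    return re.sub(r"<[^>]*>", "", without_comments)


class TextExtractor(HTMLParser):
    """Collects text outside <script> elements using a real tokenizer."""

    def __init__(self) -> None:
        super().__init__()
        self.in_script = False
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Script bodies arrive here as plain data; skip them.
        if not self.in_script:
            self.chunks.append(data)


def parsed_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


if __name__ == "__main__":
    print(repr(naive_text(PAGE)))
    print(repr(parsed_text(PAGE)))
```

The regex version loses the paragraph because it never learns that it is inside a script element, while the parser treats the script body as opaque text up to "</script>" and carries on with the rest of the page.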


