
Most web servers (although I only know Apache) have mechanisms you can use to block specific IPs, user-agent strings, etc. from accessing the site.
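For example, in Apache it might look something like this (a rough sketch, not battle-tested; the IP range and the user-agent string are placeholders, not real scraper signatures):

    # Deny a specific IP range (placeholder value)
    <RequireAll>
        Require all granted
        Require not ip 203.0.113.0/24
    </RequireAll>

    # Deny requests whose User-Agent matches a scraper string (placeholder)
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} "BadScraperBot" [NC]
    RewriteRule .* - [F,L]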

I think that's the best you can do, as insufficient as that is. This is a hard problem, and that's why I've made my websites private until I can work out a better solution.



It's impossible unless they publish scraper IP ranges...


Well, it's not exactly impossible (you can spot the spiders fairly easily in your access logs), but it is certainly weak sauce.
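For instance, here's a rough Python sketch (assuming the stock Apache "combined" log format; the filename and the top-10 cutoff are arbitrary) that ranks clients by request volume, and the spiders tend to float to the top:

    import re
    from collections import Counter

    # Pulls the client IP and User-Agent out of Apache's "combined" log format
    # (an assumption about what your access log looks like).
    LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d+ \S+ "[^"]*" "([^"]*)"')

    ip_hits, ua_hits = Counter(), Counter()
    with open("access.log") as f:  # path is a placeholder
        for line in f:
            m = LINE.match(line)
            if m:
                ip, ua = m.groups()
                ip_hits[ip] += 1
                ua_hits[ua] += 1

    # The heaviest requesters are usually the spiders.
    print(ip_hits.most_common(10))
    print(ua_hits.most_common(10))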


I want to allow good spiders but not these ones; how do you differentiate them in access logs?


That's not a use case I've put much thought into, honestly, but assuming the "good spiders" are well-behaved, you should be able to identify them by their user-agent strings. Or perhaps you can nail down which IP ranges the good spiders come from and allow them based on that.
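One check I've seen the major search engines describe is forward-confirmed reverse DNS: resolve the client IP to a hostname, check the hostname belongs to the crawler's domain, then resolve that hostname forward again and make sure you get the same IP back. A rough Python sketch (the domain list is just an example, not an authoritative set; verify against each engine's docs):

    import socket

    # Example domains for "good" crawlers (placeholder list).
    GOOD_CRAWLER_DOMAINS = (".googlebot.com", ".google.com", ".search.msn.com")

    def is_verified_crawler(ip: str) -> bool:
        """Forward-confirmed reverse DNS: IP -> hostname -> back to the same IP."""
        try:
            host, _, _ = socket.gethostbyaddr(ip)
        except OSError:
            return False
        if not host.endswith(GOOD_CRAWLER_DOMAINS):
            return False
        try:
            forward_ips = socket.gethostbyname_ex(host)[2]
        except OSError:
            return False
        return ip in forward_ips

    # Placeholder IP; a genuine crawler address should return True.
    print(is_verified_crawler("203.0.113.5"))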



