Most web servers (Apache is the only one I know well) have mechanisms you can use to block specific IPs, user-agent strings, and so on from accessing the site.
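For what it's worth, here's a minimal sketch of what that looks like in Apache 2.4 — the IP range and user-agent pattern below are placeholders, not real bot data:

```apache
# Hypothetical example — 203.0.113.0/24 and "BadBot" are placeholders.
# Block a specific IP range (httpd.conf, Apache 2.4):
<Directory "/var/www/html">
    <RequireAll>
        Require all granted
        Require not ip 203.0.113.0/24
    </RequireAll>
</Directory>

# Block by user-agent string via mod_rewrite (403 for any matching request):
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} "BadBot" [NC]
RewriteRule ^ - [F]
```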
I think that's the best you can do, as insufficient as that is. This is a hard problem, and that's why I've made my websites private until I can work out a better solution.
That's not a use case I've put much thought into, honestly, but if we assume the "good spiders" are well behaved, you should be able to identify them by their user-agent strings. Or you could nail down which IP ranges the good spiders come from and allow them through on that basis — something like the sketch below.
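Here's a rough sketch of that allowlist idea in Apache 2.4 — the user-agent names are real crawler strings, but treat the IP range as a placeholder and check each search engine's published crawler documentation before trusting any of it:

```apache
# Sketch only — verify current crawler user agents and IP ranges against
# the vendors' own documentation before relying on them.
SetEnvIfNoCase User-Agent "Googlebot|bingbot" good_spider

<Directory "/var/www/html">
    <RequireAny>
        # Let in requests that identify as a known good spider...
        Require env good_spider
        # ...or that come from a published crawler IP range (placeholder value).
        Require ip 66.249.64.0/19
    </RequireAny>
</Directory>
```

Note that with `<RequireAny>` written like this, only matching requests get in at all — you'd add further `Require` lines for your human visitors. And since user-agent strings are trivially spoofed, the IP-range check is the part that actually holds up against a dishonest bot.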