Most web servers (Apache is the only one I know well) have mechanisms you can use to block specific IPs, user-agent strings, and so on from accessing the site.
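For what it's worth, here's a minimal sketch of what that looks like in Apache 2.4 — the IP range and user-agent pattern below are placeholders, not real bot data:

```apache
# Hypothetical example — 203.0.113.0/24 and "BadBot" are placeholders.
# Block a specific IP range (httpd.conf, Apache 2.4):
<Directory "/var/www/html">
    <RequireAll>
        Require all granted
        Require not ip 203.0.113.0/24
    </RequireAll>
</Directory>

# Block by user-agent string via mod_rewrite (403 for any matching request):
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} "BadBot" [NC]
RewriteRule ^ - [F]
```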
I think that's the best you can do, as insufficient as that is. This is a hard problem, and that's why I've made my websites private until I can work out a better solution.
That's not a use case I've put much thought into, honestly, but if we assume the "good spiders" are well behaved, you should be able to identify them by their user-agent strings. Or you could nail down which IP ranges the good spiders come from and allow them through on that basis — something like the sketch below.
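Here's a rough sketch of that allowlist idea in Apache 2.4 — the user-agent names are real crawler strings, but treat the IP range as a placeholder and check each search engine's published crawler documentation before trusting any of it:

```apache
# Sketch only — verify current crawler user agents and IP ranges against
# the vendors' own documentation before relying on them.
SetEnvIfNoCase User-Agent "Googlebot|bingbot" good_spider

<Directory "/var/www/html">
    <RequireAny>
        # Let in requests that identify as a known good spider...
        Require env good_spider
        # ...or that come from a published crawler IP range (placeholder value).
        Require ip 66.249.64.0/19
    </RequireAny>
</Directory>
```

Note that with `<RequireAny>` written like this, only matching requests get in at all — you'd add further `Require` lines for your human visitors. And since user-agent strings are trivially spoofed, the IP-range check is the part that actually holds up against a dishonest bot.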