
Since I posted an article here about using zip bombs [0], I've been flooded with bots. I'm constantly monitoring and tweaking my abuse detector, but this particular bot mentioned in the article seemed to be pointing to an RSS reader, so I whitelisted it at first. Now that I've given it a second look, it's one of the most rampant bots on my blog.

[0]: https://news.ycombinator.com/item?id=43826798
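
A minimal sketch of what such an allowlist-plus-rate-limit check might look like. This is an illustration only, not the author's actual abuse detector; the user-agent tokens and limits are made-up assumptions:

    import time
    from collections import defaultdict, deque

    RSS_READER_TOKENS = ("rss", "feed", "reader")  # hypothetical UA tokens
    WINDOW_SECONDS = 60
    MAX_HITS_DEFAULT = 30    # hypothetical limit for unknown clients
    MAX_HITS_RSS = 120       # more generous limit for clients claiming to be RSS readers

    hits = defaultdict(deque)  # client IP -> timestamps of recent requests

    def is_abusive(ip: str, user_agent: str) -> bool:
        """Rate-limit per IP; a UA that looks like an RSS reader gets a higher limit."""
        now = time.monotonic()
        window = hits[ip]
        window.append(now)
        while window and now - window[0] > WINDOW_SECONDS:
            window.popleft()
        claims_rss = any(tok in user_agent.lower() for tok in RSS_READER_TOKENS)
        limit = MAX_HITS_RSS if claims_rss else MAX_HITS_DEFAULT
        return len(window) > limit

Even an allowlisted user agent trips the check if it hammers the site hard enough, which is roughly how a second look catches a rampant bot.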



If I had a shady web crawling bot and I implemented a feature for it to avoid zip bombs, I would probably also test it by aggressively crawling a site that is known to protect itself with hand-made zip bombs.
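
The crawler-side defence being speculated about here usually amounts to capping the decompressed size of any one response. A rough sketch, assuming the Python `requests` library and a made-up size limit (nothing here is confirmed by the thread):

    import requests

    MAX_BODY_BYTES = 10 * 1024 * 1024  # hypothetical 10 MiB cap per page

    def fetch_capped(url: str, timeout: float = 10.0) -> bytes | None:
        """Stream a response and bail out once the decompressed body passes the cap,
        so a small gzip'd zip bomb cannot balloon in memory."""
        body = bytearray()
        with requests.get(url, stream=True, timeout=timeout) as resp:
            resp.raise_for_status()
            # iter_content yields already-decompressed bytes, so counting them
            # measures the post-inflation size the server is trying to serve.
            for chunk in resp.iter_content(chunk_size=64 * 1024):
                body.extend(chunk)
                if len(body) > MAX_BODY_BYTES:
                    return None  # treat as hostile and move on
        return bytes(body)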


Also protect yourself from sucking up fake generated content. I know some folks here like to feed them all sorts of 'data'. Fun stuff :D


One of the few manual deny-list entries that I have made was not for a Chinese company, but for the ASes of the U.S.A. subsidiary of a Chinese company. It just kept coming back again and again, quite rapidly, for a particular page that returned a 404. Not for any other related pages, mind. Not for the favicon, robots.txt, or even the enclosing pseudo-directory. Just that 1 page. Over and over.

The directory structure had changed, and the page is now 1 level lower in the tree, correctly hyperlinked long since, in various sitemaps long since, and long since discovered by genuine HTTP clients.

The URL? It now only exists in 1 place on the WWW according to Google. It was posted to Hacker News back in 2017.

(My educated guess is that I am suffering from the page-preloading fallout from repeated robotic scraping of old Hacker News stuff by said U.S.A. subsidiary.)
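
For readers wondering what a manual deny-list entry for a whole AS looks like in practice, a hedged sketch: collect the prefixes announced by that AS (e.g. from WHOIS/BGP data) and reject anything inside them. The prefixes below are documentation placeholders, not the real network:

    from ipaddress import ip_address, ip_network

    DENIED_PREFIXES = [               # placeholder prefixes, not the actual AS
        ip_network("192.0.2.0/24"),
        ip_network("198.51.100.0/24"),
    ]

    def is_denied(client_ip: str) -> bool:
        """True if the client falls inside any deny-listed prefix."""
        return any(ip_address(client_ip) in net for net in DENIED_PREFIXES)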


Rule number one: You do not talk about fight club.


Dark forest theory taking root.



