We need a project in the spirit of Spamhaus to actively maintain a list of perpe...

mrweasel · 2025-03-20T13:44:09 1742478249

Just block all of AWS, Alibaba, GCP and Azure, or throttle them aggressively. If you have clients/customers that need more requests per second then have them provide you with their IPs.

The problem is that these companies are fairly well funded and renting infrastructure isn't an issue.

kijin · 2025-03-20T14:03:32 1742479412

Exactly. They're renting infrastructure on well-known clouds, not cycling through consumer IPs like yesterday's botnets. Block all web traffic from well-known cloud IPs, and you can keep 99% of the LLM bots away. Alibaba seems to be the most common source of bot traffic on my infrastructure lately, and I also see Huawei Cloud from time to time. Not much AWS, probably because of their high IPv4 pricing.

You can allow API access from cloud IPs, as long as you don't do anything expensive before you've authenticated the client.
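A minimal sketch of that policy, assuming you already have the providers' published CIDR lists loaded somewhere. The ranges, the allowlist entry, the `/api/` prefix and the `decide` helper below are all hypothetical placeholders, not anything from the article:

```python
import ipaddress

# Hypothetical stand-ins (RFC 5737 documentation ranges) for the CIDR lists
# that cloud providers publish; a real deployment would load those instead.
CLOUD_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# Hypothetical customer IPs that asked to be exempted.
CUSTOMER_ALLOWLIST = {ipaddress.ip_address("192.0.2.10")}


def decide(ip_str, path):
    """Return 'allow', 'block' or 'auth_first' for one request."""
    ip = ipaddress.ip_address(ip_str)
    if ip in CUSTOMER_ALLOWLIST:
        return "allow"
    if not any(ip in net for net in CLOUD_RANGES):
        return "allow"  # not a known cloud range; treat as a normal visitor
    if path.startswith("/api/"):
        # Cloud IPs may use the API, but authenticate before doing any
        # expensive work.
        return "auth_first"
    return "block"  # plain web traffic from a cloud range
```

Swapping `"block"` for a throttle decision would give the "throttle them aggressively" variant instead.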

cgh · 2025-03-20T14:25:07 1742480707

From the article:

“…they do so using random User-Agents that overlap with end-users and come from tens of thousands of IP addresses - mostly residential, in unrelated subnets, each one making no more than one HTTP request over any time period we tried to measure - actively and maliciously adapting and blending in with end-user traffic and avoiding attempts to characterize their behavior or block their traffic.”

So it looks like much of the traffic, particularly from China, is indeed using consumer IPs to disguise itself. That’s why they blocked based on browser type (MS Edge, in this case).

dougb5 · 2025-03-20T14:55:47 1742482547

This matches exactly what I'm seeing on my own sites too, and it's from all over the world, not just China.

(I described my bot woes a few weeks ago at https://news.ycombinator.com/item?id=43208623. The "just block bots!" replies were well-intentioned but naive -- I've still found no signal that works reliably well to distinguish bots from real traffic.)

kijin · 2025-03-20T16:08:07 1742486887

I saw a fair amount of that kind of behavior, too, mostly around the summer of last year. At some point it dropped off sharply. Over the last few months, at least for the servers I keep an eye on, most of the trouble has been from Chinese cloud IPs.

Either the LLM devs got more funding, or maybe the authorities took down the botnet they were using.

blueflow · 2025-03-20T13:52:16 1742478736

Why only in the "spirit of Spamhaus"? Spamhaus still exists. Add Google and Microsoft AS to the DROP/NOROUTE list, that would be hilarious.

danaris · 2025-03-20T17:40:39 1742492439

Because while this is clearly related to spam, it's not the same thing, and presumably if Spamhaus themselves felt it was within their wheelhouse, they'd already be doing it.

voidUpdate · 2025-03-20T13:40:21 1742478021

This sounds backwards to me. If you maintain a list of IPs but they are constantly cycling them, it'll get out of date quickly, whereas a captcha-like system will (hopefully) always stop bot traffic.

pavon · 2025-03-20T16:44:27 1742489067

While some of the residential IPs are from malware, a lot of them are from residential IP proxies, where people are paid to run proxy software from their home. If word gets around that people who run this software quickly get blocked by the majority of the internet, that will lessen that part of the problem.

nzeid · 2025-03-20T13:46:52 1742478412

Only if your CAPTCHA-like is hurled at every client indiscriminately. Otherwise you'll end up right back where Spamhaus started: maintaining your own list of good and bad actors.

The advantage of a third party service is that you're sharing intel of bad actors.
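For reference, Spamhaus-style lists are usually shared as DNS blocklists: you reverse the address, append the list's zone, and look the name up. By convention a listed address resolves to a `127.0.0.x` answer and an unlisted one returns NXDOMAIN. A rough sketch of building the query name, where the zone `bots.example.org` is a made-up placeholder and no such list is implied to exist:

```python
import ipaddress


def dnsbl_query_name(ip_str, zone):
    """Build the DNSBL lookup name for an address (no DNS query is made)."""
    ip = ipaddress.ip_address(ip_str)
    # reverse_pointer is e.g. "2.0.0.127.in-addr.arpa"; drop the two
    # trailing labels to keep only the reversed address part.
    reversed_part = ip.reverse_pointer.rsplit(".", 2)[0]
    return f"{reversed_part}.{zone}"


# Hypothetical zone name, for illustration only.
print(dnsbl_query_name("127.0.0.2", "bots.example.org"))
```

The actual lookup would then be an ordinary A-record query for that name.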

voidUpdate · 2025-03-20T14:12:11 1742479931

I can't confirm, but I believe it is applied to every client.