I am working on a curated database of proxy IP addresses frequently used by bots...

Havoc · 2025-04-28T12:15:32

Wouldn’t this end up flagging a lot of residential IPs because of residential proxies?

avastel · 2025-04-28T12:25:29

The DB contains different types of proxies:
- Residential
- ISP
- Data center

I don't include mobile proxies since they're heavily shared, so knowing that an IP address was used as a proxy at some point is basically useless.

Regarding your remark, indeed, there are many shared residential IPs, including IPs of legitimate users who may have a shady app that routes traffic through their device. That's why I don't recommend blocking based on IP addresses alone. The data is meant to be a datapoint/signal that enriches your anti-fraud/anti-bot system. For the block list, however, I analyze the IPs over longer time frames, look at the percentage of IPs in each range that were used as proxies, and generate a confidence score that indicates whether it is safe to block.
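To make the range-level idea concrete, here is a minimal sketch of how such a confidence score *could* be computed. The actual scoring used for the block list isn't described here, so everything below is an assumption: the `RangeStats` and `block_confidence` names, the `min_days` parameter, and the simple "proxy ratio × time weight" formula are all hypothetical.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass
class RangeStats:
    # Hypothetical record for one IP range (e.g. a /24).
    network: str          # CIDR notation, e.g. "203.0.113.0/24"
    proxy_ips: set        # IP strings observed acting as proxies
    days_observed: int    # length of the observation window in days


def block_confidence(stats: RangeStats, min_days: int = 30) -> float:
    """Hypothetical score in [0, 1]: share of the range seen as proxies,
    discounted when the observation window is shorter than min_days."""
    net = ip_network(stats.network)
    # Only count proxy IPs that actually fall inside this range.
    in_range = sum(1 for ip in stats.proxy_ips if ip_address(ip) in net)
    proxy_ratio = min(in_range / net.num_addresses, 1.0)
    # Short windows get less weight; windows >= min_days get full weight.
    time_weight = min(stats.days_observed / min_days, 1.0)
    return round(proxy_ratio * time_weight, 3)


half_range = {f"203.0.113.{i}" for i in range(128)}
print(block_confidence(RangeStats("203.0.113.0/24", half_range, 60)))  # 0.5
print(block_confidence(RangeStats("203.0.113.0/24", half_range, 15)))  # 0.25
```

The point of the time weight is the same as in the comment above: a range that has only been seen briefly shouldn't reach a "safe to block" score just because a burst of proxy traffic came through it.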

Havoc · 2025-04-28T13:20:02

Sounds like pretty sophisticated filtering!

I’m working on a scraping project at the moment, so I’m looking at this too, but from the other end. Super low volume though, so pretty tame: the emphasis is on success rate more than throughput.

I bought a 4G dongle to use as a last resort if nothing else gets through. I’m also investigating IPv6.

avastel · 2025-04-28T14:27:00

Indeed, using a 4G dongle makes it easier to hide in the crowd. Since your traffic will go through heavily shared mobile IPs (carrier NAT), probably with thousands of users behind each one, anti-bot vendors won't/shouldn't block per IP, but per fingerprint/session cookie instead.
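A rough illustration of what "per fingerprint/session cookie instead of per IP" means on the defending side: a sliding-window rate limiter keyed on a fingerprint string rather than the client IP. This is a hypothetical sketch; `FingerprintRateLimiter`, its parameters, and the idea that a fingerprint arrives as an opaque string are assumptions, not any vendor's actual API.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class FingerprintRateLimiter:
    """Hypothetical sliding-window limiter keyed on a client fingerprint or
    session ID, so users sharing one mobile IP are throttled independently."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window = window_seconds
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, fingerprint: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[fingerprint]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: reject without recording the hit
        hits.append(now)
        return True


limiter = FingerprintRateLimiter(max_requests=3, window_seconds=10)
# A bot-like session behind a shared mobile IP gets throttled...
print([limiter.allow("fp-bot", now=t) for t in (0, 1, 2, 3)])  # [True, True, True, False]
# ...while another user behind the same IP is unaffected.
print(limiter.allow("fp-human", now=3))  # True
```

Keying on the IP here would have throttled "fp-human" too, which is exactly the collateral damage that makes per-IP blocking on mobile ranges a bad idea.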

Havoc · 2025-04-28T15:49:25

Ah, I hadn’t realised it’s because of the NAT. I thought it was because the IPs are dynamic and rotate too often. Interesting.

Currently planning a layered approach: cloud IPs first, etc.
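The layered approach boils down to an ordered fallback: try the cheapest source first and only escalate when a request doesn't get through, with the 4G dongle as the final tier. A minimal sketch, assuming each tier is wrapped in a hypothetical fetcher that returns the body or `None` on failure; `fetch_layered` and the tier names are illustrative, not an existing library.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical fetcher: takes a URL, returns the body or None if blocked/failed.
Fetcher = Callable[[str], Optional[str]]


def fetch_layered(
    url: str, tiers: Sequence[Tuple[str, Fetcher]]
) -> Tuple[Optional[str], Optional[str]]:
    """Try each tier in order (cheapest first) and stop at the first success.

    Returns (tier_name, body), or (None, None) if every tier failed.
    """
    for name, fetch in tiers:
        body = fetch(url)
        if body is not None:
            return name, body
    return None, None


# Stub fetchers standing in for real cloud / residential / 4G clients.
tiers = [
    ("cloud", lambda url: None),                   # blocked
    ("residential", lambda url: "<html>ok</html>"),
    ("4g_dongle", lambda url: "<html>ok</html>"),  # last resort, not reached
]
print(fetch_layered("https://example.com", tiers))  # ('residential', '<html>ok</html>')
```

Because the loop stops at the first success, the more expensive or scarcer tiers only see traffic that the cheaper ones couldn't handle, which also keeps the overall footprint low.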

Interesting challenge, but I’m also trying to be somewhat respectful about it, since nobody likes aggressive bots.