So far I have ~3M distinct IP addresses per 30 days, including about 1.7M fresh proxy IPs. The DB contains only verified IP addresses through which I've been able to route traffic. It DOESN'T rely on 3rd-party/open-source data sources.
The DB contains different types of proxies:
- Residential
- ISP
- Data center
I don't include mobile proxies since mobile IPs are heavily shared, so knowing that one was used as a proxy at some point is basically useless.
Regarding your remark: indeed, there are many shared residential IPs, including IPs of legitimate users who may have a shady app that routes traffic through their device. That's why I don't recommend blocking on IP addresses alone. It's supposed to be more of a datapoint/signal to enrich your anti-fraud/anti-bot system.
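To make the "signal, not verdict" idea concrete, here's a minimal sketch of how a proxy-IP hit could be combined with other signals. Everything in it is hypothetical: the signal names, the weights, and the 0.7 threshold are made up for illustration and aren't taken from any real anti-bot system.

```python
# Hypothetical weights: a proxy-IP hit alone is deliberately too weak to block.
DEFAULT_WEIGHTS = {
    "ip_seen_as_proxy": 0.3,
    "headless_fingerprint": 0.5,
    "abnormal_request_rate": 0.4,
}
BLOCK_THRESHOLD = 0.7  # assumed cut-off, tune on your own traffic


def risk_score(signals, weights=DEFAULT_WEIGHTS):
    """Sum the weights of the signals that fired, capped at 1.0."""
    score = sum(w for name, w in weights.items() if signals.get(name))
    return min(score, 1.0)


def should_block(signals):
    """Block only when the combined score reaches the threshold."""
    return risk_score(signals) >= BLOCK_THRESHOLD
```

With these made-up weights, a legitimate user whose IP happens to be in the proxy DB isn't blocked by that alone, but the same IP combined with a suspicious fingerprint is.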
However, for the block list, I analyze the IPs over longer time frames, look at the percentage of IPs in each range that were used as proxies, and generate a confidence score to indicate whether or not the range is safe to block.
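As a rough illustration of the range-level idea (not the actual scoring used for the block list), here's a sketch that groups observed proxy IPs by range and flags a range only when a large share of it was seen as proxies. The /24 grouping, the `min_ips` floor, and the 50% threshold are all assumptions, and it only handles IPv4.

```python
import ipaddress
from collections import defaultdict


def range_confidence(proxy_ips, prefix_len=24, min_ips=5, min_fraction=0.5):
    """Score each range by the fraction of its addresses seen as proxies."""
    seen = defaultdict(set)
    for ip in proxy_ips:
        # strict=False lets us derive the enclosing network from a host address
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        seen[net].add(ip)

    scores = {}
    for net, ips in seen.items():
        fraction = len(ips) / net.num_addresses
        scores[str(net)] = {
            "fraction": fraction,
            # Require both enough observations and a high share of the range
            "safe_to_block": len(ips) >= min_ips and fraction >= min_fraction,
        }
    return scores
```

A real version would also need the time dimension (how long ago and how consistently each IP was seen), which this sketch leaves out.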
I'm working on a scraping project at the moment, so I'm looking at this too, but from the other end. Super low volume though, so pretty tame. The emphasis is on success rate more than throughput.
I bought a 4G dongle to use as a last resort if nothing else gets through. I'm also investigating IPv6.
Using a 4G dongle does make it easier to hide in the crowd. Since your traffic will go through heavily shared mobile IPs, probably with thousands of users behind them, anti-bot vendors won't/shouldn't block per IP, but per fingerprint/session cookie instead.
I also made an open-source proxy IP block list based on the data: https://github.com/antoinevastel/avastel-bot-ips-lists