
You can get residential proxies as an alternative, but it's pretty expensive.
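For reference, pointing a script at a residential proxy usually just means routing traffic through the provider's gateway. Below is a minimal sketch using only Python's standard library. The gateway host, port, and credentials are hypothetical placeholders, and the exact URL format varies by provider:

```python
import urllib.request

# Hypothetical gateway URL: host, port and credentials are placeholders,
# and each provider documents its own format.
PROXY_URL = "http://USERNAME:PASSWORD@proxy.example.com:8000"


def build_proxied_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that routes both HTTP and HTTPS requests via proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    # Passing our own ProxyHandler replaces the default one that reads
    # proxy settings from environment variables.
    return urllib.request.build_opener(handler)


opener = build_proxied_opener(PROXY_URL)
```

You'd then call `opener.open(url, timeout=30)` the same way you would call `urllib.request.urlopen`. The traffic that flows through it is what gets billed per GB.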



It’s not really that expensive. I guess it depends on your use case, but it’s only about $4/GB (edit: misleading, see replies): https://oxylabs.io/products/residential-proxy-pool or https://brightdata.com/proxy-types/residential-proxies

Usually you only need a subset of the data on each page load. If you spend some time in the browser's dev tools, you can probably find the API call you need and save yourself a few MB per request.
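As a rough sketch of that approach: once the Network tab shows which XHR/fetch call returns the data, you can request that endpoint directly and keep only the fields you need. The URL and the `items`/`sku`/`price` schema below are hypothetical; substitute whatever the real endpoint returns (some endpoints also expect cookies or headers copied from the page):

```python
import json
import urllib.request
from typing import Any

# Hypothetical endpoint spotted in the browser's Network tab; the path,
# query parameters and JSON shape are assumptions for illustration.
API_URL = "https://shop.example.com/api/v1/products?page=1"


def fetch_json(url: str, timeout: float = 30.0) -> Any:
    """GET a JSON endpoint directly instead of loading the full HTML page."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def extract_prices(payload: dict) -> dict[str, float]:
    """Keep only SKU -> price from a hypothetical {'items': [...]} payload."""
    return {item["sku"]: float(item["price"]) for item in payload.get("items", [])}
```

Usage would be `extract_prices(fetch_json(API_URL))`; only the JSON response crosses the proxy, not the page's HTML, scripts, and images.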

They all offer scraping APIs now too, which can be cheaper for use cases where you only need a subset of the data the page actually loads. It's about $1.30 per 1k requests: https://oxylabs.io/products/scraper-api/web/pricing or $1 per 1k: https://brightdata.com/pricing/web-scraper
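To compare the two pricing models, the break-even point is the response size at which per-GB proxy traffic costs the same as a per-request scraper API. Here is a small calculator, assuming decimal units (1 GB = 10^9 bytes) and the prices quoted in this thread:

```python
def break_even_bytes(price_per_1k_requests: float, price_per_gb: float) -> float:
    """Response size in bytes at which per-GB proxy traffic costs the same as a
    per-request scraper API. Assumes decimal units: 1 GB = 1e9 bytes."""
    cost_per_request = price_per_1k_requests / 1000
    return cost_per_request / price_per_gb * 1e9


# Prices quoted in this thread; your plan's prices may differ.
for per_gb in (4.0, 8.0):
    kb = break_even_bytes(1.3, per_gb) / 1000
    print(f"$1.30 per 1k requests vs ${per_gb:.2f}/GB: break-even at ~{kb:.1f} KB")
```

At $1.30 per 1k requests this works out to roughly 325 KB per response at $4/GB and roughly 160 KB at $8/GB. Lighter responses favor the per-GB proxy and heavier ones favor the scraper API, before accounting for retries or rendering.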


Providers like Oxylabs can be quite restrictive, preventing access to many of the common sites that scrapers choose to target.

https://faq.oxylabs.info/en/articles/8826164-restricted-targ...

Additionally, I believe $4/GB is an introductory price. When I went into the Oxylabs dashboard, it showed me $8/GB.


>Providers like Oxylabs can be quite restrictive, preventing access to many of the common sites that scrapers choose to target.

Most of them seem pretty reasonable?

"Entertainment & streaming" - who's trying to scrape Netflix's library?

"Banking and other financial institutions" / "Government websites" / "Mailing" - seems far more likely it'll be used for credential stuffing than for "scraping".

"Ticketing" - seems far more likely that it'll get used by scalpers than for scraping

The main targets of scraping, e-commerce sites (for price comparisons) and social media networks (for user-generated content), are fine to scrape. Is there some use case I'm missing here? Is there a huge contingent of people wanting to scrape Ticketmaster or Bank of America?


I used the term "scrapers" pretty loosely, but yes, in many cases they are bad actors rather than actual scrapers. However, since they say the list may include other sites, I suspect Oxylabs adds sites at the site owners' request (Amazon, Target, etc. are likely to be on the list).


Hmm that’s unfortunate. I’m actually scaling up a data journalism project this month which is why I’ve been looking at these.

I’m curious whether you can suggest a happy medium between curl-impersonate on a VPS (dirt cheap) and a residential proxy ($8/GB)?

Personally I’m not trying to hit any of those common sites.


US court says Brightdata's web scraping service can be used to scrape Facebook:

https://www.courthousenews.com/wp-content/uploads/2024/01/me...


Damn, I didn't even know this existed, let alone with such insane pricing.

"Residential proxies is based on traffic and purchase model. Pay as you go model starts at $7.35 per GB, and can be discounted as low as $1.84 per GB when purchased in bulk."


Yeah, but if you’re just scraping, it’s only a KB or two per request. Checking an item's price once or twice a day would let you track thousands of items for years for just $8.
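The back-of-the-envelope math here is easy to check. Here is a sketch, assuming decimal units (1 GB = 1,000,000 KB) and ignoring headers, retries, and other overhead, so treat the results as optimistic:

```python
def requests_per_budget(budget_usd: float, kb_per_request: float, price_per_gb: float) -> float:
    """Number of requests a budget covers, using decimal units (1 GB = 1,000,000 KB)."""
    return budget_usd * 1_000_000 / (price_per_gb * kb_per_request)


def days_covered(budget_usd: float, items: int, checks_per_day: int,
                 kb_per_request: float, price_per_gb: float) -> float:
    """Days a budget lasts when checking `items` pages `checks_per_day` times a day."""
    daily_requests = items * checks_per_day
    return requests_per_budget(budget_usd, kb_per_request, price_per_gb) / daily_requests


# Assumed example: 1,000 items, checked twice a day, 2 KB per response,
# at the pay-as-you-go and bulk rates quoted above.
for rate in (7.35, 1.84):
    print(f"${rate}/GB: {days_covered(8, 1_000, 2, 2, rate):.0f} days for $8")
```

With those assumptions, $8 lasts about 272 days at $7.35/GB and about 1,087 days at $1.84/GB, so the outcome depends heavily on payload size and on which pricing tier applies.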


Depends on the site. Some sites send down several megabytes of JavaScript or images per request. Some even send down massive JSON payloads to page through client-side instead of fetching results page by page.
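When an endpoint does support pagination, fetching page by page keeps transfer (and per-GB cost) proportional to what you actually use, and lets you stop early. Here is a minimal sketch; `fetch_page` is a hypothetical callable you'd implement for the specific site (for example, a GET with `page` and `per_page` query parameters):

```python
from typing import Callable, Iterator

# Hypothetical signature: fetch_page(page_number, per_page) -> list of records.
FetchPage = Callable[[int, int], list]


def iter_records(fetch_page: FetchPage, per_page: int = 50, max_pages: int = 1000) -> Iterator[dict]:
    """Yield records page by page, stopping at the first short or empty page.

    Because this is a generator, a caller that stops consuming early
    never triggers requests for the remaining pages.
    """
    for page in range(1, max_pages + 1):
        records = fetch_page(page, per_page)
        yield from records
        if len(records) < per_page:
            break
```

The `max_pages` cap is a guard against endpoints that never return a short page. Pairing the generator with `itertools.islice` fetches only the pages needed for the first N records.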


Exactly. Or if you decide to just scrape the whole thing with headless browsers, it would be ridiculously expensive.
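If a headless browser is unavoidable, one common mitigation is to abort requests for heavy resources before they go through the proxy. The sketch below targets Playwright's Python sync API and uses its `page.route`, `route.abort()`, `route.continue_()` and `request.resource_type`. Which resource types are safe to block is an assumption that varies by site; blocking stylesheets can break pages that rely on layout for lazy loading:

```python
# Resource types that usually dominate page weight but are rarely needed
# for extracting text or prices. This set is an assumption; tune it per site.
BLOCKED_RESOURCE_TYPES = frozenset({"image", "media", "font", "stylesheet"})


def should_block(resource_type: str) -> bool:
    """Return True for requests we abort to save proxy bandwidth."""
    return resource_type in BLOCKED_RESOURCE_TYPES


def install_blocking(page) -> None:
    """Attach a route handler to a Playwright (sync API) Page.

    Playwright is not imported here; `page` is whatever object
    `browser.new_page()` returned in your own script.
    """
    def handle(route):
        if should_block(route.request.resource_type):
            route.abort()  # request never leaves the browser, so no proxy traffic
        else:
            route.continue_()

    page.route("**/*", handle)
```

You'd call `install_blocking(page)` right after `browser.new_page()` and before `page.goto(...)`.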

But I would guess these proxies are mostly used to send data rather than receive it. You can access geo-locked sites through a standard VPN.



