1) At this moment about 70 million documents. I've had it at about 110 million, ...

abracadaniel · on April 18, 2023

Is there a domain list if I wanted to crawl the hosts myself? I see you have the raw crawl data, which is appreciated, but a raw domain list would be cool.

marginalia_nu · on April 18, 2023

I guess technically that could be arranged, although I don't want everyone to run their own crawler. It would annoy a lot of webmasters and end up creating even more hurdles for anyone who wants to run a crawler. Better to share the data if possible.