Not OP, but if I was to do this, I'd start by downloading Wikipedia and all its ...

c0wb0yc0d3r · on Sept 16, 2021

I feel a little embarrassed that I didn't think of something like that.

When I did some crawler experimenting in my younger years, I thought I was pretty clever using sites that let you perform random Google searches. I would just crawl all the pages from the results returned.

Your method would undoubtedly be more interesting, and I bet it would lead to interesting performance problems sooner.