If anything, I think I'm only scratching the surface, since a lot of my methods involve ruthlessly discarding data that doesn't live up to a fairly blunt set of criteria. With something like a headless browser (and a lot more processing power), I could probably use laxer standards and find even more good stuff.