PACER exists at the federal level. Otherwise you have to piece it together from each jurisdiction yourself, defeating any anti-scraping measures in the process. Unless someone happens to have made such a dataset available via torrent at some point?
Nah, they're .csv files, not even PDFs. It's just that it's a lot of text. (The valuations of the LLM giants don't seem too crazy when you realize just how much of the US economy is dedicated to creating and shuffling text.)
There's so much text in each of those monstrous .csv files that you can learn quite a lot if you run a statistical analysis on just one of them.
Where can I find this?