
> going so far as to download the fulltext of every legal ruling ever made in the US -- something like 400GB

Where can I find this?






PACER exists at the federal level. Otherwise you have to piece it together from each jurisdiction yourself, defeating any anti-scraping measures in the process. Unless someone happens to have made such a dataset available via torrent at some point?

Start here: https://com-courtlistener-storage.s3-us-west-2.amazonaws.com...

The "opinions" are what you want.

These are huge, heavily compressed files, so they're quite difficult to handle.
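One way to cope with the size is to stream a file row by row rather than decompressing it to disk first. A minimal sketch, assuming the dump is a bz2-compressed CSV with a header row; the compression format, the `iter_rows` helper, and the file name in the usage note are assumptions, not something stated in this thread:

```python
import bz2
import csv

# Opinion text can be far longer than the csv module's default
# per-field limit (128 KiB), so raise it before reading.
csv.field_size_limit(2**31 - 1)


def iter_rows(path, limit=None):
    """Yield rows as dicts from a bz2-compressed CSV, one at a time.

    The file is decompressed on the fly, so memory use stays roughly
    proportional to the largest single row, not to the whole file.
    """
    # newline="" lets the csv module handle newlines embedded in quoted fields.
    with bz2.open(path, mode="rt", encoding="utf-8", newline="") as fh:
        reader = csv.DictReader(fh)
        for i, row in enumerate(reader):
            if limit is not None and i >= limit:
                break
            yield row
```

Something like `for row in iter_rows("opinions.csv.bz2", limit=5): print(list(row))` (hypothetical file name) would show the column names before you commit to a full pass.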


Why are they huge? Is it just PDF overhead? The opinions themselves should just be some finite number of pages of text, no?

Nah, they're .csv files, not even PDFs. It's just that it's a lot of text. (The valuations of the LLM giants don't seem too crazy when you realize just how much of the US economy is dedicated to creating and shuffling text.)

There's so much text in each of those monstrous .csv files that you can learn quite a lot if you run a statistical analysis on just one of them.
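As a rough illustration of the kind of analysis I mean, here is a sketch that counts word frequencies over one text column. The column name `plain_text` and the `term_counts` helper are hypothetical; check the header of the actual file for the real column names:

```python
import re
from collections import Counter

# Crude tokenizer: runs of ASCII letters, applied to lowercased text.
TOKEN = re.compile(r"[a-z]+")


def term_counts(rows, column="plain_text"):
    """Count word occurrences in one column across an iterable of dict rows.

    Rows are consumed one at a time, so this works on a streamed file.
    Missing or empty values in the column are treated as empty text.
    """
    counts = Counter()
    for row in rows:
        text = row.get(column) or ""
        counts.update(TOKEN.findall(text.lower()))
    return counts
```

`term_counts(rows).most_common(20)` then gives the top terms, where `rows` is any iterable of dicts, for example from `csv.DictReader`.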


Maybe they're referring to something like https://law.justia.com/


