Nah, they're .csv files, not even PDFs. It's just that it's a lot of text. (The valuations of the LLM giants don't seem too crazy when you realize just how much of the US economy is dedicated to creating and shuffling text.)

There's so much text in each of those monstrous .csv files that you can learn quite a lot if you run a statistical analysis on just one of them.