20 GB of JSON is surprising to me. I have an SQLite file of all HN data that is 20 GB; I'd expect it to be much larger as JSON.


20 GB of JSON is correct; here’s the entire dump straight from the API up to last Monday:

  $ du -c ~/feepsearch-prod/datasource/hacker-news/data/dump/*.jsonl | tail -n1
  19428360        total
That's 19428360 KiB, i.e. just under 20 GB. Not sure how your SQLite file is structured, but it sounds plausible to me that the sizes come out roughly the same: JSON carries a lot of overhead from repeated structure (field names in every record) and ASCII-formatted values, while SQLite carries its own overhead from indexes, b-trees, ptrmaps, overflow pages, freelists, and so on.
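
If it helps build intuition, here's a rough sketch (Python, synthetic items rather than the real dump, with field names modeled on the HN API) that writes the same records as JSON Lines and as an SQLite table, then compares the resulting file sizes. The exact numbers depend on the data, but the JSON side pays for repeated keys and ASCII numbers while the SQLite side pays for pages and the rowid b-tree:

  # Rough sketch with synthetic items (not the real dump): write the same
  # records as JSON Lines and as an SQLite table, then compare on-disk sizes.
  import json, os, sqlite3, tempfile

  items = [
      {"id": i, "by": "user%d" % (i % 1000), "time": 1700000000 + i,
       "type": "comment", "parent": i - 1, "text": "example comment text " * 5}
      for i in range(1, 100_001)
  ]

  tmp = tempfile.mkdtemp()
  jsonl_path = os.path.join(tmp, "items.jsonl")
  db_path = os.path.join(tmp, "items.db")

  # JSON Lines: field names and ASCII-formatted numbers repeat in every record.
  with open(jsonl_path, "w") as f:
      for it in items:
          f.write(json.dumps(it) + "\n")

  # SQLite: compact per-row records, but pages, b-trees and the rowid index
  # add their own overhead.
  con = sqlite3.connect(db_path)
  con.execute('CREATE TABLE items(id INTEGER PRIMARY KEY, "by" TEXT,'
              ' time INTEGER, type TEXT, parent INTEGER, text TEXT)')
  con.executemany(
      'INSERT INTO items VALUES (:id, :by, :time, :type, :parent, :text)',
      items)
  con.commit()
  con.close()

  print("jsonl :", os.path.getsize(jsonl_path), "bytes")
  print("sqlite:", os.path.getsize(db_path), "bytes")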


SQLite also doesn't store values in fixed-width column types; each value is stored with a type tag (its storage class). At least according to what I've read on the topic.
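
A quick sketch to see it (using Python's built-in sqlite3 module): a column with no declared type happily holds values of several storage classes, and typeof() reports the tag for each one.

  # Per-value typing in SQLite: one untyped column, four storage classes.
  import sqlite3

  con = sqlite3.connect(":memory:")
  con.execute("CREATE TABLE t(x)")  # no declared column type
  con.executemany("INSERT INTO t VALUES (?)",
                  [(1,), (1.5,), ("hello",), (b"\x00\xff",)])
  for (storage_class,) in con.execute("SELECT typeof(x) FROM t"):
      print(storage_class)  # integer, real, text, blob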


SQLite files are optimized for fast querying, not size.
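
One way to see that trade-off (again a sketch with Python's stdlib sqlite3): deleted rows just move their pages onto the freelist, so the file keeps its size until you VACUUM it.

  # SQLite keeps freed pages on a freelist rather than shrinking the file;
  # VACUUM rebuilds the database to reclaim the space.
  import os, sqlite3, tempfile

  db = os.path.join(tempfile.mkdtemp(), "demo.db")
  con = sqlite3.connect(db)
  con.execute("CREATE TABLE t(id INTEGER PRIMARY KEY, body TEXT)")
  con.executemany("INSERT INTO t VALUES (?, ?)",
                  ((i, "x" * 200) for i in range(100_000)))
  con.commit()
  print("after insert:", os.path.getsize(db))

  con.execute("DELETE FROM t WHERE id % 2 = 0")
  con.commit()
  print("after delete:", os.path.getsize(db))  # roughly unchanged

  con.execute("VACUUM")                        # rebuild, dropping free pages
  print("after vacuum:", os.path.getsize(db))
  con.close()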



