
The index is tiny, not even a terabyte. Right now it's a few hundred gigabytes for ~20 million URLs. But it's stored in an extremely dense binary format.

Honestly, you may just want to roll your own solution for storing a ton of files. If you don't need a general-purpose filesystem, just an append-only archive with some extra metadata, you can cut a lot of corners. For example, if your file store is fixed-size and append-only, you can build it in ways no off-the-shelf solution can match.
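
To make that concrete, here is a rough sketch of what a bare-bones append-only archive could look like. This is only an illustration, not the format I actually use: the class name `AppendOnlyArchive`, the record header layout (a 32-bit payload length plus a 64-bit metadata field), and the idea of using the byte offset as the record ID are all assumptions made up for the example.

```python
import os
import struct

# Hypothetical record header: payload length (u32) + one metadata field (u64).
HEADER = struct.Struct("<IQ")


class AppendOnlyArchive:
    """Records are only ever appended; a record's ID is its byte offset."""

    def __init__(self, path):
        # "a+b" creates the file if needed and forces every write to the end.
        self.f = open(path, "a+b")

    def append(self, payload: bytes, meta: int = 0) -> int:
        self.f.seek(0, os.SEEK_END)
        offset = self.f.tell()
        self.f.write(HEADER.pack(len(payload), meta))
        self.f.write(payload)
        self.f.flush()
        return offset

    def read(self, offset: int):
        # No directory tree, no free list: one seek and two reads.
        self.f.seek(offset)
        length, meta = HEADER.unpack(self.f.read(HEADER.size))
        return meta, self.f.read(length)

    def close(self):
        self.f.close()
```

Because nothing is ever deleted or resized, there is no fragmentation or allocation bookkeeping to deal with, which is exactly the kind of corner a general-purpose filesystem can't cut.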

This line of thinking is a large part of why my index is so small and fast. I have a lot of purpose-built data structures, each designed for its exact use case. For example, a fixed-size, append-only hash map that lives in memory-mapped storage and can in theory be larger than system memory. Very good for a search engine, absolutely useless almost everywhere else.
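
For the curious, a minimal sketch of the general idea behind a fixed-size, append-only, memory-mapped hash map is below. It is not my actual implementation: the name `FixedMmapMap`, the 64-bit integer keys and values, the use of key 0 as the "empty slot" marker, and linear probing are all assumptions chosen to keep the example short.

```python
import mmap
import os
import struct

# Hypothetical slot layout: key (u64) + value (u64). Key 0 means "empty".
SLOT = struct.Struct("<QQ")


class FixedMmapMap:
    """Open-addressing hash map over a fixed-size memory-mapped file.

    No deletes and no resizing, so probing never has to handle tombstones.
    """

    def __init__(self, path, capacity):
        self.capacity = capacity
        size = capacity * SLOT.size
        fd = os.open(path, os.O_RDWR | os.O_CREAT)
        try:
            current = os.fstat(fd).st_size
            if current == 0:
                os.ftruncate(fd, size)  # new file: zero-filled, all slots empty
            elif current != size:
                raise ValueError("existing file does not match capacity")
            self.mm = mmap.mmap(fd, size)
        finally:
            os.close(fd)

    def _start(self, key):
        # Multiplicative hash to spread sequential keys across the table.
        return ((key * 0x9E3779B97F4A7C15) & 0xFFFFFFFFFFFFFFFF) % self.capacity

    def put(self, key, value):
        if key == 0:
            raise ValueError("key 0 is reserved for empty slots")
        i = self._start(key)
        for _ in range(self.capacity):
            k, _v = SLOT.unpack_from(self.mm, i * SLOT.size)
            if k == 0:
                SLOT.pack_into(self.mm, i * SLOT.size, key, value)
                return
            if k == key:
                raise KeyError("append-only: key already present")
            i = (i + 1) % self.capacity
        raise RuntimeError("map is full")

    def get(self, key, default=None):
        i = self._start(key)
        for _ in range(self.capacity):
            k, v = SLOT.unpack_from(self.mm, i * SLOT.size)
            if k == 0:
                return default
            if k == key:
                return v
            i = (i + 1) % self.capacity
        return default

    def close(self):
        self.mm.flush()
        self.mm.close()
```

Since the table is just a mapped file, the OS pages slots in and out on demand, which is what lets the map be bigger than RAM. The price is that the capacity has to be picked up front and nothing can ever be removed.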


