Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

No, I actually count the n-grams as distinct words (up to 4-grams). The main limiter is for that is space, so I only extract "canned" n-grams from some tags.

I would first search for the bigram hello_world, that's an O(1) array lookup; as then documents merely containing the words hello and world (usually not a good search result), that's the algorithm I'm describing in the parent comment.



Makes sense. Every time you insert a new URL for a word you have to update the ranges for every other word since the URL file will be shifted?




Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: