
Not the whole web; LinkedIn and a few others block us and we fully respect robots.txt, but we have ~8 billion pages.

edit: from the article, "Doing this for a few urls is easy but doing it for billions of urls starts to get tricky and expensive (although not completely out of reach)" - indeed so, but we have now computed embeddings for about half of those ~8 billion pages and are using them for mojeek.com.

We have an API with many features, including, uniquely, authority and ranking scores. Embeddings could be added.

https://www.mojeek.com/services/search/web-search-api/ used by Kagi, Meta and others. Self-disclosure; Mojeek team member.



Author of the article here. Just went through your website and I can't believe I never heard about Mojeek. I'll probably have a go at your API eventually.



