Not the whole web; LinkedIn and a few others block us and we fully respect robots.txt, but we have ~8 billion pages.
edit: from article, "Doing this for a few urls is easy but doing it for billions of urls starts to get tricky and expensive (although not completely out of reach)" - indeed so, but we have now done embeddings for about half of those ~8 billion pages and are using them for mojeek.com.
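For readers wondering what "doing embeddings for billions of urls" involves per page, here is a minimal sketch of the fetch-extract-embed loop, assuming a sentence-transformers model. This is not Mojeek's pipeline; the model choice and the crude text extraction are illustrative assumptions only.

```python
# Minimal sketch (not Mojeek's pipeline): fetch a page, extract text, embed it.
# Model and extraction choices are assumptions for illustration only.
import requests
from bs4 import BeautifulSoup
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small 384-dim encoder; any encoder works

def embed_urls(urls, batch_size=32):
    texts = []
    for url in urls:
        html = requests.get(url, timeout=10).text
        # Strip markup; a real crawler does far more careful extraction and dedup.
        texts.append(BeautifulSoup(html, "html.parser").get_text(" ", strip=True))
    # Encode in batches; at billions of pages this is the expensive step.
    return model.encode(texts, batch_size=batch_size, normalize_embeddings=True)

vectors = embed_urls(["https://example.com"])
print(vectors.shape)  # (1, 384) for this model
```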
We have an API with many features, including, uniquely, authority and ranking scores. Embeddings could be added.
Author of the article here. Just went through your website and I can't believe I never heard of Mojeek. I'll probably have a go at your API eventually.
https://www.mojeek.com/services/search/web-search-api/ is used by Kagi, Meta and others. Self-disclosure: Mojeek team member.
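For the curious, a call to a web search API like this is typically a single HTTP request. The sketch below assumes a key-plus-query interface; the endpoint and parameter names (`api_key`, `fmt`) are assumptions for illustration, and the linked page documents the actual interface and the authority/ranking fields.

```python
# Rough sketch of querying a web search API over HTTP. The endpoint and
# parameter names (api_key, fmt) are assumptions; see the linked Mojeek page
# for the real interface.
import requests

def search(query: str, api_key: str) -> dict:
    resp = requests.get(
        "https://www.mojeek.com/search",                          # assumed endpoint
        params={"q": query, "api_key": api_key, "fmt": "json"},   # assumed params
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

# results = search("independent search engines", api_key="YOUR_KEY")
```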