Hacker News

Impressive work.

I wonder what would be the best way to use 8k embeddings. It’s a lot of information to keep in a vector, so things like “precision” of the embedding space and its ability to distinguish very similar large documents will be key.

Maybe it can be useful for coarse similarity matching, for example to detect plagiarism?
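Coarse matching like this usually means embedding each document and flagging pairs whose cosine similarity crosses a threshold. A minimal sketch of that screening loop, where `embed()` is a toy bag-of-words stand-in (a hypothetical placeholder, not the actual model, which would return a fixed-size float vector):

```python
# Coarse similarity screening, e.g. for plagiarism detection.
# embed() is a hypothetical stand-in: a word-count vector instead of
# a learned embedding. In practice you'd call the embedding model here.
import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

doc1 = "the quick brown fox jumps over the lazy dog"
doc2 = "the quick brown fox leaps over the lazy dog"
doc3 = "an entirely different sentence about embeddings"

# Flag pairs above a threshold for closer (e.g. exact-match) review.
THRESHOLD = 0.8
print(cosine(embed(doc1), embed(doc2)) > THRESHOLD)  # True: near-duplicates
print(cosine(embed(doc1), embed(doc3)) > THRESHOLD)  # False: unrelated
```

The appeal of a long-context model here is that the whole document goes through `embed()` in one pass, so the coarse screen needs no chunking.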



8K is the context length. Their vector dimension is actually much smaller, which is great for a number of use cases, though maybe not the ones you are thinking about.


Yes, that’s also how I understood it. Maybe I expressed it ambiguously, but I meant “8k tokens as input is a lot of information to encode.”



