Hacker News

Impressive work.

I wonder what would be the best way to use 8k embeddings. It’s a lot of information to keep in a vector, so things like “precision” of the embedding space and its ability to distinguish very similar large documents will be key.

Maybe it can be useful for coarse similarity matching, for example to detect plagiarism?
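Coarse matching like this usually means embedding each document and flagging pairs whose cosine similarity crosses a threshold. A minimal sketch of that screening loop, where `embed()` is a toy bag-of-words stand-in (a hypothetical placeholder, not the actual model, which would return a fixed-size float vector):

```python
# Coarse similarity screening, e.g. for plagiarism detection.
# embed() is a hypothetical stand-in: a word-count vector instead of
# a learned embedding. In practice you'd call the embedding model here.
import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

doc1 = "the quick brown fox jumps over the lazy dog"
doc2 = "the quick brown fox leaps over the lazy dog"
doc3 = "an entirely different sentence about embeddings"

# Flag pairs above a threshold for closer (e.g. exact-match) review.
THRESHOLD = 0.8
print(cosine(embed(doc1), embed(doc2)) > THRESHOLD)  # True: near-duplicates
print(cosine(embed(doc1), embed(doc3)) > THRESHOLD)  # False: unrelated
```

The appeal of a long-context model here is that the whole document goes through `embed()` in one pass, so the coarse screen needs no chunking.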



8K is the context length. Their vector dimension is actually much smaller, which is great for a number of use cases, though maybe not the ones you are thinking about.


Yes, that’s also how I understood it. Maybe I expressed it ambiguously, but I meant “8k tokens as input is a lot of information to encode.”



