I wonder what would be the best way to use 8k embeddings. It’s a lot of information to keep in a vector, so things like “precision” of the embedding space and its ability to distinguish very similar large documents will be key.
Maybe it can be useful for coarse similarity matching, for example to detect plagiarism?
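A rough sketch of what that coarse matching might look like, assuming a stand-in embed() function in place of a real embedding model (hypothetical, not any particular API):

```python
import numpy as np

def embed(text: str, dim: int = 1024) -> np.ndarray:
    """Stand-in for a real embedding model call (hypothetical).
    A real implementation would send `text` (up to ~8k tokens) to an
    embedding model and get back a fixed-size vector of length `dim`."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(dim)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

doc_a = "full text of the first document ..."
doc_b = "full text of the second document ..."

# Coarse match: a high score only says the two documents cover similar
# content overall, not that specific passages were copied.
score = cosine_similarity(embed(doc_a), embed(doc_b))
print(f"similarity: {score:.3f}")
```

The catch is exactly the precision question above: one fixed-size vector per 8k-token document can flag "these are about the same thing", but it can't localize which passages overlap.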
8K is the context length. Their vector dimension is actually much smaller, which is great for a number of use cases, though maybe not the ones you are thinking about.