It's a database storing machine learning embeddings.
Example:
Let's suppose I've downloaded all of Project Gutenberg's books.
I can feed these books to a transformer like BERT or GPT-3 to calculate their embeddings.
These embeddings represent the meaning of each book in vector form (a fixed-size array of numbers).
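To make this concrete, here's a minimal sketch using the sentence-transformers library and the all-MiniLM-L6-v2 model (both my choice for the example, not tied to any particular vector database):

    # Minimal sketch: text -> fixed-size embedding vectors.
    # Assumes: pip install sentence-transformers
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # outputs 384-dim vectors

    books = [
        "Full text of book one...",
        "Full text of book two...",
    ]

    # encode() returns one vector per input text
    embeddings = model.encode(books)
    print(embeddings.shape)  # (2, 384)

In practice you'd embed chunks of each book rather than the whole text, since these models have an input length limit.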
I can save these vectors in the database, and the database can then calculate the distance between them (typically cosine or Euclidean distance), which tells you roughly how closely related they are in topic.
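By "distance" I mean something like cosine similarity, which you can compute yourself with plain numpy (nothing database-specific about it):

    import numpy as np

    def cosine_similarity(a, b):
        # 1.0 = same direction (same topic), ~0.0 = unrelated
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))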
If I query this database with the embedding of a sentence like "Love story between teenagers from two enemy families in Italy, they die at the end", hopefully the top result will be Romeo & Juliet.
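So a query boils down to: embed the sentence, then find the nearest stored vector. A real vector database does this at scale with approximate indexes (HNSW, IVF, etc.), but a brute-force sketch of the idea (toy corpus and model choice are mine) looks like:

    from sentence_transformers import SentenceTransformer
    import numpy as np

    model = SentenceTransformer("all-MiniLM-L6-v2")

    # Toy corpus standing in for the Gutenberg books
    titles = ["Romeo and Juliet", "Moby Dick", "Pride and Prejudice"]
    summaries = [
        "Two young lovers from feuding families in Verona meet a tragic end.",
        "A sea captain's obsessive hunt for a white whale.",
        "Courtship, class and marriage in the English countryside.",
    ]
    book_vecs = model.encode(summaries)

    query = model.encode("Love story between teenagers from two enemy "
                         "families in Italy, they die at the end")

    # Brute-force nearest neighbour by cosine similarity;
    # a vector DB gets the same answer faster with an ANN index
    q = query / np.linalg.norm(query)
    m = book_vecs / np.linalg.norm(book_vecs, axis=1, keepdims=True)
    print(titles[int(np.argmax(m @ q))])  # hopefully: Romeo and Juliet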
I'm no ML person so take my comment with a grain of salt in terms of how well it works. But in theory, that's the goal.