Treated as a black box, it is a generalization of word2vec to sequence2vec. For example, simply summing or averaging the word vectors in a sentence gives you a fast & cheap sentence embedding.
But natural language sentences have more structure than natural language words. Ex: it matters precisely where "not" goes in a sentence. So a lot of impressive scientific experimentation went into making these models smarter, across many generations of architectures. Impressively, this is all so black-boxed now that it doesn't much matter which one you pick.
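A toy sketch of why plain averaging loses that structure. The vocabulary and 3-d vectors below are made up for illustration (a real model would have learned, high-dimensional embeddings); the point is that an average is order-invariant, so scrambling the words leaves the embedding unchanged:

```python
# Hypothetical tiny word-embedding table (made-up 3-d vectors).
word_vecs = {
    "the":   [0.0, 0.5, 0.25],
    "movie": [1.0, 0.25, 0.5],
    "was":   [0.5, 0.0, 0.0],
    "not":   [0.75, 0.5, 1.0],
    "good":  [0.25, 1.0, 0.5],
}

def avg_embedding(sentence):
    """Average the word vectors of a whitespace-tokenized sentence."""
    vecs = [word_vecs[w] for w in sentence.split()]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

a = avg_embedding("the movie was not good")
b = avg_embedding("not the movie was good")  # same words, different meaning

print(a == b)  # True: averaging can't see where "not" went
```

That blindness to word order is exactly what the smarter sequence models fix.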
Implicit to my post here... that's powerful and easy to use, but not necessarily a great knowledge representation for someone who wants good Q&A over enterprise-scale data. One of our customer scenarios: "What is known vs. believed about incident X?" We can index each paragraph as multiple sentence embeddings, so if any phrase matches a query, the full paragraph gets thrown into GPT as part of our answer. Easy. However, if information in the paragraph should lead to pulling information from elsewhere in the system (a mention of another team, project, incident, ...), then either a planning agent needs to realize that and recursively generate more vector search queries (a mini-AutoGPT)... or we need to index on more than the sentence embedding.
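A minimal sketch of that first indexing scheme: one index entry per sentence, each pointing back at its full paragraph, so a match on any phrase retrieves the whole paragraph. The `embed` function here is a stand-in (just character frequencies) for whatever real sentence encoder you'd deploy, and the example paragraphs are invented:

```python
import math

# Hypothetical stand-in for a real sentence encoder: a character-frequency
# vector. A real deployment would use a learned embedding model instead.
def embed(text):
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Index one entry per *sentence*, but point each entry back at its paragraph.
paragraphs = [
    "Incident X began at 09:00. The payments team believes a config push caused it.",
    "Project Atlas shipped on time. No incidents were reported that week.",
]
index = []
for pid, para in enumerate(paragraphs):
    for sent in para.split(". "):
        index.append((embed(sent), pid))

def search(query, k=1):
    """Return the k full paragraphs whose best-matching sentence scores highest."""
    qv = embed(query)
    scores = {}
    for sv, pid in index:
        scores[pid] = max(cosine(qv, sv), scores.get(pid, 0.0))
    best = sorted(scores, key=scores.get, reverse=True)[:k]
    return [paragraphs[pid] for pid in best]

print(search("what does the payments team believe about incident X?"))
```

Note what this sketch can't do: the retrieved paragraph mentions "the payments team", but nothing here follows that link to the team's own documents. That's the gap the planning-agent (or richer index) would have to fill.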
Again, super interesting problems, and we're hiring folks interested in working on them!
So you have token embeddings, but tokens are too small to be useful.
Is "what a sentence means" encoded as a vector once you have passed the embeddings through a transformer or two?
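Roughly yes, in common setups: the transformer produces one contextual vector per token, and a pooling step (mean pooling, or taking a designated token like [CLS]) collapses them into a single sentence vector. A minimal sketch of mean pooling, where the made-up 4-d vectors below stand in for real transformer outputs:

```python
# Toy stand-in for transformer output: one contextual vector per token.
# Real models produce these; the numbers here are invented for illustration.
contextual_token_vecs = [
    [0.2, -0.1, 0.5, 0.0],   # "the"
    [0.8,  0.3, -0.2, 0.4],  # "incident"
    [0.1,  0.6, 0.0, -0.3],  # "escalated"
]

def mean_pool(token_vecs):
    """Collapse per-token vectors into one fixed-size sentence vector."""
    n = len(token_vecs)
    return [sum(dim) / n for dim in zip(*token_vecs)]

sentence_vec = mean_pool(contextual_token_vecs)
print(len(sentence_vec))  # one 4-d vector for the whole sentence
```

The key difference from averaging static word vectors: the transformer has already mixed context into each token vector, so the same token gets a different vector in "not good" than in "not bad" before the pooling ever happens.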