Cloning a voice's signature or timbre may need a bit more data to get good quality. Then there are the idiosyncrasies in one's voice. On top of that, there are the tiny verbal tics, expressions, cadence, feel, and more that you need before you can say you have properly cloned someone's voice. The two-second sample gives a shallow clone of sorts, and it is indeed just a representation in vector space.
The last sentence is hilarious. What do you think "properly" cloned voices are? Not every model is few-shot, and not every model relies on its training set for paralanguage anymore. The easiest way to try it out is probably the pro voice cloning from ElevenLabs.