Apparently, you can send in a video clip of yourself, the model gets trained on it, and then you can type any text and generate a video of yourself saying those words.
Does anyone know how it works? Which foundation model might they be using? I'd love to replicate this locally.
https://paperswithcode.com/task/talking-head-generation
https://github.com/harlanhong/awesome-talking-head-generatio...