This is exactly what I'm working on! My project takes Zoom conversations, uses pyannote for speaker diarisation, Whisper for transcription, and pinecone.io for semantic search, then feeds all of that into GPT-3 so we can ask questions about the conversation.
For us this is super useful because it's not unusual for our discovery sessions to last days, and we're all terrible at taking notes.
As a nerd, my brain is already buzzing with ways I could use this for my group's D&D campaigns.
Are you getting good results when summarizing a human speaking? On my project, even though Whisper does a good job transcribing it, I'm not happy with the query results. My theory is that GPT-3 is designed for the written word, and the way people speak and the way they write are structurally different. Or maybe I'm just still figuring this out and not good enough at it yet.
It’s often not enough to just index the snippets themselves. You may need to augment them. For instance, you may need to keep track of the context, and prepend it to the actual snippet that you want to index.
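As a rough sketch of what I mean (the `Segment` shape, the two-turn window, and the prefix format are all assumptions on my part, not something from your pipeline):

```python
# Minimal sketch: prepend speaker + surrounding turns to each snippet
# before it gets embedded, so a snippet can stand on its own at query time.
from dataclasses import dataclass

@dataclass
class Segment:
    speaker: str      # e.g. "SPEAKER_01" from the diarisation step
    start: float      # seconds into the call
    text: str         # what the transcription step produced for this turn

def augment_for_indexing(segments: list[Segment], context_turns: int = 2) -> list[str]:
    """Build the strings that actually get embedded/indexed: each snippet
    carries the context it needs (who spoke, what came just before)."""
    docs = []
    for i, seg in enumerate(segments):
        # Keep the last few turns so pronouns and follow-ups stay resolvable.
        context = segments[max(0, i - context_turns):i]
        context_block = " ".join(f"{c.speaker}: {c.text}" for c in context)
        docs.append(
            f"[Discovery session, t={seg.start:.0f}s] "
            f"Context: {context_block} | {seg.speaker}: {seg.text}"
        )
    return docs
```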
The important thing in such a pipeline is not GPT-3. The important thing is the retrieval/ranking algorithm that finds the most relevant snippets and feeds them into GPT-3. The latter is only the mouthpiece, if you will.
In fact, you might even find that you’re better off without it (no confabulation, ground truth data).
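To make that concrete, here's a rough sketch of the retrieval/ranking step (the `embed` argument is a placeholder for whatever embedding model you're using; the rest is just cosine similarity over the vectors you've already indexed):

```python
# Minimal sketch of retrieval/ranking: score snippets against the query by
# cosine similarity, then either return them directly or build a prompt.
import numpy as np

def cosine_rank(query_vec: np.ndarray, doc_vecs: np.ndarray, top_k: int = 5) -> np.ndarray:
    """Return indices of the top_k documents most similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    return np.argsort(-scores)[:top_k]

def build_prompt(question: str, docs: list[str], doc_vecs: np.ndarray, embed) -> str:
    """Assemble a grounded prompt from the top-ranked snippets."""
    idx = cosine_rank(embed(question), doc_vecs)
    context = "\n".join(docs[i] for i in idx)
    return (
        "Answer using only the transcript snippets below.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
```

If the ranked snippets already answer the question, you can stop at `context` and skip the generation step entirely; that's the "better off without it" case.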
I've got tons of notes, so it shouldn't be too hard to do a write-up. Currently it's in a private repo, but if I can get sign-off from my boss I'll open-source it.