A lot of people in RAG already do this. I do this with my product: we process each page and create lists of potential questions that the page would answer, and then embed that.
We also embed the actual text, though, because I found that only doing the questions resulted in inferior performance.
So in this case, what your workflow might look like is:
1. Get text from page/section/chunk
2. Generate possible questions related to the page/section/chunk
3. Generate an embedding using { each possible question + page/section/chunk }
4. Incoming question is embedded and matched against the { question + source } vectors
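The steps above can be sketched end to end. Everything here is a stand-in: the questions are hardcoded where an LLM call would go, and `embed` is a toy bag-of-words hash so the example runs without an embedding service. In practice you'd swap in your LLM and a real embedding model.

```python
import hashlib
import math

DIM = 256  # toy vector size; real embedding models are typically 384-3072 dims

def embed(text: str) -> list[float]:
    """Toy embedding: hash each token into a fixed-size bag-of-words vector,
    then L2-normalize so the dot product below acts as cosine similarity."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        idx = int(hashlib.md5(token.strip(".,?!").encode()).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

# Steps 1-2: chunks plus the questions an LLM might generate for each
# (hardcoded here; in practice this is an LLM call per chunk).
corpus = [
    ("The refund window is 30 days from the date of purchase.",
     ["How long do I have to request a refund?", "What is the refund policy?"]),
    ("Standard shipping takes 5 to 7 business days.",
     ["How long does shipping take?"]),
]

# Step 3: one embedding over { all generated questions + chunk } per chunk.
index = [(embed(" ".join(qs) + " " + chunk), chunk) for chunk, qs in corpus]

# Step 4: embed the incoming question and match against the index.
query = embed("How many days do I have to get a refund?")
_, best_chunk = max(index, key=lambda item: cosine(query, item[0]))
print(best_chunk)
```

The retrieval step finds the refund chunk even though the query phrasing ("how many days") differs from the source text, because the generated questions pull the combined embedding toward question-style wording.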
Is this roughly it? How many questions do you generate? Do you save a separate embedding for each question? Or just stuff all of the questions back with the page/section/chunk?
Right now I just throw the different questions together in a single embedding for a given chunk, with the idea that there’s enough dimensionality to capture them all. But I haven’t tested embedding each question, matching on that vector, and then returning the corresponding chunk. That seems like it’d be worth testing out.
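The untested variant could look something like this: one vector per generated question, each pointing back at its parent chunk, so retrieval matches on a question vector but returns the chunk. Again a toy sketch with a hashed bag-of-words `embed` standing in for a real model, and hardcoded questions standing in for LLM output.

```python
import hashlib
import math

DIM = 256

def embed(text: str) -> list[float]:
    """Toy hashed bag-of-words embedding, L2-normalized (stand-in for a real model)."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        idx = int(hashlib.md5(token.strip(".,?!").encode()).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

chunks = {
    "refund": "The refund window is 30 days from the date of purchase.",
    "shipping": "Standard shipping takes 5 to 7 business days.",
}
questions = {  # normally LLM-generated, one set per chunk
    "How long do I have to request a refund?": "refund",
    "What is the refund policy?": "refund",
    "How long does shipping take?": "shipping",
}

# One vector per question, each keyed back to its parent chunk.
index = [(embed(q), chunk_id) for q, chunk_id in questions.items()]

# Match the incoming question against question vectors; return the chunk.
query = embed("How many days do I have for a refund?")
_, best_id = max(index, key=lambda item: cosine(query, item[0]))
print(chunks[best_id])
```

One design note: this trades index size (N question vectors per chunk instead of one combined vector) for sharper question-to-question matching, which is presumably what the test would measure.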