I think RAG could be used to do that. If you have a one-time retrieval at the beginning, basically amending the prompt, then I agree with you. But there are projects (a classmate is doing his master's thesis on one implementation of this) that retrieve once every few tokens and make the retrieved information available to the generation somehow. As long as the retrieved text is not appended to the prompt, that would not take a toll on the context window.
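
To make the idea concrete, here is a rough toy sketch of what "retrieve once every few tokens" could look like. This is not my classmate's implementation, and I don't know how his project actually feeds the retrieved information into the model. Everything here (`generate_with_periodic_retrieval`, `toy_retrieve`, `toy_next_token`, the `DOCS` table) is made up for illustration. The assumption is that the retrieved passage lives in a separate slot that gets overwritten on each retrieval, rather than being appended to the prompt.

```python
from typing import Callable, List, Tuple

Tokens = List[str]

# Hypothetical toy "knowledge base": a keyword maps to a short passage.
DOCS = {
    "paris": ["capital", "of", "france"],
    "france": ["country", "in", "europe"],
}


def toy_retrieve(context: Tokens) -> Tokens:
    """Return the passage for the most recent token that has an entry."""
    for tok in reversed(context):
        if tok in DOCS:
            return DOCS[tok]
    return []


def toy_next_token(context: Tokens, retrieved: Tokens) -> str:
    """Stand-in for a model: emit the first retrieved token not yet in context."""
    for tok in retrieved:
        if tok not in context:
            return tok
    return "<eos>"


def generate_with_periodic_retrieval(
    prompt: Tokens,
    next_token: Callable[[Tokens, Tokens], str],
    retrieve: Callable[[Tokens], Tokens],
    max_new_tokens: int,
    retrieve_every: int,
) -> Tuple[Tokens, int]:
    """Generate tokens, refreshing the retrieved passage every few steps.

    The retrieved passage is kept in a side slot and replaced on each
    retrieval, so it never accumulates in the prompt/context.
    """
    generated: Tokens = []
    retrieved: Tokens = []
    retrieval_count = 0
    for step in range(max_new_tokens):
        if step % retrieve_every == 0:
            # Overwrite, do not append: the side slot stays small.
            retrieved = retrieve(prompt + generated)
            retrieval_count += 1
        generated.append(next_token(prompt + generated, retrieved))
    return generated, retrieval_count


tokens, n_retrievals = generate_with_periodic_retrieval(
    ["paris"], toy_next_token, toy_retrieve, max_new_tokens=6, retrieve_every=3
)
print(tokens, n_retrievals)
```

The point of the sketch is only the control flow: the context passed to the "model" is the prompt plus generated tokens, and the retrieved text is handed over separately and replaced every `retrieve_every` steps. How a real model would consume that side input is exactly the "somehow" part above.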