
1 million tokens might sound like a lot, but it's only a few megabytes of text. I would want RAG, somehow, to be able to process gigabytes or terabytes of material in a streaming fashion.
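To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes about 4 bytes of English text per token, which is a common rule of thumb and not a property of any particular tokenizer; the function names are made up for illustration.

```python
# Rule-of-thumb assumption: ~4 bytes of English text per token.
BYTES_PER_TOKEN = 4

def context_size_mb(tokens, bytes_per_token=BYTES_PER_TOKEN):
    """Approximate size in megabytes of the text that fits in `tokens` tokens."""
    return tokens * bytes_per_token / 1_000_000

def windows_needed(corpus_bytes, tokens, bytes_per_token=BYTES_PER_TOKEN):
    """How many full context windows it would take to read the corpus once."""
    window_bytes = tokens * bytes_per_token
    return -(-corpus_bytes // window_bytes)  # ceiling division

print(context_size_mb(1_000_000))          # 4.0 (megabytes)
print(windows_needed(10**12, 1_000_000))   # 250000 windows for 1 TB
```

Under that assumption, a terabyte of material is on the order of a quarter of a million full windows, which is why it would have to be streamed or retrieved from rather than loaded.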


RAG will not change how many tokens an LLM can produce at once.

Longer context, on the other hand, could put some RAG use cases to sleep: if your instructions are literally the length of a manual, then there is no need for RAG.


I think RAG could be used to do that. If you have a one-time retrieval at the beginning, basically amending the prompt, then I agree with you. But there are projects (a classmate is doing his master's thesis project as one implementation of this) that retrieve once every few tokens and make the retrieved information available to the generation somehow. That would not take a toll on the context window.
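A toy sketch of what I mean by retrieving every few tokens. Everything here is an assumption for illustration, not my classmate's actual design: the word-overlap retriever, the `next_token` callback standing in for the model, and especially the idea of keeping the retrieved passage in a single slot that gets overwritten, which is just one guess at the "somehow".

```python
def retrieve(query_tokens, corpus):
    """Toy retriever: return the passage sharing the most words with the query."""
    query = set(query_tokens)
    return max(corpus, key=lambda passage: len(query & set(passage.split())))

def generate(prompt_tokens, corpus, next_token, max_new_tokens=12, every=4):
    """Generate tokens, refreshing one retrieval slot every `every` tokens.

    `next_token(context_tokens, retrieved)` is a hypothetical stand-in for the model.
    """
    output = []
    retrieved = retrieve(prompt_tokens, corpus)
    slot_history = [retrieved]
    for step in range(max_new_tokens):
        if step > 0 and step % every == 0:
            # Re-query with the most recent tokens; the old passage is
            # replaced, not appended, so the slot never grows.
            retrieved = retrieve((prompt_tokens + output)[-every:], corpus)
            slot_history.append(retrieved)
        output.append(next_token(prompt_tokens + output, retrieved))
    return output, slot_history
```

The point of the single slot is that the number of retrievals can grow with the output length while the amount of retrieved text visible at any step stays at one passage.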



