
Yes, typically users send the newest user message and the full conversation history. These combined become the prompt.

Our API endpoint tries to route requests that share the same prefix to the same vLLM instance (similar to longest-prefix matching in networking), in the hope that the instance still holds KV cache entries for part of the prompt.
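To make the routing idea concrete, here is a rough sketch in Python. It is not our actual implementation: the `PrefixRouter` class, the instance names, and the round-robin fallback are all made up for illustration, and it compares raw characters against a small window of recently routed prompts.

```python
from collections import deque
from os.path import commonprefix  # character-wise common prefix of strings


class PrefixRouter:
    """Hypothetical router: send a prompt to the instance that recently
    served the prompt sharing the longest prefix with it."""

    def __init__(self, instances, history_per_instance=64):
        self.instances = list(instances)
        # Remember only the last few prompts per instance, since older
        # KV cache entries have probably been evicted anyway (assumption).
        self.recent = {
            name: deque(maxlen=history_per_instance) for name in self.instances
        }
        self._next = 0  # round-robin cursor used when nothing matches

    def _longest_match(self, instance, prompt):
        # Length of the longest prefix shared with any remembered prompt.
        return max(
            (len(commonprefix([prompt, seen])) for seen in self.recent[instance]),
            default=0,
        )

    def route(self, prompt):
        best = max(self.instances, key=lambda name: self._longest_match(name, prompt))
        if self._longest_match(best, prompt) == 0:
            # No shared prefix anywhere: fall back to round-robin.
            best = self.instances[self._next % len(self.instances)]
            self._next += 1
        self.recent[best].append(prompt)
        return best


router = PrefixRouter(["vllm-0", "vllm-1"])
first = router.route("system: be brief\nuser: hello")
other = router.route("a completely unrelated prompt")
# Same conversation, one turn later: the earlier prompt is a prefix of this one.
followup = router.route("system: be brief\nuser: hello\nassistant: hi\nuser: more")
print(first, other, followup)
```

The follow-up turn lands on the same instance as the first request because the full history it resends starts with the earlier prompt. A real router would likely match on tokens or cache blocks rather than characters, and would also weigh instance load, but the selection logic has the same shape.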


