My understanding is that modern hosted LLMs are nondeterministic in practice even with a known seed, because the generated results are sensitive to a number of other factors, including, but not limited to, the other prompts running in the same batch.
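A rough sketch of the usual low-level explanation, not any particular provider's implementation: floating-point addition is not associative, so the order in which an accelerator reduces the same numbers (which can depend on how requests are batched and tiled) can flip low-order bits, and a near-tie in the logits can then resolve to a different token. A minimal NumPy illustration of the order effect:

    # Sketch only: the same values summed in two different orders, as different
    # batch/tile partitions might do, generally do not agree bit-for-bit.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(10_000, dtype=np.float32)

    # Order 1: plain sequential accumulation.
    sum_sequential = np.float32(0.0)
    for v in x:
        sum_sequential += v

    # Order 2: accumulate per-chunk partial sums, then combine.
    sum_chunked = np.float32(0.0)
    for chunk in np.split(x, 100):  # 100 chunks of 100 elements
        sum_chunked += chunk.sum(dtype=np.float32)

    print(sum_sequential, sum_chunked, sum_sequential == sum_chunked)
    # The two totals typically differ in the low-order bits. If a difference
    # like that lands on a logit near a tie, even greedy decoding can pick a
    # different token, and the divergence compounds over the rest of the output.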

Gemini, for example, launched implicit caching on or about 2025-05-08: https://developers.googleblog.com/en/gemini-2-5-models-now-s... :

> Now, when you send a request to one of the Gemini 2.5 models, if the request shares a common prefix as one of previous requests, then it’s eligible for a cache hit. We will dynamically pass cost savings back to you, providing the same 75% token discount.

> In order to increase the chance that your request contains a cache hit, you should keep the content at the beginning of the request the same and add things like a user's question or other additional context that might change from request to request at the end of the prompt.
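In other words, put the large, unchanging context first and only the per-request part last, so consecutive requests share the longest possible prefix. A hedged sketch of that layout; STABLE_PREFIX and call_gemini are placeholders, not the Gemini SDK:

    # Hypothetical prompt assembly illustrating the prefix-friendly layout.
    STABLE_PREFIX = (
        "You are a support assistant for ExampleCorp.\n"        # hypothetical system text
        "Reference material (identical across requests):\n"
        "...large, rarely-changing context goes here...\n"
    )

    def build_prompt(user_question: str) -> str:
        # Variable content goes last so it does not invalidate the cached prefix.
        return f"{STABLE_PREFIX}\nUser question: {user_question}\n"

    def call_gemini(prompt: str) -> str:
        # Placeholder for whatever client call you actually use.
        raise NotImplementedError

    # Requests that differ only in the trailing question are eligible for the
    # implicit-cache discount on the shared prefix tokens:
    # call_gemini(build_prompt("How do I reset my password?"))
    # call_gemini(build_prompt("What is the warranty period?"))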

From https://news.ycombinator.com/item?id=43939774 re: same:

> Does this make it appear that the LLM's responses converge on one answer when actually it's just caching?
