
Context length, context length, context length. LLM architectures (with rare exceptions) inherently get worse at answering problems when given too much context. From a user's point of view, this shows up as idiosyncrasies in each model's ability to stay coherent - you just have to hope the devs are cognizant of this behavior on distributions of data close to yours.

Keep the input minimal. Keep a set of gold-standard tests running in a loop to catch problems quickly. Don't tune out. Debate whether you really need that new model you haven't worked with much yet just because it's newer. And double-check you aren't being sold e.g. a quantized version of the model.
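A minimal sketch of what such a gold-standard loop might look like - the `run_model` function, the gold cases, and the exact-match scoring here are all placeholder assumptions; in practice you'd swap in your real model call and your own prompts and checks:

```python
# Sketch of a gold-standard regression loop for an LLM pipeline.
# Everything below is illustrative: run_model is a hypothetical
# stand-in for your actual model/API call, and GOLD_CASES / the
# exact-match check are placeholders for your own suite.

GOLD_CASES = [
    {"prompt": "2 + 2 =", "expected": "4"},
    {"prompt": "Capital of France?", "expected": "Paris"},
]

def run_model(prompt: str) -> str:
    # Hypothetical stand-in; replace with your real model call.
    canned = {"2 + 2 =": "4", "Capital of France?": "Paris"}
    return canned.get(prompt, "")

def run_gold_suite(model=run_model) -> list[dict]:
    """Run every gold case, return the list of failures (empty = all pass)."""
    failures = []
    for case in GOLD_CASES:
        got = model(case["prompt"])
        if got.strip() != case["expected"]:
            failures.append({"case": case, "got": got})
    return failures

if __name__ == "__main__":
    failing = run_gold_suite()
    for f in failing:
        print(f"FAIL: {f['case']['prompt']!r} -> {f['got']!r}")
    if not failing:
        print("all gold cases pass")
```

Run on a schedule (cron, CI, etc.) against the deployed endpoint, a failing case is often the first visible sign that the provider silently swapped in a different or quantized model.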



