> Remember, LLMs are just statistical sentence completion machines. So telling it what to respond with will increase the likelihood of that happening, even if there are other options that are viable.
Obviously. When I say "tuned" I don't mean adding stuff to a prompt. I mean tuning in the way models are also tuned to be more or less professional, tuned to defer certain tasks to other models (e.g. counting or math, something statistical models are almost unable to do), and so on.
I am almost certain that the chain of models we use on chatgpt.com is "tuned" to always give an answer, and not to answer with "I am just a model, I don't have information on this". Early models and early toolchains did this far more often, but today they are quite probably tuned to "always be of service".
"Quite probably" because I have no proof, other than that it will gladly hallucinate, invent urls and references, etc. And knowing that all the GPT competitors are battling for users, so their products quite certainly tuned to help in this battle - e.g. appear to be helpful and all-knowing, rather than factual correct and therefore often admittedly ignorant.
Whether you train the model to do math internally or tell it to call an external model which only does math, the root problem still exists. It's not as if a model which only does math won't hallucinate how to solve math problems just because it doesn't know about history, and for the same number of parameters it's probably better not to duplicate the parts needed to understand the basics of things multiple times.
The root problem is that training models to be uncertain of their answers results in lower benchmark scores in every area except hallucinations. It's like taking a multiple-choice test and, instead of picking whichever of answers A-D seems most plausible, picking E, "I don't know". Helpful for the test grader, but a bad bet for a model trying to claim it gets the most answers right compared to other models.
The technical solution is the easy half; the hard part is convincing people that this is how we should be testing everything, because we care about knowing the uncertainty in any test.
Look at the math section of the SAT, for example: it rewards trying to guess the right answer instead of rewarding admitting you don't know. It's not because the people writing the SAT can't figure out how to grade it otherwise; it's just not what people seem to care most about finding out, for one reason or another.
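To make the expected-value argument concrete, here's a rough sketch (my own numbers and penalty scheme, not the SAT's actual scoring rules or any real benchmark's) of why "always guess" wins under typical grading and stops winning once wrong answers carry a penalty:

```python
# Rough illustration: expected score per question when a test-taker answers
# vs. abstains, under two hypothetical grading schemes.

def expected_points(p_correct: float, wrong_penalty: float) -> float:
    """Expected score from answering: +1 if right, -wrong_penalty if wrong."""
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty

ABSTAIN = 0.0   # "I don't know" earns nothing in either scheme
p_blind = 0.25  # pure guess among four options A-D

# Scheme 1: wrong answers cost nothing (how most leaderboards score).
print(f"{expected_points(p_blind, wrong_penalty=0.0):.2f}")  # 0.25 -> guessing beats abstaining

# Scheme 2: formula scoring, wrong answers cost 1/(k-1) = 1/3 of a point.
print(f"{expected_points(p_blind, wrong_penalty=1/3):.2f}")  # 0.00 -> blind guessing gains nothing
```

Under the second scheme, answering only pays off when you actually know something (p_correct above chance), which is exactly the incentive the benchmarks described above don't provide.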