
> Confidence calibration: When your agent says it's 60% confident, it should be right about 60% of the time. Not 90%, not 30%. Actual 60%.

With current technology (LLMs), how can an agent ever know whether its stated confidence is accurate?



The author's inner PM comes out here and makes some wild claims. Calibration is something we can do with traditional classification models, but not with most off-the-shelf LLMs. Even if you devised a way to check whether the LLM's confidence claims matched its actual performance, you wouldn't be able to calibrate or tune it the way you would a more traditional model.
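
For a traditional classifier, "calibrated" has a concrete, measurable meaning: bucket predictions by stated confidence and compare against observed accuracy. A minimal sketch (not from the article; names like `confidences` and `correct` are hypothetical logged data) of expected calibration error:

    import numpy as np

    def expected_calibration_error(confidences, correct, n_bins=10):
        """ECE: bin-weighted gap between stated confidence and accuracy."""
        confidences = np.asarray(confidences, dtype=float)
        correct = np.asarray(correct, dtype=float)
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        ece = 0.0
        for lo, hi in zip(edges[:-1], edges[1:]):
            mask = (confidences > lo) & (confidences <= hi)
            if mask.any():
                # gap between average confidence and accuracy in this bin
                gap = abs(correct[mask].mean() - confidences[mask].mean())
                ece += mask.mean() * gap
        return ece

A perfectly calibrated model saying "60%" is right about 60% of the time in that bucket, so ECE is near 0. With a traditional model you can then fix miscalibration post hoc (e.g. Platt scaling or isotonic regression, as in sklearn's CalibratedClassifierCV); with a closed LLM emitting verbalized confidence, there's no comparable knob to tune.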


I was about to say "Using calibrated models", then I found this interesting paper:

Calibrated Language Models Must Hallucinate

https://arxiv.org/abs/2311.14648

https://www.youtube.com/watch?v=cnoOjE_Xj5g



