
> Confidence calibration: When your agent says it's 60% confident, it should be right about 60% of the time. Not 90%, not 30%. Actual 60%.

With current technology (LLMs), how can an agent ever know whether its stated confidence is accurate?



The author's inner PM comes out here and makes some wild claims. Calibration is something we can do with traditional classification models, but not with most off-the-shelf LLMs. Even if you devised a way to check whether the LLM's confidence claims matched its actual performance, you wouldn't be able to calibrate or tune it the way you would a more traditional model.
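
For a traditional classifier, "calibrated" has a concrete, measurable meaning: bucket predictions by stated confidence and compare against observed accuracy. A minimal sketch (not from the article; names like `confidences` and `correct` are hypothetical logged data) of expected calibration error:

    import numpy as np

    def expected_calibration_error(confidences, correct, n_bins=10):
        """ECE: bin-weighted gap between stated confidence and accuracy."""
        confidences = np.asarray(confidences, dtype=float)
        correct = np.asarray(correct, dtype=float)
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        ece = 0.0
        for lo, hi in zip(edges[:-1], edges[1:]):
            mask = (confidences > lo) & (confidences <= hi)
            if mask.any():
                # gap between average confidence and accuracy in this bin
                gap = abs(correct[mask].mean() - confidences[mask].mean())
                ece += mask.mean() * gap
        return ece

A perfectly calibrated model saying "60%" is right about 60% of the time in that bucket, so ECE is near 0. With a traditional model you can then fix miscalibration post hoc (e.g. Platt scaling or isotonic regression, as in sklearn's CalibratedClassifierCV); with a closed LLM emitting verbalized confidence, there's no comparable knob to tune.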


I was about to say "Using calibrated models", then I found this interesting paper:

Calibrated Language Models Must Hallucinate

https://arxiv.org/abs/2311.14648

https://www.youtube.com/watch?v=cnoOjE_Xj5g



