I don't agree that people are confused (I wasn't) or that they are relying on prior experience (many of these points aren't rooted in direct experience at all!). OpenAI is choosing to brand this as a fine-tuning of a model that is a minor version of GPT-3.x, so it's a pretty natural shorthand.

Agree with you directionally on the research assistant point, although I think it would be interesting to define that task in more detail so the comparisons are concrete. I'd expect that most research workflows starting with ChatGPT still need to end in search to confirm and contextualize the important parts.



Between the release of GPT-3 and GPT-3.5 there was Gopher, which raised the bar on TruthfulQA from essentially random (22.6%) in GPT-3's case to 45%. GopherCite then brought performance up to 80-90%. One has to assume that OpenAI is using state-of-the-art techniques in its new model releases. The fact that LLMs went from choosing answers essentially at random to producing accurate results on a great many questions (they still suck at math) is lost on anyone who lacks the historical context that shorthanding 3.5 to 3 obscures.


