Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I agree with that. Classically used "AI benchmarks" need to be questioned. In my field, these guys have dropped a bomb, and no one seem to care: https://hal.science/hal-04715638/document


Can you give brief summary why this paper is a breakthrough for an outsider of the field?


Checking it shortly (I haven't seen the paper before) this seems to be a very good analysis of how results are reported specifically for medical imaging benchmarks.

As is often the case with statistics, selecting just a single number to report (whatever that number is) will hide a lot of different behaviours. Here, they show that just using the mean is a bad way to report data as the confidence intervals (reconstructed by the methods in the paper in most cases) show that the models can't really be distinguished based on their mean.


Hell, I was asked to use confidence interval as well as average values for by bs thesis when doing ml benchmarks and scientist publishing results in medical fields aren't doing it...

How can something like that happen? I mean, i had a supervisor tell me "add the confidence interval to the results as well", and explained me why. I guess that at nobody ever told them? Or they didn't care? Or it's just a honest mistake


Is it because it’s word-of-mouth and not written down in some NSF (or other organization) guidance? Thiss seems to be the issue


That might be, but couldn't a paper be asked to include that to be published? It looks like an important information


I don't think it qualifies as a breakthrough. In short:

1. Segmentation is a very classical in medical image processing. 2. Everyday there are papers claiming that they beat the state of the art 3. This paper says that most of the time, the state of the art has not been beat because they actually are in the margin of error.




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: