I am not a AI booster at all, but the fact that negative results are not publish...

moravak1984 · 2025-05-20T06:54:17 1747724057

Sure, it's not. But often on AI papers one sees remarks that actually mean: "...and if you throw in one zillion GPUs and make them run until the end of time you get {magic_benchmark}". Or "if you evaluate this very smart algo in our super-secret, real-life dataset that we claim is available on request, but we'd ghost you if you dare to ask, then you will see this chart that shows how smart we are".

Sure, it is often flag-planting, but when these papers come from big corps, you cannot "just ignore them and keep on" even when there are obvious flaws/issues.

It's a race over resources, as a (former) researcher on a low-budget university, we just cannot compete. We are coerced to believe whatever figure is passed on in the literature as "benchmark", without possibility of replication.

aleph_minus_one · 2025-05-20T13:28:31 1747747711

> It's a race over resources, as a (former) researcher on a low-budget university, we just cannot compete. We are coerced to believe whatever figure is passed on in the literature as "benchmark", without possibility of replication.

The central purpose of university research has basically always been that researchers work on hard, foundational topics that are more long-term so that industry is hardly willing to do them. On the other hand, these topics are very important, that is why the respective country is willing to finance this foundational research.

Thus, if you are at a university, once your research topic becomes an arms race with industry, you simply work either at the wrong place (university instead of industry) or on a "wrong" topic in the respective research area (look for some much more long-term, experimental topics that, if you are right, might change the whole research area in, say, 15 years, instead of some high resource-intensive, minor improvements to existing models).

nicoco · 2025-05-20T07:13:38 1747725218

I agree with that. Classically used "AI benchmarks" need to be questioned. In my field, these guys have dropped a bomb, and no one seem to care: https://hal.science/hal-04715638/document

baxtr · 2025-05-20T07:18:14 1747725494

Can you give brief summary why this paper is a breakthrough for an outsider of the field?

mzl · 2025-05-20T07:31:20 1747726280

Checking it shortly (I haven't seen the paper before) this seems to be a very good analysis of how results are reported specifically for medical imaging benchmarks.

As is often the case with statistics, selecting just a single number to report (whatever that number is) will hide a lot of different behaviours. Here, they show that just using the mean is a bad way to report data as the confidence intervals (reconstructed by the methods in the paper in most cases) show that the models can't really be distinguished based on their mean.

amarcheschi · 2025-05-20T08:44:03 1747730643

Hell, I was asked to use confidence interval as well as average values for by bs thesis when doing ml benchmarks and scientist publishing results in medical fields aren't doing it...

How can something like that happen? I mean, i had a supervisor tell me "add the confidence interval to the results as well", and explained me why. I guess that at nobody ever told them? Or they didn't care? Or it's just a honest mistake

stogot · 2025-05-20T18:01:16 1747764076

Is it because it’s word-of-mouth and not written down in some NSF (or other organization) guidance? Thiss seems to be the issue

amarcheschi · 2025-05-20T21:23:28 1747776208

That might be, but couldn't a paper be asked to include that to be published? It looks like an important information

nicoco · 2025-05-20T11:00:35 1747738835

I don't think it qualifies as a breakthrough. In short:

1. Segmentation is a very classical in medical image processing. 2. Everyday there are papers claiming that they beat the state of the art 3. This paper says that most of the time, the state of the art has not been beat because they actually are in the margin of error.

asoneth · 2025-05-20T13:58:49 1747749529

I published my first papers a little over fifteen years ago on practical applications for AI before switching domains. Recently I've been sucked back in.

I agree it's a problem across all of science, but AI seems to attract more than it's fair share of researchers seeking fame and fortune. Exaggerated claims and cherry-picking data seem even more extreme in my limited experience, and even responsible researchers end up exaggerating a bit to try and compete.

KurSix · 2025-05-20T09:00:47 1747731647

AI just happens to be the current hype magnet, so the cracks show more clearly

croes · 2025-05-20T06:16:06 1747721766

But AI makes it easier to write convincing looking papers