The problem is quite subtle, though obvious in retrospect. I've seen a paper from a separate, academic research group build a similar model with the exact same problem.
The problem would, however, have been clear if the model had been compared to simply using the current mean arterial pressure (MAP) as a predictor of hypotension, because MAP is the problematic predictor variable. Instead, the model was only compared to short-term changes in MAP (ΔMAP), which is obviously a nonsensical baseline and has an AUROC of ~0.55.
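To make the missing comparison concrete, here is a minimal sketch on purely synthetic data. Nothing in it comes from the actual study: the distributions, the 65 mmHg hypotension threshold, and the names `auroc` and `simulate` are all assumptions picked for illustration. It only shows the kind of sanity check I mean, i.e. scoring the trivial "current MAP" baseline next to the ΔMAP baseline before crediting a model with anything.

```python
import random


def auroc(scores, labels):
    """AUROC via the rank-sum (Mann-Whitney U) formula; tied scores get average ranks."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    n_pos = sum(labels)
    n_neg = n - n_pos
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def simulate(n_patients=2000, seed=0):
    """Toy data (assumption): each patient has a stable baseline MAP plus measurement noise."""
    rng = random.Random(seed)
    current_map, delta_map, labels = [], [], []
    for _ in range(n_patients):
        baseline = rng.gauss(80, 12)          # hypothetical patient-level MAP in mmHg
        prev = baseline + rng.gauss(0, 3)     # earlier reading
        now = baseline + rng.gauss(0, 3)      # current reading
        future = baseline + rng.gauss(0, 3)   # reading in the prediction window
        current_map.append(now)
        delta_map.append(now - prev)
        labels.append(1 if future < 65 else 0)  # assumed hypotension threshold
    return current_map, delta_map, labels


current_map, delta_map, labels = simulate()

# Lower MAP (or a falling MAP) should mean higher risk, so negate to get a risk score.
auc_map = auroc([-m for m in current_map], labels)
auc_delta = auroc([-d for d in delta_map], labels)

print(f"AUROC, current MAP as predictor: {auc_map:.2f}")
print(f"AUROC, ΔMAP as predictor:        {auc_delta:.2f}")
```

In this toy setup ΔMAP carries no information about the outcome by construction, so a model that merely leans on current MAP would look impressive against it while adding nothing over the one-variable baseline.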
Hm, reading the linked tweets, the problem seems like a big screaming red target on the side of a white barn, not a feature engineering subtlety. It seems like the typical case of the drunk guy looking for his keys under the streetlight: having insufficient data, and comparing the model to an arbitrarily picked baseline that just happens to be even worse. And then everyone, including the FDA, patting them on the back.
I think it comes down to the general incompetence of the "academia + R&D biz + regulation" pipeline. (In the land of the blind, the one-eyed man is king, etc.)
It's sort of inevitable in such a non-teleological process. As in, each step in it serves its own purpose, so the whole thing doesn't really serve the purpose that we like to assume for it, i.e. giving us great, thoughtful inventions. That's why it took so long to stop the Theranos train, that's why it takes so fucking long to roll out polyvalent vaccines (i.e. all-in-one vaccines), and so on. (I'm picking on medtech here, but there are many others: the Boeing + FAA MCAS fuckup, the absolute limpdick paralysis of nuclear power, which needed a combination of half the world on fire + prelude-to-WWIII to get moving again, and so on.)