Fig. 8, where the model becomes poorly calibrated in terms of text prediction (Answers are "flattened" so that many answers appear equally probable, but below the best answer)
In many subfields, the submitter isn't even attempted to be hidden from the reviewers. Usually, even the reviewers can be guessed with high accuracy by the submitters
The applications that are WL-units suitable are much more than the WL-units unsuitable ones you refer to.
It is true that I (and others) avoid using Wolfram Language (WL) units. Or try to get the computation expressions units-free as quickly as possible. But that is also a normal procedure when crafting computational workflows.
Also worth noting that test pilots of experimental aircraft generally wear parachutes, at least for higher risk tests. This includes tests of commercial aircraft.