We programmers like to distill everything down to rigid sets of rules because that's how our minds operate. The fewer probabilistic "analog" parameters, the better. Of course, the real world doesn't work this way.
It is by no means specific to programmers. Ask someone who is learning French, for instance: rules with too many arbitrary exceptions.
What is specific to programmers is that their tools perform best with simpler rules. So their job is to find the necessary and sufficient set of rules, and they will dismiss most of the cases pointed out in this article as unimportant exceptions the software won't handle.
> Ask someone who is learning French, for instance: rules with too many arbitrary exceptions.
I took French in middle school, and it was always a running joke that the teacher spent the first 5 minutes on the rule, and the next 40 minutes on the exceptions.
I am French, and my wife, who is not a native French speaker, often asks me why we say this or that.
Either there is a simple rule with well-known exceptions that we learn at school (which she would also know), or we get into "this is just how it is, learn it by heart" territory.
And then one day I discover that there is an obscure rule, full of complicated terminology, that answers the question.
Natural languages are kind of weird this way: most people don't remember the rules as rules. They learn by example, spotting patterns and extrapolating from them.
English is a foreign language to me, but I somehow managed to learn it without learning the rules. I can say things more or less correctly without being able to explain why I used a particular grammatical construction.
In the end, the data has to fit into structures or tables that algorithms can process. If the system isn't rigid to some degree, it becomes unmaintainable, full of bugs, or both.