Post-accident attribution to a 'root cause' is fundamentally wrong. - Richard I. Cook, MD. 'How Complex Systems Fail'.
In the complex world, the notion of 'cause' itself is suspect; it is either nearly impossible to detect or not really defined - another reason to ignore newspapers, with their constant supply of causes for things. - Nassim Taleb
I think the correct way to view this is to try to find the points of maximum leverage. The root cause is always the Big Bang. We already know that, so it's kind of pointless to look for it. What you're interested in is what can be changed.
These will be different depending on your perspective/level of power. For a SWE, it will be something like, "If we had monitoring of this variable, we would have gotten notified faster." Or, "If we had this type of testing, we'd prevent this class of errors." For a VP, it will look more like, "If our incentive structure were changed this way, or if our dev infra team built this kind of tooling, my SWEs would be better enabled to prevent this outage." Often there will be more than one point of leverage, but you only have so much time in your life, so the question is which one is most important or likely to be effective.
An example of a point of non-leverage is, "Joe submitted this busted CL." That is in the past and is not changeable. If Joe is in the habit of submitting busted CLs, he might need counseling, a PIP, or, failing all else, to be fired. However, typically the submission of broken CLs is a systemic problem, not a case of a single sub-par engineer. You have to find the leverage that will allow you to change the system.
Agreed. After identifying contributing factors or manageable interfaces, process-oriented fixes are the most powerful. That's why CI/CD is so awesome: RCS/VCS pre-commit and post-commit hooks allow you to automate the checklists you need to solve entire categories of issues.
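To make the hook idea concrete, here is a minimal sketch of a checklist-style git pre-commit hook in Python. The two checks (`no_conflict_markers`, `no_debugger_calls`) and the `CHECKLIST` name are hypothetical examples made up for illustration, not anything from a real project, and the sketch reads files from the working tree rather than the staged blobs, which is a simplification.

```python
"""Sketch of a checklist-style pre-commit hook (all checks are hypothetical)."""
import subprocess
import sys


def staged_files():
    """Return paths staged for commit (added, copied, or modified)."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


# Each check takes (path, text) and returns an error message or None.
def no_conflict_markers(path, text):
    for line in text.splitlines():
        if line.startswith(("<<<<<<< ", ">>>>>>> ")):
            return f"{path}: unresolved merge conflict marker"
    return None


def no_debugger_calls(path, text):
    if path.endswith(".py") and "pdb.set_trace()" in text:
        return f"{path}: leftover pdb.set_trace()"
    return None


# The "checklist": add one function per class of error you want to rule out.
CHECKLIST = [no_conflict_markers, no_debugger_calls]


def run_checklist(files):
    """files maps path -> text. Returns a list of failure messages."""
    failures = []
    for path, text in files.items():
        for check in CHECKLIST:
            message = check(path, text)
            if message:
                failures.append(message)
    return failures


def main():
    files = {}
    for path in staged_files():
        try:
            # Simplification: reads the working-tree copy, not the staged blob.
            with open(path, encoding="utf-8") as handle:
                files[path] = handle.read()
        except (OSError, UnicodeDecodeError):
            continue  # skip binary or unreadable files
    failures = run_checklist(files)
    for message in failures:
        print(message, file=sys.stderr)
    return 1 if failures else 0  # a non-zero exit status aborts the commit
```

To use something like this as a real hook, you would save it as an executable `.git/hooks/pre-commit` with a Python shebang line and end the file with `sys.exit(main())`. Each new postmortem finding of the form "this type of check would have prevented this class of errors" then becomes one more function in the list.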
It should be called causality chain analysis. Even though newspapers are awful, and proper RCAs are hard, Taleb likes to diss on anything and everything not invented by himself, without offering much.
It's true that in a complex world, there isn't going to be a single cause for anything significant; there will be multiple causes and there can be complex relationships between them. The article recognizes that very point:
"The issue with root cause analysis is that it can lead to oversimplification and it is rare for there to be one single root cause."
But that is not at all the same as the notion of cause itself being suspect. Taleb on that point is simply wrong.
This misattribution is something America loves to fall for. And we react in _gigantic_ ways against the wrong thing. Communism, unions, Vietnam, Saudi Arabia, Iran, Central America, Iraq, the environment. We are great at solving the last big problem in the most head-on, disastrous way. We only react, never predict.
We treated Mexico like sht; if we had helped build her up, we would both be in a solid position right now. Our war on drugs directly fueled a bloody civil war that has displaced, and is still displacing, millions. The war on drugs was the problem: it raised the price of drugs, which made something both attractive and illegal, the perfect combo. It is like a cryptocurrency called DistableCoin, where you can mine for more by attacking the state.
The root cause is a direct physical cause, but the problem space is the problem. We need to go up-dimension, to project the problem into lots of different lower dimensions. Lots of cut-off roasts in the world. The next level of meta is to ask why people cook their own food. Then one asks why they have to eat at all. Why are they biological? Why are they physical?
In the complex world, the notion of 'cause' itself is suspect; it is either nearly impossible to detect or not really defined - another reason to ignore newspapers, with their constant supply of causes for things. - Nassim Taleb
... via https://github.com/globalcitizen/taoup