In the world of science you get to compare software's predictions against reality. It's a weird concept but it grows on you.
I've seen systems with this structure. Part of the fun is that B's code is likely to have a lot of errors in it that cancel each other out in exciting ways when running on the domain of interest, which makes using it to work out why A's code fails to correspond to reality much harder than it could be.
I remember that around 2012 Firefox used to have tab grouping, and they dropped the functionality afterwards. I remember a friend of mine being annoyed about that, although I didn't much appreciate the functionality myself.
Oh, it's better than that: they had tab grouping as a native feature, and it was great, then they factored it out into an extension... and then they changed the way extensions worked, and didn't bother porting it. Kind of an insulting way to kill a feature IMO.
Are we sure this whole discussion can't be reduced to just links to xkcd strips?