They weren't. If you look at the fix [1], the dedupe loop ran in all cases, not just those with known dupes, so the performance hit affected any bundle with lots of refs.
But why couldn't they just dedupe the refs from the command line before starting the actual bundling - surely there are never more than a couple of hundred of those (one per branch)?
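
To make the question concrete, here is a rough sketch of the two approaches: a pairwise dedupe loop, whose cost grows quadratically with the number of refs, and a single pass that tracks seen names in a hash set. The function names and the sample ref list are hypothetical, for illustration only; this is not git's actual code.

```python
def dedupe_quadratic(refs):
    """Pairwise scan: each ref is compared against every ref kept so far (O(n^2))."""
    kept = []
    for ref in refs:
        if not any(ref == seen for seen in kept):
            kept.append(ref)
    return kept


def dedupe_with_set(refs):
    """Single pass using a hash set for membership checks (O(n) on average)."""
    seen = set()
    kept = []
    for ref in refs:
        if ref not in seen:
            seen.add(ref)
            kept.append(ref)
    return kept


# Hypothetical ref names as they might be passed on a command line.
refs = ["refs/heads/main", "refs/heads/dev", "refs/heads/main", "refs/tags/v1.0"]
unique_refs = dedupe_with_set(refs)
```

Both functions keep the first occurrence of each ref and preserve order. With a couple of hundred refs the difference is negligible, but the pairwise version gets slow once a repository has many thousands of refs.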