On the one hand, there can often be shared lines of code without a shared idea; that shouldn't be a candidate for being factored out into some new abstraction.
On the other hand, you may want to introduce an abstraction and factor out code into a common library / dependency / framework when there's a shared well-defined concern/responsibility.
That said, on the gripping hand, I say *may* because even if there's the opportunity to introduce a clean, future-proof abstraction, introducing it may come at the cost of coupling two things that were not previously coupled. If you've got very good automated test coverage and CI for the new common dependency and its impact on the places where it is consumed, then perhaps this is fine. If the new common dependency is consumed by different projects with different pressures for change / different rates of change / different levels of software quality then factoring out a common idea that then introduces coupling may cause more harm than good.
I am continually fighting the DRY cult at work. I explain that we need to focus on whether the code shares requirements or merely shares an implementation. In the first case the duplicated code will evolve hand-in-hand; in the second it needs to be free to evolve separately. The problem is that once code is merged, it takes a tipping point before anyone splits it up again. So instead people extend the existing code to handle disparate requirements, turning it into a god function / object.
To help weed out the requirements, I tell people that on their third copy / paste they might begin to consider reducing the duplication. By that point they have both had time to think about the code and gained enough experience with it to discover what the requirements actually are.
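To make that concrete, here's a minimal sketch (the function names and the date rules are made up): two copies that are textually identical today but answer to different requirements, so merging them on sight would couple accounting to UX.

```typescript
// Hypothetical example: textually identical copies with different requirements.

// Accounting requires an unambiguous ISO date on invoices.
function formatInvoiceDate(d: Date): string {
  return d.toISOString().slice(0, 10); // e.g. "2024-05-01"
}

// Identical today, but UX may soon want relative dates ("3 days ago") here.
function formatPostDate(d: Date): string {
  return d.toISOString().slice(0, 10);
}
```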
Another problem with bad code reuse is code locality. Just as instruction and memory locality improve runtime performance via caching, code locality improves mental processing. The further apart you place related pieces of code, the better the abstraction between them needs to be for you to reason about the code correctly. Without a good abstraction, you end up jumping between far-flung areas of code to figure out what your function does.
Programmers have had DRY drummed into them so hard that it is almost heretical to even consider the tradeoff of increased coupling that arises from it. Coupling is good if things should change together because they are linked in some fundamental way such that it would be wrong for them to be different. Coupling is bad when things should evolve independently and the shared link is incidental.
The problem is that it is surprisingly hard to tell the difference up front. In the moment of writing the code, the evidence for the shared abstraction seems overwhelming, but the evidence of the cost of coupling is completely absent; it exists only in a hypothetical future. Unless there is strong evidence for a shared underlying conceptual link, I consider only the third repetition of a shared piece of code to be evidence of a true abstraction. Two times could just be chance; three is unlikely to be.
> In the moment of writing the code, the evidence for the shared abstraction seems overwhelming, but the evidence of the cost of coupling is completely absent
This is actually representative of a problem in the industry as a whole, I think. A lot of things have short-term benefits but long-term drawbacks. Because of the industry's drastic recent growth, orgs are bottom-heavy: very few people have experienced the long-term drawbacks of X compared to how many have just learnt X. Additionally, because of the extremely quick turnover of people, it's rare that the people who implemented X are still around when X blows up in everyone's face. They went on to implement Y... and will be gone before Y blows up.
So most tools, libraries, frameworks, and abstractions are HEAVILY optimized for the short term. Optimized for getting a project set up quickly. Optimized for the initial "Hello world". Optimized to get an API and a form up in seconds. Very few tools/patterns are optimized for ease of long-term (hell... these days "long term" means a year or two) maintenance. The ones that are generally get a bad rep.
And building stuff that's both good short AND long term is very, very hard.
I've worked with some Go programs, and one of the sayings in that community is:
A little copying is better than a little dependency.
This holds even when the two copies share the same correct abstraction. It's different from the npm world. I've taken this and applied it in microservices written in other languages/frameworks and have no regrets. Sometimes some versions are a bit less complete or featureful, but each works fine. If a bug is discovered, it's fairly easy to find and patch them all.
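As a sketch of what that looks like in practice (the helper is hypothetical): each service carries its own verbatim copy of a small utility instead of depending on a shared package.

```typescript
// Hypothetical example: a small helper copied verbatim into each service
// rather than published as a shared dependency. If a bug turns up, grepping
// for "withRetry" across the repos makes patching every copy straightforward.
async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  let lastError: unknown = undefined;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError;
}
```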
It's usually the intermediate developers, the ones who like having rules to follow so they know they're doing well, who tend to over-apply DRY and other principles. Only experience (aka pain over time) seems to show when to break (or just not apply) the rules.
Perhaps it's just the way things are taught/learned. Instead of just showing what's good and having it interpreted as rules, each principle should be presented as a rule of thumb with a concrete example of when it should not be applied. Even if learners don't clearly understand the difference at the time, they'll always recall that there are exceptions and won't feel so motivated to apply the rule in every instance.
It is not just that 3 indicates the existence of an abstraction, but seeing 3 examples improves your odds of identifying the correct abstraction to use.
This comment is more insightful than the original post.
The real problem is when engineers abhor duplication, and in order to reuse existing code, they simply call the same code from multiple places without thinking through the basis for this reuse.
Mindless deduplication is not an act of abstraction at all! This is a very important point, because a "wrong" abstraction that is conceptually sound is not that hard to evolve, and if the code is called from N places then you get to look at those places to understand how to evolve the abstraction. Improvements to one part of the code benefit N parts, and you save work.
The only other factor to keep in mind is the dependency graph and coupling, as my parent mentions.
Mindless deduplication is more common than you'd think, especially with bits of code like utility functions and React components. For example, you end up with a component called Button that has no defined scope of responsibility; it's just used in one way or another in 17 different places that render something that looks or acts sort of like a button. This is not the "wrong abstraction," it is code reuse without abstraction.
I know what you mean, but you need to find a different or more nuanced term. Deduplication is abstraction; it just isn't an abstraction mapped to the domain problem. Even a compression algorithm abstracts:
> An abstraction can be seen as a compression process, mapping multiple different pieces of constituent data to a single piece of abstract data; [1]
Conceptual or semantic compression, yes, as the rest of that section makes clear. The very problem with deduplication without abstraction is not thinking at the conceptual level, only at the literal code level. There are lots of ways to compress code, e.g. minifying it :)
Quoting the start of the article: Abstraction in its main sense is a conceptual process where general rules and concepts are derived from the usage and classification of specific examples, literal ("real" or "concrete") signifiers, first principles, or other methods.
For the Button case, you'd have to come up with some concept of what a Button is and does, beyond what code lives in its file (e.g. an onClick handler that calls handleClick, etc.) in order to have an abstraction.
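A hypothetical sketch of the difference, as TypeScript props (both type names are made up): the first is deduplicated markup shaped by its 17 call sites; the second states what a Button actually is and does.

```typescript
// Reuse without abstraction: a grab bag with one optional prop per caller's quirk.
type GrabBagButtonProps = {
  label?: string;
  icon?: string;
  asLink?: boolean; // caller #9 wanted a link that looks like a button
  href?: string;
  onClick?: () => void;
};

// An abstraction: a concept with a defined scope of responsibility.
type ButtonProps = {
  label: string;
  variant: "primary" | "secondary";
  onClick: () => void; // a Button triggers an action; navigation belongs to Link
};
```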
There are "wrong" abstractions (in the sense of abstractions that turn out to need to be changed later, like any code), but if you lump all deduplication into abstraction then you will have a skewed sense of the cost of changing an abstraction.
The cost of changing an abstraction also depends on your programming language; if you spend a lot of time in a dynamically-typed language, you may internalize that refactoring is tedious and error-prone and often intractable.
I got some flak from some other students when I was learning because I had duplicated code in a few places. I tried to explain that while the pieces shared some common code, the code that used them didn't all have the same goal and might change differently. So while I did share code in some places, I chose not to in others, to allow each area some level of independence if / when we changed it. They were all zombies: "Barrrrrrgh, look at me reusing code all efficient like!"
Granted, we were all n00bs and nobody will ever see that code again, so it wasn't a big deal... but the intent, direction, and possible future of the code seem like things that should be considered once you start sharing.
> but the intent, direction, and possible future of the code seem like things that should be considered once you start sharing.
Yes. Dare I say, intent is one of the most important things here. Two new pieces of code may be structurally identical, and yet represent completely different concepts. Such code should not be DRYed. Not just because there's a high chance you'll have to split the abstraction out later on - but also because any time two operations use a common piece of code, there's an implicit meaning conveyed here, that the common piece of code represents a concept that's shared by the two operations.
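For instance (a made-up example): these two functions are structurally identical, but merging them into a single multiplyByRate() would quietly assert that discounting and currency conversion are the same concept, which they aren't.

```typescript
// Hypothetical example: identical structure, unrelated concepts.
function applyDiscount(price: number, discountMultiplier: number): number {
  return price * discountMultiplier; // e.g. 0.9 for 10% off
}

function convertCurrency(amount: number, exchangeRate: number): number {
  return amount * exchangeRate; // e.g. 0.92 for USD -> EUR
}
```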
People often forget "copy-on-write". Coupling doesn't have to be permanent. If you refactor to create a shared component, and then you want to modify that shared component to help one client, you can fork it; it's no worse than never having created the shared component in the first place.
In my experience, people will most likely just hack the shared component by adding awful arbitrary if-statements and other such kludges rather than fork it. That's the path of least resistance. Once this happens a few times, the shared component begins to be seen as a central component, and by then it's quite a complicated mess.
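A hedged sketch of how that usually looks (everything here is made up): each new client grows the signature another flag instead of forking, until nobody can say what the shared thing is for.

```typescript
type Row = Record<string, unknown>; // placeholder for illustration

// Hypothetical example: the "shared" component after a few path-of-least-
// resistance hacks. Each flag exists for exactly one caller.
function renderReport(
  rows: Row[],
  isQuarterly: boolean,
  skipTotals: boolean,    // added for the dashboard team
  legacyCsvMode: boolean, // added for the billing team
  euFormatting: boolean,  // added for one customer
): string {
  if (legacyCsvMode && !euFormatting) {
    // special case nobody remembers the reason for
  }
  // ...
  return "";
}
```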
But often the fork happens too late, after the first few differences have been creatively shoehorned into the shared code. The resulting mess then tends to live on twice after the fork.
In the end, almost every conceptual way to slice up software can be viable if you are good at whatever you do, and terrible if not.
My jam is to wait for a few use cases before creating a new abstraction or process. I want to see how they are similar, and how they differ, in order to form a generalized solution that serves all the use cases at once. As I deal more and more with tooling for other developers, this applies especially to tooling APIs.
I apply this to DRY, coupling, encapsulation, APIs, etc. I also prefer to focus on consistency and readability over most other concerns. I mentally (or physically!) note areas of code I want to improve when the time isn't right just yet; during future work that touches that code, I'll refactor it if a solution has presented itself.
These days I prefer languages with rock-solid language services and tooling (read: types) to make future refactoring as painless as possible. I prefer explicit over implicit (sorry Ruby and Chef), and configuration over convention (looking at you, Gradle).
> when there's a shared well-defined concern/responsibility
I think a good test for this is if you can write a reasonable unit test for the code in question. If your unit tests essentially become two separate sets of tests, testing the different branches of code, it's probably the wrong abstraction. If your tests work and you've built a reasonable standalone library (even if it's not useful to anything but your exact product), that's at least a signal the abstraction is sustainable.
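As a sketch of that smell (Jest-style tests; the helper is hypothetical): when the suite for a shared helper splits cleanly into one half per caller, you probably have two functions wearing one name.

```typescript
// Hypothetical example: a test suite that betrays the wrong abstraction.
describe("formatIdentifier", () => {
  describe("when called from the billing module", () => {
    it("zero-pads to 10 digits", () => { /* ... */ });
  });
  describe("when called from the search module", () => {
    it("lowercases and strips whitespace", () => { /* ... */ });
  });
  // No test exercises both behaviors at once: two abstractions, one name.
});
```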
Now we simply need a formal objective definition of "reasonable code" and the industry should never have this problem again.
Even if some shared code is currently branch-free, though, it's unlikely to remain that way if the abstraction is fragile.
A red flag is vague function names like "processThing" or "afterThingHappens". If a function can't be summarized concisely, it's probably doing too many things, and the abstraction is likely to break down later when the callers' needs diverge.
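For example (the Order type and helper names are invented): compare a function you can't summarize with the pieces it actually contains.

```typescript
type Order = { id: string }; // placeholder for illustration

// Hypothetical example: can't be summarized concisely, so it's probably
// several things. It validates, prices, and emails: three reasons to change.
function processThing(thing: Order): void {
  // ...
}

// Each piece now has a one-sentence summary and can evolve on its own.
function validateOrder(order: Order): void { /* ... */ }
function priceOrder(order: Order): void { /* ... */ }
function emailOrderConfirmation(order: Order): void { /* ... */ }
```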
As a senior engineer who recently became an engineering manager I always caution my devs about abstracting too liberally. Junior engineers are particularly bad about this. They see a handful of functions that are duplicated across a few (unrelated) projects and they want to create a new repo/library and share it. Then I direct them to the Slack channel for our platform services, which has a sizable shared library across dozens of services. That shared library is a frequent source of problems.
It takes a while, but I usually beat that primal impulse out of them.
> They see a handful of functions that are duplicated across a few (unrelated) projects and they want to create a new repo/library and share it
Here is what I think happens when you do that.
You just created yet another internal API. Designing, creating, and _documenting_ good APIs is _hard_. The most likely result is an undocumented, dodgy, half-finished API that doesn't fully encapsulate the thing it's supposed to deal with. So you end up with code that both uses and bypasses the API you just wrote.
If you do that and later decide that you want to move some functionality from one side of the API to the other you've just set yourself up for a hella lot of work.
The other thing is: when you want to make changes to duplicated code, you can limit the risk to the code base you're actually working on, not a bunch of unrelated programs.
> If the new common dependency is consumed by different projects with different pressures for change / different rates of change / different levels of software quality then factoring out a common idea that then introduces coupling may cause more harm than good.
This reminds me of the recent post on HN about a company migrating from microservices back to a monolith, for this exact reason.
I guess what I have run into is a lot of code that is aggressively, needlessly abstracted for a future that will likely never come. Abstractions that would perhaps be worth it to save hundreds of copies or permutations, while I'm looking at one.
I'm all for not repeating myself, but there is a difference between "usually avoid" and "never".
Copying and pasting in many situations would seem a breeze compared to the nest of abstractions required to avoid it.
Seeing the future? Picking the right macro strategy, programming language, database, etc ahead of time, sure, an experienced developer can usually do that. But correctly predicting the boundaries of systems, libraries, APIs—and guessing who will maintain which system and figuring out dependencies—that never goes according to plan. So in my experience YAGNI and reducing coupling are more important principles than sharing code.
If I'm writing an API to move a robot, my problem space is fairly bounded, and I know that someday I will want force control at some end effector. I know that there's a 6 axis robot I've been eyeing, etc.