Good Software Development Habits (zarar.dev)
364 points by mmphosis 11 months ago | 190 comments


> It's better to have some wonky parameterization than it is to have multiple implementations of nearly the same thing. Improving the parameters will be easier than to consolidate four different implementations if this situation comes up again.

Hard disagree. If you can't decompose to avoid "wonky parameters" then keep them separate. The big smells are boolean flags (avoid them altogether when you can) and more than one enum parameter.

IME "heavy" function signatures are always making things harder to maintain.


Hugely agree. Every junior on my team has heard me say: "copy-paste is free; abstractions are expensive." When you move two bits of logic behind a common interface you tell the world that they're the same type of thing, and future editors will tend to maintain that promise - if the two things diverge further, someone will handle that by adding more parameters to the shared interface.

So when deciding whether to merge two similar functions, to me the question to ask yourself is "are future changes to one of these functions almost certain to affect the other one as well?" If not, just leave the functions separate no matter how similar they are.


I’m only a few years into the industry, and in my CS program, we were constantly told something along the lines of “any time you have to copy paste, look for an opportunity to abstract”. I’ve been running into problems lately where my attempts at abstractions have made things significantly more complicated. Only when I hit the limits of the abstraction did I realize that the cost of maintaining similar functionality in multiple places was less. I’m going to try your approach in future.


I think the reasoning for DRY was kind of lost in translation.

“any time you have to copy paste, look for an opportunity to abstract” assumes that having an abstraction is always better, but I don't think that is the case.

In my opinion the reasoning as to why "code duplication is a code smell" is that if you have to copy and paste code around you are probably missing a useful abstraction for your code. And I think "useful" is the most important thing to keep in mind.

Sure, every time I copy and paste code I know there exists an abstraction I could create to eliminate this duplication. Generally this is pretty easy. The hard part is to understand when this new abstraction will help you deliver the features the business needs.


Surely there is a parallel with standardized testing, which asks students for the most needlessly ornate prose, while most writing has more value the plainer it is written.


Absolutely, it's always easy to detangle the mess of inexperienced programmers who copied things everywhere; the nightmare is the mid-level programmers who put everything behind big interfaces and just add more interfaces with every change.


Indeed, this was me. Now I don’t care if I have three functions doing the same thing slightly differently.

Much better than having some advanced mega-function whose workings I don’t understand anyway.


+1, have 2 implementations that each have an independent branch point? if you combine them you have a function with 2 bool parameters, and 4 possible states to test, 2 of which you might never need


A very common one is two booleans with one combination of them being an invalid state (e.g. both bools are never true at once in a valid state, but both can be false, or one true and one false). Use an enum instead that represents only the three valid cases.
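
A minimal sketch of that in TypeScript, assuming a made-up request-status example; `RequestFlags`, `RequestState`, `fromFlags` and `describe` are hypothetical names, not anything from the article:

```typescript
// Hypothetical example: two booleans allow four combinations,
// but "loading and failed at the same time" is never valid.
interface RequestFlags {
  isLoading: boolean;
  hasError: boolean;
}

// A union of three literals can only express the three valid cases.
type RequestState = "idle" | "loading" | "failed";

// Bridge from the old flag pair; the invalid combination is rejected
// in exactly one place instead of being handled by every caller.
function fromFlags(flags: RequestFlags): RequestState {
  if (flags.isLoading && flags.hasError) {
    throw new Error("invalid state: loading and failed at once");
  }
  if (flags.isLoading) return "loading";
  if (flags.hasError) return "failed";
  return "idle";
}

// Code that takes RequestState has three cases to test, not four.
function describe(state: RequestState): string {
  switch (state) {
    case "idle":
      return "nothing in flight";
    case "loading":
      return "request in flight";
    case "failed":
      return "last request failed";
  }
}
```

Anything written against `RequestState` never sees the fourth combination, so there is nothing to test or guard for it.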


It’s difficult to convince people that once you consider the testing pyramid, it’s not just 2 + 2 + 2 < 2 x 2 x 2 but also 2 + 2 < 2 x 2


"The greatest shortcoming of the human race is our inability to understand the exponential function”.

https://en.wikipedia.org/wiki/Albert_Allen_Bartlett


Combinatorial explosion of states is a nightmare; IME it means that the abstraction behind it is not the right one.

You really don't want to have a function that branches a lot inside. It's very difficult to test.

When you think of adding a flag, work out 2^n in your head, where n is the number of flags: that is the minimum number of tests needed to cover every combination. Do you really want to write all of them?
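
To make the 2^n concrete, here is a small sketch that enumerates every combination of n boolean flags. `flagCombinations` is a made-up helper, and it assumes the flags are independent and that you want one test per combination:

```typescript
// Enumerate every combination of n boolean flags.
// The result has 2^n rows, one per combination you would have to test.
function flagCombinations(n: number): boolean[][] {
  const rows: boolean[][] = [];
  const total = 1 << n; // 2^n, fine for the small n a function signature can hold
  for (let mask = 0; mask < total; mask++) {
    const row: boolean[] = [];
    for (let bit = 0; bit < n; bit++) {
      // Bit `bit` of `mask` decides whether flag number `bit` is on.
      row.push((mask & (1 << bit)) !== 0);
    }
    rows.push(row);
  }
  return rows;
}
```

Two flags give 4 rows and three give 8, so each added flag doubles the table.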


There are ways to write this that still keep the entrypoint as a single function. Encoding your parameters as different function names doesn’t make them any less parameters.


I think it's especially bad advice with the "copy paste once is okay". You absolutely do not want multiple (even just two) copies of what's meant to be exactly the same functionality, since now they can accidentally evolve separately. But coupling together things that only happen to be mostly similar even at the expense of complicating their implementation and interface just makes things harder to reason about and work with.


> I think it's especially bad advice with the "copy paste once is okay". You absolutely do not want multiple (even just two) copies of what's meant to be exactly the same functionality, since now they can accidentally evolve separately.

Hard disagree. Your type of misconception is the root cause of most broken and unmaintainable projects, and the root of most technical debt and accidental complexity.

People who follow that simplistic logic of "code can accidentally evolve separately" are completely oblivious to the fact that there is seemingly duplicate code which is only incidentally duplicate, but at its core should clearly be and remain completely decoupled.

More to the point, refactoring two member functions that are mostly the same is far simpler than refactoring N classes and interfaces registered in dependency injection systems required to DRY up code.

I've lost count of how many times I had to stop shortsighted junior developers who completely lost track of what they were doing and with a straight face were citing DRY to justify adding three classes and an interface to implement a strategy pattern because that way they would avoid adding a duplicate method. Absurd.

People would be far better off if, instead of mindlessly parroting DRY, they looked at what they are doing and understood that premature abstractions cause far more problems than the ones they solve (if any).

Newbie, inexperienced developers write complex code. Experienced, seasoned developers write simple code. Knowing the importance of having duplicate code is a key factor.


What thfuran said was:

> You absolutely do not want multiple (even just two) copies of what's meant to be exactly the same functionality, since now they can accidentally evolve separately. But coupling together things that only happen to be mostly similar even at the expense of complicating their implementation and interface just makes things harder to reason about and work with.

So, if things are fundamentally the same, do not duplicate, but if they are fundamentally different, do not unify. This is absolutely correct.

To which you replied:

> People who follow that simplistic logic of "code can accidentally evolve separately" are completely oblivious to the fact that there is seemingly duplicate code which is only incidentally duplicate, but at its core should clearly be and remain completely decoupled.

Despite the fact that this is exactly what the comment you replied to says.

Then you go on a clearly very deeply felt rant about overcomplication via dependency injection and architecture astronautics and so on. Preach it! But this is also nothing to do with what thfuran wrote.

> Newbie, inexperienced developers write complex code. Experienced, seasoned developers write simple code.

Sounds like the kind of overgeneralisation that overconfident mid-career developers make to me.


To be fair thfuran was hard to decipher and should be refactored to be more clear.


The issue is that you actually never really know if things are fundamentally the same. To know it you have to know the future.


"Know the future" is part of a software engineer's job description, at least insofar as "know" means "make informed predictions about".

Consider the case of making API calls to a third party. You, today, are writing a function that calls the remote API with some credentials, reauthenticates on auth failure, handles backoff when rate limited, and generates structured logs for outgoing calls.

You need to add a second API call. You're not sure whether to copy the existing code or create an abstraction. What do you do?

Well, in this case, you have a crystal ball! This is a common abstraction that can be identified in other code as well as your own. You don't know the future with 100% confidence, but it's your job to be able to make a pretty good guess using partial information.


I think this is what the original post that people took issue with said? By the time you write the same thing for the third time you are not predicting the future any more, you have practical evidence.


But a thing that you wrote the same a few times isn't something that's definitively required to be the same, it's something that happens to be the same right now. You can often clean things up by factoring out that duplication, but needing to add a bunch of parameters to the resulting function is probably a sign that you're trying to combine things that aren't the same and shouldn't be coupled together.

Where I'm saying you absolutely shouldn't copy paste is where there's a business or technical requirement for something to be calculated/processed/displayed exactly a certain way in several contexts. You don't want to let those drift apart accidentally, though you certainly might decouple them later if that requirement changes.


Not the future, but the domain.


or study abstract algebra (but you’re now a researcher, because programming isn’t yet solved)


All walks of developers write overly-complex code because they don’t know how to abstract so they either overdo it, under-do it, or just do it badly.

Writing good abstractions is hard and takes practice. Unfortunately the current zeitgeist has (IMO) swung too hard the wrong way with guiding mantras like “explicitness” which is misinterpreted to mean inline all the logic and expose all the details everywhere all the time and “worse is better” which is misinterpreted to justify straight up bad designs / implementations in the name of not overthinking things, instead of good-but-imperfect ones.

The knee-jerk response against abstraction has led the majority of even seasoned, experienced developers to write overly complex code because they’ve spent a career failing to learn how to abstract. I’d rather we as an industry figured out what makes a quality abstraction and gave guidance to junior developers so they learn how to do so responsibly instead of throwing up our hands and acting like it’s impossible. This despite literally all of computing having been built upon a tower of countless abstractions that let us conveniently forget the fact that we’re actually juggling electrons around on rocks.


> Newbie, inexperienced developers write complex code. Experienced, seasoned developers write simple code

This is a really inaccurate generalization. Maybe you could say something about excess complexity, but all problems have some level of irreducible complexity that code fundamentally has to reflect.


Nope, it is not inaccurate — but you are not wrong either.

Obviously, code will reflect the complexity of the problem.

But incidentally, most problems we solve with code are not that hard, yet most code is extremely complex — a lot more complex than the complexity inherent to the problem. And that's where you can tell an experienced, seasoned (and smart) developer who'd write code that's only complex where it needs to be, from an inexperienced one where code will be complex so it appears "smart".


I think inexperienced developers write complex code because it's difficult to write simple code and they don't know how yet, not because they're trying to make it complex.


> I think inexperienced developers write complex code because it's difficult to write simple code and they don't know how yet, not because they're trying to make it complex.

From what I've been seeing, inexperienced developers write complex code because they are trained with a bias towards accidentally complex code (i.e., how else would you show off design patterns), they have no experience in dealing with the tradeoffs of writing accidentally complex code, and they do not understand the problems they create for themselves and others by adding complexity where they do not need it.

I'd frame accidental complexity in the same class as dead code: inexperienced developers might be oblivious to the risk presented by code that serves no purpose, but experienced developers know very well the ticking time bomb nature of it.


Yes, I was not trying to imply they do it on purpose, but I can see how it could be read that way.


Don't look at the code I just wrote (populating a user list with avatars, downloaded via background threads). It might cause trauma.

The last couple of days have been annoying, but I got it to work; just not as easily as I wanted. The platform, itself, has limitations, and I needed to find these, by banging into them, and coding around them, which is ugly.


If someone writes a strategy pattern to fix duplication, all power to them, it's a well understood, easy to use pattern that fixes several problems.

> adding three classes and an interface to implement a strategy pattern

Sounds like the language used is the problem here, not the intent. Hasn't Java (et al) made this easier yet?


root cause of dysfunction is executive management, or really customer and market structure (e.g. govt procurement as an extreme example). Full stop

fwiw i agree that copy paste is fine


It's, however, unhelpful to point this out, since developers cannot fix it. We need to find ways to live with this dysfunction.


it is in fact helpful because it reveals that the problem cannot in fact be fixed at the developer layer, and having that knowledge is the first step down a road towards an actual solution rather than endless bike shedding about whether it is okay to copy paste a function body.


Every time you consider copy pasting, you should be asking yourself “if the stuff I’m pasting needs to change, will I want both of these places to change?” It requires some guessing the future, but usually it’s not hard to answer the question.

IME if something should be an independent function or module, I rarely get to the point of considering copy/pasting it in the first place. If I want to copy/paste it’s usually because the two places currently only incidentally need the same code now, and my gut usually tells me that it will no longer be the case if I have to make any sort of change.


Early in my career I started out really DRY. In my experience, and not just with the code I wrote, it led to various issues down the line with unmaintainable edge cases, especially if many teams are working on those things. It becomes really hard to support at some point. Now I feel much better making things DRY only when it is really obvious that they should be.


> I started out really DRY

When you say "DRY" here, would you say you had familiarity with the original definition, or merely what you (quite understandably) inferred from the acronym? Because I think the formulation in The Pragmatic Programmer is pretty spot on in speaking about not repeating "pieces of information", whereas I find in practice most people are reacting to superficial similarity (which may or may not reflect a deeper connection).


Looking at the definition, I do believe I wasn't referring to the original definition. I didn't actually know that original definition was specifically limited to the information/knowledge part. I have to assume there's industry wide misunderstanding on this term?

To avoid the confusion, it seems like DRY would be better named something like "Single source of truth". Because I do agree with that.


> I have to assume there's industry wide misunderstanding on this term?

The "misunderstanding" is at least as prevalent as the original, yes. I wasn't trying to say the original is "correct" - language is determined by usage - just wondering which you were discussing.

> To avoid the confusion, it seems like DRY would be better named something like "Single source of truth".

It could probably do with a better name, but "single source of truth" is usually about the information operated on by the program, rather than information embodied in the program.


You mean it's databases rather than what is in code?

If so, then that's also news to me. I'd have thought that e.g. something like input validation code that can be reused both in backend and client would go under single source of truth. I would always prefer that not to be repeated, but it is frequently hard to do unless you have the same language in backend and frontend, or codegen.


With a sufficiently broad definition of "database", yeah, that's my understanding.


And usually the answer stops becoming a guess at 3. I’ve certainly had enough experiences where we had 2 and 3 in the backlog and no matter how we tried, #3 always required as much or more work than #2 because we guessed wrong and it would have been faster to slam out #2 and let #3 be the expensive one.


My experience is totally different. Sure, the popular beginner's advice is to never repeat yourself, but in many cases that can actually be a viable operation, especially when you are okay with functions drifting apart or the cases they handle are allowed to differ.

And that happens.

The beginner's problem lies in the reasons why that happens. Very often the reason is that someone didn't really think about their argument and return data types, how functions access needed context data, how to return when functions can error in multiple ways, etc. So if you find yourself reimplementing the same thing twice because of that, sure thing, you shouldn't. What you should do is go back and think harder about how data is supposed to flow.

But if you have a data flow that you are very confident in and you need to do two things that differ just slightly, just copy and paste it into two distinct functions, as this is what you want to have in some cases.

Dogmatism gets you only so far in programming.


I think a part of the problem is that in addition to being a well regarded principle with a good pedigree, "DRY" is both catchy and (unlike SOLID or similar) seems self explanatory. The natural interpretation, however, doesn't really match what was written in The Pragmatic Programmer, where it doesn't speak of duplicate code but rather duplicate "pieces of information". If "you are okay with functions drifting apart or the cases they handle are allowed to differ" then the two functions really don't represent the same piece of information, and collapsing them may be better or worse but it is no more DRY by that definition.

I've tried to counter-meme with the joke that collapsing superficially similar code isn't improving it, but compressing it, and that we should refer to such activity as "Huffman coding".

It's also worth noting that the focus on syntax can miss cases where DRY would recommend a change; if you are saying "there is a button here" in HTML and also in CSS and also in JS, your code isn't DRY even if those three look nothing alike (though whether the steps necessary to collapse those are worthwhile will very much depend on context).


Now this is a principle I can totally get behind. If the same information lives in multiple places in your codebase, you are definitely doing it wrong, unless that same information is just coincidentally the same and used for different purposes in different places.


The book assumes that you should know better, that’s the problem. You may understand it correctly and do your best, but remain unsure whether this “piece of information” is the same as that one or not, cause it’s open to interpretation.


Uncertainty as to the line between "one piece of information" and "two pieces of information" may be a problem. I don't think it makes sense to say it's "the problem" when most people don't know that DRY is formulated in those terms in the first place.

Personally, I don't think the ambiguity is actually much of a problem; often it's not ambiguous, and when it is it's usually the case that multiple ways of organizing things are reasonably appropriate and other concerns should dominate (they may need to anyway).


I read your second paragraph as vagueness is fine, which sort of makes DRY not a helpful principle but a handwavy problem statement with no clear anything.

As in most vague problems, two extreme solutions (join vs dup) are a wrong way to think about it. I have some ideas on how to turn this into a spectrum in a nearby comment.

I think it is important because the DRY-flavored problem is basically the thing you meet in code most. At least that is my experience, as a guy who hates typing out and rediscovering knowledge from slightly different code blocks or tangled multi-path procedures and refactoring these, either in the hope that nothing breaks in multiple places, or that you won’t forget to update that one semi-copy.

I’ve been programming for a very long time and seemingly no one has ever even tried to address this in any sensible way.


I think it’s our tooling that sucks, not us. Cause we only have functions and duplicated code, but there’s no named-common-block idea, which one could insert, edit and

1) see how it differs from the original immediately next time

2) other devs would see that it’s not just code, but a part of a common block, and follow ideas from it

3) changes to the original block would be merge-compatible downwards (and actually pending)

4) can eject code from this hierarchy in case it completely diverges and cannot be maintained as a part of it anymore

Instead we generate this thread over and over again but no one can define “good {structure,design,circumstances}” etc. It’s all at the “feeling” level and doing so or so in the clueless beginning makes it hard to change later.


Without the encapsulation of a function, won’t the code around the common block depend on the details of the block in ways that cause coupling that makes the common block hard to change without detailed analysis of all usages?

I like what you are saying, I think, but am stuck on this internal coupling.


It will share nuance with non-hygienic macros, yes. The difference here is that (1) unlike macros which hide what’s going on, the code is always expanded and can be patched locally with the visual indication of an edit, and (2) the changes to the origin block aren’t automatically propagated, you simply see +-patch clutter everywhere, which is actionable but not mandatory.

If you want to patch the origin without cluttering other locations, just move it away from there and put another copy into where it was, and edit.

The key idea is to still have the same copied blocks of code. Code will be there physically repeated at each location. You can erase “block <name> {“ parts from code and nothing will change.

But instead of being lost in the trees these blocks get tagged, so you can track their state and analyze and make decisions in a convenient systemic way. It’s an analysis tool, not a footgun. No change propagates automatically, so the coupling problem is no bigger than the one you would already have with the duplicated-code approach.

You can even gradually block-ize existing code. See a common snippet again? Wrap it into “block <myname> {…}” and start devtime-tracking it together with similar snippets. Don’t change anything, just take it into real account.


Smalltalk?


Sadly I can’t just go and develop systems in the Smalltalk ecosystem, too different boots to wear. So there’s no reason to even go and learn how it does that or a similar thing, cause I’m not gonna switch or implement it myself in my editor. I’m sure (and confidently so) that I’d like to see exactly what I described in editors/IDEs, and that it would make my coding life much easier.


That's not entirely true. The difference between intentional and accidental repetition is that the first occurs because the rule is the same in both repetitions, and should be the same; whereas the second happens to be the same for now. In not repeating yourself in the second case you actually risk changing an operation that should remain the same, as a side effect of changing the common function to alter the behaviour of the first.

That's why DRY is a smell (indicates that something might be wrong) and not a rule.


The problem is, such decisions are taken at the beginning of the project, when you are far from the full picture. Then comes the rest of the app lifecycle: decade(s) of changes, bugfixes, replatformings, data/os/cluster migrations and so on.

I've seen, and even currently work on, stuff that has beautiful but hard-to-grok abstractions all over the place (the typical result of the work of unsupervised brilliant juniors: technical debt in gigatons down the line, but it's almost always other people's problem). The thing is, that code has seen 10 major projects and absorbed other stuff, the meaning and structure of the data changed a few times, other systems kept evolving, etc.

Now all those abstractions make it proper hell to navigate the code and perform any meaningful change. Of course, another typical result of the brilliant 5-second-attention-span junior is a complete lack of documentation. So you see stuff happening, but have no idea why or why not, what it means down the line in other systems, why such choices were made, and so on.

These days, I've had enough of the any-design-patterns-at-all-costs kool aid and over-engineered cathedrals for rather trivial stuff (I think it's mostly down to the anxious ego issue, but that's for another discussion). I am more than happy to copy&paste stuff even 20x, if it makes sense in that place. And it does surprisingly often. Yes, it's very uncool and I won't brag about it at my next job interview, but it keeps things refreshingly and boringly stable, and surprisingly also makes it easier to change things and test the consequences, and somehow that's priority #1 for most companies.


DRY fanaticism is just as bad as not thinking about DRY at all


I write research code, doing that feels very different than web code for example.

In research it is absolutely OK to copy paste a number x of times, because you don't know a priori what will work the way you want.

Usually, I write an algorithm to solve my problem, then I copy paste the function and change it a bit with another idea, and set a switch to choose between them. Then I copy paste another time as the ideas are flowing, and add one more switch.. Etc..

At some point, when I feel that there is too much duplicated code, I abstract the parts of the functions that are similar and never change, so that I can focus only on the changes of ideas, and no longer on the mechanics of the methods.

As the code converges toward something I like, I PRUNE the code and remove all not used functions.

But this process can take weeks, and I can go to another issue in the meantime. This is because I don't know in advance what the right thing to do is, so I get code with several parts duplicated, and when I come back to them, I can choose which version I want to use. If something starts to feel smelly, I prune it, etc. Iteratively.

What I wanted to say, is that duplication of code is really dependent on the kind of code I'm doing.

If I'm doing an app, it's way easier to determine which code to keep and which code to remove and which code to duplicate. But not all fields are the same.

At one period of my life, I always wrote clean code for research. You lose too many ideas, and hidden behind the abstractions, you are no longer able to work with your code. When you get a new idea, it requires going through all the abstractions, which is insane in very rapidly evolving code.


The monstrosities with dozens of flags do not happen because of the first wonky parameter. Inlining a function or refactoring it when the third use case comes around and invalidates assumptions isn't hard.


I mostly agree in practice, but I'd walk both ideas back slightly: Things which should always be the same should have a common name, and things which might differ should have separate names. Doing so gives you a strong foundation where developers making local changes are unlikely to break the global program (related ideas include preferring total functions (reasonable outputs for all inputs allowed by the type system) when possible, constraining type signatures to make that viable if it otherwise isn't, and giving names to things which are harder to misuse when that isn't practical (like `index_of_assume_sorted` instead of `index_of`)).

Connecting that idea back to the discussion:

1. IME, usually when code looks similar there exists a nice abstraction (a nice "name" future people will understand) for the similar bits. Allowing duplication to grow when you could have properly named things will eventually slow down development.

2. Functions with many parameters are rarely that kind of nice abstraction. The commonality is something much more contained, and functions with large parameter counts should usually be relegated to "entrypoints" or other locations where you're actually merging a thousand different concerns.

3. Bad abstractions are much more expensive than duplication. I have zero problems with committing duplicated code when there aren't any obvious solutions and letting a better plan materialize later.
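
As a rough sketch of the naming point above (`index_of_assume_sorted` versus `index_of`), here are camelCase TypeScript versions; the bodies are my own illustration over number arrays, not the commenter's code:

```typescript
// Linear scan: correct for any array, sorted or not.
function indexOf(items: number[], target: number): number {
  for (let i = 0; i < items.length; i++) {
    if (items[i] === target) return i;
  }
  return -1;
}

// Binary search: only correct when `items` is sorted ascending.
// The type system cannot express that precondition here, so the
// name carries it and a misuse is visible at the call site.
function indexOfAssumeSorted(items: number[], target: number): number {
  let lo = 0;
  let hi = items.length - 1;
  while (lo <= hi) {
    const mid = lo + Math.floor((hi - lo) / 2);
    if (items[mid] === target) return mid;
    if (items[mid] < target) {
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return -1;
}
```

The two functions look similar and have the same signature, but they make different promises, so they get different names rather than a shared `sorted: boolean` parameter.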


A wonky parametrization is probably a sign that you are refactoring at the wrong level. If you have something like

  function doStuff(flag: boolean) {
    // do some stuff
    if (flag) {
      // do stuff a
    } else {
      // do stuff b
    }
    // more stuff
  }

you may want to do two implementations that are something like

  function doStuffA() {
    doSomething();
    doStuffSpecificForA();
    doSomethingElse();
  }

and

  function doStuffB() {
    doSomething();
    doStuffSpecificForB();
    doSomethingElse();
  }


In those situations, you really have multiple functions intertwined into a single function. Refactor to give each caller its own version of the function, and then refactor so that there isn't copy & paste with the similarities.


The itch that Aspect Oriented Programming was trying to address was that some functionality only needs to differ by what happens in the preamble or the afterward.

And that can be simulated in code you own by splitting the meat of a set of requirements into one or two bodies, and then doing setup, tear down, or a step in the middle differently in different contexts. So now you have a set of similar tasks with a set of subtasks that intersect or are a superset of the other.
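
One way to read that, sketched in TypeScript with entirely hypothetical names (`sumOrders`, `withSteps`, `nightlyReport`, `dashboardTotal`): the shared body stays in one place and each context supplies its own setup and teardown.

```typescript
// Hypothetical shared body: the "meat" common to both tasks.
function sumOrders(orders: number[]): number {
  return orders.reduce((total, order) => total + order, 0);
}

// Runs a body between context-specific setup and teardown steps.
// The teardown runs even if the body throws.
function withSteps<T>(before: () => void, body: () => T, after: () => void): T {
  before();
  try {
    return body();
  } finally {
    after();
  }
}

// Records which steps ran; stands in for real side effects.
const steps: string[] = [];

// Same body, different preamble and afterword per context.
function nightlyReport(orders: number[]): number {
  return withSteps(
    () => { steps.push("open report file"); },
    () => sumOrders(orders),
    () => { steps.push("close report file"); }
  );
}

function dashboardTotal(orders: number[]): number {
  return withSteps(
    () => { steps.push("start timer"); },
    () => sumOrders(orders),
    () => { steps.push("stop timer"); }
  );
}
```

Neither task needs a flag saying which context it is in; the differences live in the wrappers rather than inside the shared body.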


These types of lookalike functions are like homonyms: they might be “spelled” the same, but they have different meanings and should not be conflated.


Super rock hard agree with you and disagree with the author

I have seen so many terrible projects with methods with endless arguments/parameters and nested object parameters; the signatures are fucking insane

The biggest stench to me in any project is when I see a majority of methods all have > 6 arguments

To quote Shoresy: so dumb


It’s funny, because the biggest stench to me is seeing a project with thousands of nested functions all doing nearly nothing.

Probably one of those ‘truth is in the middle’ kind of situations.


It depends. In fact the entire discussion is wrong, and neither rule has any real world value.

People are all talking about the format of the code, while what defines if it's a good architecture or not is the semantics. Just evaluating that heuristic (yours or the article's) will lead you into writing worse code.


This is really the issue with the article -- it's the CS equivalent of pop-psych feel-good advice like "write a page every day and you'll have a novel before you know it." It doesn't solve your actual problems. It doesn't solve anyone's. You're not actually better off in the long run if every line in your source is a separate commit, unless you have the world's most basic program.

This "it's more important to wrap your code at 80 columns than to understand how the cache hierarchy works" stuff is becoming worryingly endemic. Teamscale has built an entire business around fooling nontechnical managers into believing this shit is not only worthwhile, but should be enforced by tooling, and middle managers at FAANGs, who should know better, are starting to buy in.


Cluttering up git line annotations and code reviews with people's dev envs fighting over where to wrap lines or whether there's a space after parens or whatever is a waste of everyone's time and an impediment to seeing the actual code changes. That's why tooling should enforce a format, not because there's particular importance to the exact enforced format.


What's wrong with tooling enforcing it?

I mean, where you wrap is not important, and is best left to tooling (brain cycles and meeting time can be used for more important things)


Having it all in one tested function means it’s much easier to keep in line. Woe be the one that decides to change a common section in something copied all over the codebase.

Modifying those boolean flags within the context of your tests is practically free. Trying to merge 4 files into one is… not.


Have four public api functions, which call a private function underneath to avoid the duplication. Everyone is happy.
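
A rough TypeScript sketch of that layout, assuming a made-up messaging domain (`buildMessage`, the four wrapper names, and the 160-character SMS cap are all hypothetical). The flag and the enum exist only in the private helper; callers see four intention-revealing functions.

```typescript
type Channel = "email" | "sms";

// Private helper (would not be exported in a real module):
// the boolean flag and the enum parameter live only here.
function buildMessage(channel: Channel, urgent: boolean, body: string): string {
  const prefix = urgent ? "[URGENT] " : "";
  const limit = channel === "sms" ? 160 : Infinity; // assumed SMS length cap
  return (prefix + body).slice(0, limit);
}

// Public API: in a real module only these four would be exported.
function emailMessage(body: string): string {
  return buildMessage("email", false, body);
}

function urgentEmailMessage(body: string): string {
  return buildMessage("email", true, body);
}

function smsMessage(body: string): string {
  return buildMessage("sms", false, body);
}

function urgentSmsMessage(body: string): string {
  return buildMessage("sms", true, body);
}
```

If one of the four later needs to diverge, it can stop calling the helper without touching the other three call paths.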


Why have we started “hard” disagreeing with each other recently? What’s wrong with just disagreeing?


It indicates the importance of the topic and the hardness of the disagreement.

Tabs vs spaces - people disagree but usually can adapt to the team if needed.

Use Java 1.4 for a green-field web app - hard disagreement for many; looking for a new job is the more attractive option.


Difference between the two is that hard disagree means you won't be able to change their mind.


Or "should not" change their mind. If I hard disagree, then I should not change my mind, because I see no valid reason, and both my experience and reasoning are solid to the degree I am certain the arguments presented can not develop into a valid reason to change my mind. "Hard disagree" may signify being certain. I then am responsible, for my own sake and wellbeing, of being right in relation to reality, or reality will simply hard disagree.


Can you recommend any refactoring tutorials or books that teach those kinds of lessons?


Not specifically this, per se, but I HIGHLY recommend "A Philosophy of Software Design" by Dr. John Ousterhout

https://web.stanford.edu/~ouster/cgi-bin/book.php


I wish I could upvote this a million times

But, I'll also point out that just like reading about exercise, merely reading the book doesn't help unless one is willing to practice and -- much, much more difficult -- get buy-in from the team. Because software engineering is usually a team sport and if one person is reading these kinds of books and trying to put them into practice, and the other members of the team are happy choosing chaos, it's going to be the outlier who gets voted off the island


Not the GP but I think a foundational skill is naming things. If you can't give a simple name to a function/class/etc., it's probably not well-defined. It should be adjusted to make it easier to name, usually by moving responsibilities out of (or into) the code structure until it represents one concept that you can clearly state as a name.


This! Coming up with meaningful names helps you understand the problem and define the solution. I advise junior devs: if you don't know how to name a variable, give it a simple 1-letter name: a, b, x, y. When you look at the code it is immediately clear how well they understand the problem. One should be careful to avoid naming paralysis, though.


It depends. Is it truly common functionality that, if improved upon, should apply to all dependent code?

Or is it just getting from point A to point B that happens to be the same in two places right this instant?


KISS > DRY


DRY for the sake of DRY is like not drinking water when you're thirsty.


Yep. Not all code that looks alike is alike.

Similarity can be fleeting.


> Copy-paste is OK once. The second time you're introducing duplication (i.e., three copies), don't. You should have enough data points to create a good enough abstraction. The risk of diverging implementations of the same thing is too high at this point, and consolidation is needed. It's better to have some wonky parameterization than it is to have multiple implementations of nearly the same thing. Improving the parameters will be easier than to consolidate four different implementations if this situation comes up again.

The more I do this software engineering thing the more I feel like this “advice” bites me in the butt. Understanding when you should duplicate code versus when you should consolidate (or if you should just write a TODO saying “determine if this should be split up by [some set in stone timeline]”) is simply just a HARD problem (sometimes at least), and we should treat it as such.

DRY/ WET or whatever shouldn’t be a maxim (let alone a habit! lol), it should at best be a hand-wavey 2-bit dismissal you give an annoyingly persistent junior software dev who you don’t want to actually help!


I see what you mean. DRY and WET and similar ideas are delivered as objective sometimes, but I think it's better to view them as general heuristics, as most rules in software should be.


> Copy-paste is OK once. The second time you're introducing duplication (i.e., three copies), don't. You should have enough data points to create a good enough abstraction. The risk of diverging implementations of the same thing is too high at this point, and consolidation is needed.

This heavily depends on how likely it is for the reasons of change to also apply to the other copies. If the reasons for why the code is the way it is are likely to evolve differently for the different copies, then it’s better to just leave them as copies.

Just being the same code initially is not a sufficient reason to create an abstraction. Don’t focus on the fact that the code is currently the same, instead focus on whether a change in one copy would necessarily prompt the same change in the other copy.

This also applies to pieces of code that are different from the beginning, but are likely to have to change in conjunction, because they rely on shared or mutual assumptions. If possible place those pieces of code next to each other, and maybe add a source comment about the relevant mutual assumptions.

In other words, avoiding code duplication is a non-goal. Keeping code together that needs to evolve together is a goal. Instead of DRY or WET (don’t repeat yourself, write everything twice), think SPOT (single point of truth).
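
A small TypeScript sketch of two different pieces of code that must change together, with hypothetical names (`FIELD_SEPARATOR`, `encodeRecord`, `decodeRecord`). The code is not duplicated, but both functions rest on the same assumption, so they sit next to each other and the assumption is written down once.

```typescript
// Single point of truth for the mutual assumption:
// both functions below rely on this separator, and on fields never containing it.
const FIELD_SEPARATOR = "|";

// Writer side: joins fields using the shared separator.
function encodeRecord(fields: string[]): string {
  return fields.join(FIELD_SEPARATOR);
}

// Reader side: must split on exactly the same separator as encodeRecord.
function decodeRecord(line: string): string[] {
  return line.split(FIELD_SEPARATOR);
}
```

Changing the separator is now one edit in one place, and the comment tells the next editor why the two functions are neighbours.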


My favorite anti-example is year based tax calculation.

Rules can change enough from year to year that parameters aren't enough. You will end up with code specific for each year.

You don't want to introduce any chance of changing results for old years when changing common code.

So best to have no common calc code. Each year is fully set in stone.
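
A sketch of what that could look like in TypeScript. Every rate and threshold here is invented for illustration and is not a real tax rule; `tax2022`, `tax2023`, `rulesByYear` and `calculateTax` are hypothetical names.

```typescript
type TaxRule = (income: number) => number;

// Hypothetical 2022 rule: flat 20% on all income. Frozen once the year closes.
function tax2022(income: number): number {
  return (income * 20) / 100;
}

// Hypothetical 2023 rule: first 10,000 exempt, 25% above that.
// Shares no calculation code with 2022 on purpose.
function tax2023(income: number): number {
  return Math.max(0, income - 10_000) * 0.25;
}

// The only shared piece is the lookup from year to rule.
const rulesByYear = new Map<number, TaxRule>([
  [2022, tax2022],
  [2023, tax2023],
]);

function calculateTax(year: number, income: number): number {
  const rule = rulesByYear.get(year);
  if (rule === undefined) {
    throw new Error(`No tax rules recorded for ${year}`);
  }
  return rule(income);
}
```

Adding a new year means adding a new function and one map entry; no existing year's code path is touched.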


I don't really agree with that example because of bugs.

The rules for how to calculate taxes for a past year don't change, but you probably didn't implement the previous year's rules perfectly.

If you discover a mistake in how you calculated taxes for a previous year, you should recalculate them so that you can file an amendment.


The only absolute rule that you’ll ever need is that you probably won’t need the abstraction you’re thinking about. To be frank though, it started with putting a function into a new module or class. I think the list is rather bad as a whole. It’s the same as a lot of other “best practices”. It’s vague enough that you can’t really use it, but also so that you can’t really fault it.

Copy pasting code multiple times is never really “fine”. I’d argue that for most things you’d probably be better off writing a duplication script rather than abstracting it into some over complicated nonsense. It’s much easier to change, and delete, things later this way. It’s obviously not what we teach in CS though, but we really should.


"Know when you're testing the framework's capability. If you are, don't do it."

Hard disagree on that. Frameworks change over time. How certain are you that they won't make a seemingly tiny design decision in the future that breaks your software?

One of the most valuable things tests can do for you is to confirm that it is safe to upgrade your dependencies.

If all your test does is duplicate tests from a dependency, that might be a waste of time... provided that's a stable, documented feature and not something that just happens to work but isn't necessarily expected stable behavior.

But you shouldn't skip testing something because you're confident that the dependency has already covered that.

The tests should prove your software still works.


I think it probably is saying: don't write a "useEffect runs when its dependencies change" test, write a "User is redirected to their accounts page after logging in" test, and then you are testing both your own code and the framework's routing / side effects handling / state tracking, etc.

Integration tests for complex flows inadvertently test your dependencies, which as you say is awesome for when you have to upgrade.


I very much agree with you on this. Upgrading dependencies is something you do, and you are responsible if it breaks things. I'd frame it slightly differently though. I think you should have some tests that test the full functionality the user will experience, regardless of where the pieces come from. And don't go out of your way to mock or stub something because it's not written by you. There is no reason to avoid useState() being executed in your test suite as long as your code actually depends on it and your test doesn't get super expensive to execute or write because of it. Now, if something is expensive, try to limit testing it to the top of your testing pyramid. But you should still test the full stack because that's what you are gonna ship!


If you are going to write a test that tests the framework's capability, submit a PR to the framework.

The only part that's relevant to you is how it interfaces with your own code. If their behavior changes but your code still does exactly what you want it to, the test shouldn't fail.


I don't think submitting a PR to a framework is a good strategy:

1. They may not accept the PR

2. Even if they do accept that PR, there's no guarantee that in two years time some maintainer will decide to change that behaviour (and update or discard the test you contributed) anyway.


> It's better to have some wonky parameterization than it is to have multiple implementations of nearly the same thing. Improving the parameters will be easier than to consolidate four different implementations if this situation comes up again.

From https://go-proverbs.github.io/: A little copying is better than a little dependency.

Curious to see how the community is divided on this, I think I'm more leaning towards the single implementation side.


The older I get, and the more experience I have, the more I think "single implementation" is generally a lie we tell to ourselves.

To the author's point - a wonky param to control code flow is a clear and glaring sign that you consolidated something that wasn't actually the same.

The similarity was a lie. A mistake you made because young features often have superficial code paths that look similar, but turn out to be critically distinct as your product ages.

Especially with modern type systems - go ahead and copy, copy twice, three times, sometimes more. It's so much easier to consolidate later than it is to untangle code that shouldn't have ever been intertwined in the first place. Lean on a set of shared types, instead of a shared implementation.
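
As a hedged illustration of "shared types, not shared implementation" in TypeScript: `LineItem`, `cartSummary` and `invoiceLines` are hypothetical names. The two features agree on the shape of the data, but each keeps its own small implementation and can diverge freely.

```typescript
// Shared type: the only thing the two features have in common.
interface LineItem {
  name: string;
  quantity: number;
  unitPriceCents: number;
}

// Feature 1: cart summary. Owns its own logic.
function cartSummary(items: LineItem[]): string {
  const total = items.reduce((sum, i) => sum + i.quantity * i.unitPriceCents, 0);
  return `${items.length} items, ${total} cents`;
}

// Feature 2: invoice lines. Similar input, separate logic, no shared flags.
function invoiceLines(items: LineItem[]): string[] {
  return items.map((i) => `${i.quantity} x ${i.name} @ ${i.unitPriceCents}`);
}
```

If `LineItem` changes, the compiler points at every consumer, which covers much of what a shared implementation was supposed to guarantee.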

My future self is always happier with past me when I made a new required changeset tedious but simple. Complexity is where the demons live, and shared code is pure complexity. I have to manage every downstream consumer, get it right for all of them, and keep it all in my head at the same time. That starts off real easy at shared consumer number 2, and is a miserable, miserable experience by consumer number 10, with 6 wonky params thrown in, and complex mature features.

---

So for me - his rule of thumb is egregiously too strict. Consolidate late and rarely. Assume the similarity is a lie.


I decide on a case by case basis.

I've been bitten by both decisions in the past. Prematurely abstracting and "what's 4 copies gonna do, that's totally manageable" until it cost quite some time to fix bugs (multiple times then, and because of diverged code paths, with multiple different solutions)


I think an abstraction should imply/enforce a common abstract structure. It inscribes an abstraction layer into the system. Moving a couple of concrete lines into a single named scope is orthogonal to this.


I was going to disagree with this because I thought "but what about the tests!", but in the linked video of Rob Pike's talk he says (paraphrased) "but then of course there's a test, so that every time it is tested, it guarantees that the library and the copied code agree on their definition. The test has a library dependency but the copied code doesn't".

That's actually a really clever way to do things and I think I'll adopt it.
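
Roughly what that could look like in TypeScript, as a sketch rather than anything from the talk: `leftPad` is a hypothetical local copy, and the built-in `String.prototype.padStart` stands in for the library it was copied from. Only single-character fills are compared, since that is all this copy claims to support.

```typescript
// Local copy: production code calls this and never imports the dependency.
// Assumption: fill is a single character.
function leftPad(s: string, width: number, fill: string = " "): string {
  let out = s;
  while (out.length < width) {
    out = fill + out;
  }
  return out;
}

// Test-only check: the test is allowed to depend on the "library"
// (here padStart, as a stand-in) to prove the copy still agrees with it.
function findMismatchesWithLibrary(): string[] {
  const mismatches: string[] = [];
  for (const s of ["", "7", "abc", "hello world"]) {
    for (const width of [0, 1, 5, 12]) {
      const expected = s.padStart(width, "0");
      const actual = leftPad(s, width, "0");
      if (actual !== expected) {
        mismatches.push(`leftPad("${s}", ${width}) gave "${actual}", library gave "${expected}"`);
      }
    }
  }
  return mismatches;
}
```

An empty result from `findMismatchesWithLibrary()` means the copy and the library still agree on these inputs; a dependency upgrade that changes behavior shows up as a failing test instead of a production surprise.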


Like most things, blanket advice will break down in some cases, things can be highly contextual.

Generally, my anecdotal experience is that Go libraries have far fewer average dependencies than the equivalent Rust or JavaScript libraries, and it may be due in part to this (the comprehensive standard library also definitely helps).

I definitely tend to copy small snippets between my projects and rely sparingly on dependencies unless they're a core part of the application (database adapter, heavy or security-sensitive specifications like OIDC, etc)


On commit size:

> You just never know when you have to revert a particular change and there's a sense of bliss knowing where you introduced a bug six days ago and only reverting that commit without going through the savagery of merge conflicts.

This is key for me: a good shape to aim for with a commit is one that can be easily reverted.


I've not seen "roll back a bug by reverting a single commit" be a viable option nearly as much as "roll back by manually changing the buggy part," especially for bugs six days old (or older).

It's usually too hard, regardless of what your commits look like individually, to revert "just one buggy small bit" without breaking the rest of the new feature that was supported by that change, or re-introducing an old bug, or having other inconsistent resulting behavior. And "turn off the whole feature" is rarely desirable unless the bug is producing truly catastrophic behavior.

A roll-forward "just fix that bug" is the ideal case. A more complex "roll forward and make a kinda involved fix" is common too. But neither of those regress things from a user or consumer POV.


The way I frame it is less of rollback, more of bisect: If I have to use `git bisect` to find a problem's root cause, will this commit be enough?

Make it bisectable and life will be easier down the line.


Yeah, a rollback might be unfeasible for most things, but more "atomic" commits allow anyone handling an issue to better understand the reasoning behind any change, and if something was amiss in that particular change.


A trick to help with that: when you start having multiple changes that could be distinct commits, use git add --patch to select the changes one by one. Not only does that let you create smaller changes, it also gives you an opportunity to review your code before you commit.


Agreed, but after decomposing the change into logical commits, double-check that the project builds after each commit.


Or even better, set up a pre-commit hook so that happens automatically.


Stalling a commit for more than a third of a second is way too much.


Slightly-longer commits to have never-broken commits... hmmmmmm.


If you hit a full second, that's just right back to the svn days where there was just enough friction people wouldn't bother to commit until everything was completely done, then the commit would often be too big to easily describe why things were done in the commit message.


I don't think taking one second to commit is a problem. However, verifying that software builds typically takes a lot longer than a second.


Huh, I guess we have different expectations. I really don't mind a few seconds even to know I didn't totally break things in a commit.


Second-order effects. Longer to commit means less commits which means more grab-bag commits which means less useful commits.


Or even better, do that in CI.


As someone who works in small companies, and had to endure developers who were using gitlab as "offsite backup" or I guess "push-based 'does this compile?' workflow", please don't do this. CI minutes are rarely free, and for damn sure are not "glucose free". If you can't be bothered to run the local compilation step for your project, that is a wholly different code smell


Not for things like type / lint / formatting errors. Tests too if not too long.

I mean have them in the CI as well, but for sure have them as pre-commit hooks.


Also look up any one of the "stacked branches" approaches (plenty of git extensions or tutorials that work natively with newer git versions).

For those still in bzr land, there used to be a wonderful "bzr-pipelines" plugin to enable seamlessly working on a set of interdependent changes.


Unless all your features actually fit in one small commit, this doesn't work. Much more common is that you merge a chain of dependent commits, which means you cannot just rollback a single commit, since that will leave your codebase hopelessly broken. Much cleaner to commit the entire feature as one large commit.


If your "features" don't fit in one small commit, you should probably look to redefine what "features" are or at least not tie them to a commit.

You can and should split your features into a series of product/codebase improvements that end up delivering the full "feature" with the last of your commits. If done smartly, along the way, you'll be delivering parts of the feature so your users would start benefiting sooner.


You can rollback a merge if that is the goal of this one-large-commit.


More precisely: you can revert a merge.


I agree with this, as well as the $(git add -p) suggestion, which JetBrains tools make super-duper easy, but my reasoning is not for reverts but for cherry-pick. I can count on one hand the number of meaningful reverts I've seen, but have innumerable examples of needs to cherry-pick. I admit that will heavily depend upon the branching style used in the project, but that's my experience


Cherry-pick is the copy-paste of VCS. And although copy-paste in code can work, copy-paste at the version control level itself is suspect if we’re talking about long-term history (why copy the changes of a commit?).


There is a small distinction between copy-paste, which short of using static analysis tooling is undetectable, versus $(git cherry-pick) which is tracked copy-paste

Contrast:

  git checkout -b feat-1
  echo 'awesome change' > README.md
  git commit -am'fix'
  git checkout main
  git checkout -b feat-2
  echo 'awesome change' > README.md
  git commit -am'moar awesome fix'
  git checkout main
  git merge feat-1
  git merge feat-2
with its cherry-pick friend

If one is curious why in the world multiple branches would need the exact same commit, I'm sure there are hundreds of answers but the most immediate one is CI manifests are per-branch so if one needs a change to CI, I would a thousand times rather $(for b in $affected_branches; do git checkout $b; git cherry-pick $my_awesome_ci_fix; done) which will survive those branches re-joining main


> Merge made by the 'recursive' strategy.

There's a few things people think git tracks that it actually doesn't, instead it compares diffs and presents the user with extra information that looks like tracking. The go-to example is renaming files, there is a "git mv" but it doesn't actually track the rename. Git reconstructs the rename when looking at history based on if there was a file removed and a file added in the same commit that are some percentage the same.

In this case, if that last line was "git cherry-pick feat-2", it does the same (or at least similar) comparisons as "git merge feat-2", but errors because the user would expect cherry-pick to create a new commit and in this case it won't, instead presenting a message asking the user how to continue.


Fine, I may be guilty of "coding in a textarea" and obviously did not actually open a terminal and execute those instructions. But I hope a reasonable person could agree that manually redoing a change to .gitlab.yml over and over is not reasonable, regardless of whether git is smart enough to realize what has gone on or not


You don't have to literally revert the commit, but it will make it easier to write a commit to undo it, plus aiming for this means your commits will be well-contained and reviewable, which is also good.



I try to do that for legibility and because it’s easier to combine commits than to split them (that’s just how git is). Revertability is pretty meh. It’s nice when you get to revert a single commit and hotfix/solve the problem. But with these commit sizes you hardly save any time that way.


Pretty substantial disagree with the second half of 4. and 5.

>If the component is big, then you introduce more complexity[...] If a particular function doesn't fit anywhere, create a new module (or class or component)

This smells like the agile/uncle Bob "every function should be four lines" school of thought which is really bad.

Paraphrasing Ousterhout's book, it's the other way around, when components are big and contain significant implementation you're hiding information and reducing complexity, which is the purpose of good program design. When your component/object/module is just surface you've basically done no work for whoever uses your code. I see it way too often that people write components that are just thin wrappers around some library function in which case you haven't created an abstraction, you've just added a level of indirection.

If a function does not fit anywhere that's a strong indication that it shouldn't be a separate function, it's likely an implementation detail.


Are you talking about this book: A Philosophy of Software Design? Can you recommend it?

I am looking for rebuttals of this naïve Uncle Bob style and while I like the content of Casey Muratori, he doesn’t resonate with more corporate people.


Yup, it's a recommended read. It's pretty short, 160 pages or so and not at all difficult, the title makes it sound a bit grander than it is.


Will check it out, thanks


“Know when you’re testing the framework’s capability. If you are, don’t do it. The framework is already tested by people who know a lot more than you.”

How many times have you had to roll back a minor version upgrade because the library maintainers *absolutely don’t* know what they are doing? Spring, Netty, and Java ecosystem, I'm looking at you...


next.js, apollo client... so many surprises even in minor point versions.


There is this dichotomy - companies say they want a stable codebase with clear justifications for each change (at least heavily regulated companies do).

But good practice here is continual refactoring - almost inimical to that stability. Plus, imagine the final sign-off comes from the business, who don’t understand why you rewrote a codebase that they signed off two months ago and now have to re-confirm.


Software development is simple, try to maximize all of these at the same time:

1. Performance

2. Reliability

3. Readability

4. Correctness

5. Maintainability

6. Extendability

7. Consistency

8. Adequacy

9. Simplicity

10. Predictability


We are all in agreement here. This entire comment section is just about the coefficients for the objective function.


All 10 fall under the old wisdom of "fast, cheap and quality, but you can only pick 2".


Simple is too difficult and I look smart with a complex solution /s


> 5. If a particular function doesn't fit anywhere, create a new module (or class or component) for it and you'll find a home for it later.

I worked at a place that did this with their frontend app. Devs rarely knew where anything should go and so for any given Component/Module, there was usually some accompanying `MyComponent.fns.ts` file. Homes were NEVER found for it later. Code duplication through the nose and lots of spaghetti coupling.

Edit: I'm definitely blowing off some steam. That said, I think there is good virtue in this "habit" so long as there is good reason that it "doesn't fit anywhere" ... and when another module starts referencing the temporary home module, it is a smell that the time is now to give it a proper home.


I also disagree with that advice and believe it to be an anti-pattern. Code readability can suffer massively from multiple modules. It depends on the use case and the particular function, so this kind of advice should not be a general rule; rather, a separate decision should be made for each situation.

Very uncomfortable truth (imo) for many developers who prefer to find abstractions and general all encompassing advice. I have found that the correct placement of functions in files/classes is a "sense" that is improved solely with experience and is never truly complete. It is after all about communicating intent to other human beings for which there are no hard rules.


I used to do the utils file, but now it's either a local function (same file, close to usage) or I find a proper home for it (even if it's a rudimentary module).


Other end of this spectrum is ever growing “utils” package.


I don't think these points are well justified. They're all of the form "do this or a bad thing will happen", where often it's not clear why the supposed bad thing is bad.

1. The alternative to small commits (as motivated by the difficulty in reverting large commits) is to not revert the commit, but just add a new commit that fixes the bug. The merits of this are of course debatable, but it does constitute a gap in the reasoning.

2. "Big refactorings are a bad idea", why though?

5. "It's better to create a new independent construct than to jam it into an existing module where you know deep down it doesn't make sense", why though?

6. As a counterpoint to designing an API via unit tests, you can also just have a design session. Think about the problem for a moment, write some design documents down. When dealing with APIs and interfaces, database schemas, this type of up-front design tends to yield by far the best results.

7. There's no clear argument why having more than two instances of a function is bad. Yeah implementations may diverge, but is that necessarily a bad thing? Even if they started out the same, why do they need to keep staying the same?

10. "Testability is correlated with good design" is not really motivated at all. I know many designs that are good but not easily testable, and many designs that are extremely testable, but also hideously convoluted (e.g. "uncle bob's syndrome").


1. Making a new commit is not equivalent at all to reverting a commit. I'm a fan of failing forward too, but reverting the exact commit you know caused the issue implies you know exactly what the issue is, which is invariably good.

2. For the same reason that 'let's rewrite everything from scratch' generally is a bad idea.

5. Because deep down you know it doesn't make sense? Nobody will import your 'awesomeUtilityFunction' from the 'WaarghComponent' file, but they might if it's in a file/module called awesomeUtilities, or just plain awesomeUtilityFunction.

6. Designing an API via unit tests is the equivalent of a design session with a different whiteboard. I like how you complain about things not being well justified and then just claim that your own suggestion leads to better results without any motivation.

7. I think it should be fairly obvious that you only care about this if you _want_ to keep the implementations the same.

10. No good design is 'not easily testable'. Easily testable is a requirement for good design. In my experience, when someone makes this point they try to imply that when you bend yourself into corners to make your test work (as given in the example), you should stop doing that and instead look at better ways to abstract your dependencies (dependency injection, mockable utility functions, lambdas etc.).
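
For a concrete picture of the dependency-injection option, here is a small TypeScript sketch. `Clock` and `greeting` are hypothetical names; the point is only that the time source is passed in instead of being reached for inside the function.

```typescript
// The dependency, expressed as a plain function type.
type Clock = () => Date;

// Production callers omit the second argument and get the real clock;
// tests pass a fixed clock instead of patching globals.
function greeting(name: string, now: Clock = () => new Date()): string {
  const hour = now().getHours();
  const part = hour < 12 ? "morning" : hour < 18 ? "afternoon" : "evening";
  return `Good ${part}, ${name}`;
}

// Usage in a test: pin the time to 09:00 local on an arbitrary date.
const fixedMorning: Clock = () => new Date(2024, 0, 1, 9, 0, 0);
const morningGreeting = greeting("Ada", fixedMorning);
```

The test needs no mocking library and no contortions, which is the sense in which testability and design pull in the same direction here.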


> 1. Making a new commit is not equivalent at all to reverting a commit. I'm a fan of failing forward too, but reverting the exact commit you know caused the issue implies you know exactly what the issue is, which is invariably good.

This seems like backwards logic. Even if reverting the commit implies you know (or think you know) exactly what the issue is, doesn't adding a new commit fixing the issue also imply this?

> 2. For the same reason that 'let's rewrite everything from scratch' generally is a bad idea.

I'd vehemently object to the two being equivalent. Big refactorings are more laborious for sure and all else being equal, smaller are arguably preferable to larger, but there are worthwhile changes you simply can't implement in small steps. Big refactoring tasks are mostly a problem if you have too many people working on a codebase, as it requires some degree of freezing a part of the codebase for changes to avoid merge issues.

> 7. I think it should be fairly obvious that you only care about this if you _want_ to keep the implementations the same.

The scenario as being discussed actually goes into the case where their requirements do in fact diverge, and suggests adding parameters to coax the divergent implementations into still being the same code.

> Easily testable is a requirement for good design.

I'd ask in what sense you mean the design is good? The test suite surely serves the code, and not the other way around. After all, we've sent people to the moon with code that never saw modern testing practices. There are other ways of ensuring code does what it should than unit tests.

I agree there are some types of code that benefit from extensive testing, but it's far from universal, and the tools needed to provide testability are anything but free, both in terms of performance and driving software complexity.

In that case, an alternative to extensive testability is to design the code in such a simple way that there aren't many places for bugs to hide.


> In that case, an alternative to extensive testability is to design the code in such a simple way that there aren't many places for bugs to hide.

I like this as an ideal. But I struggle to see how code can be both so simple that it is hard to make a mistake and also difficult to unit test.

Most of what I have seen forcing tests to be overly complex and brittle has been coupling code that have very different responsibilities (for example, testing business logic requires testing UI components that perform it). Separating those out would have been better design and more testable.


> Testability is correlated with good design. Something not being easily testable hints that the design needs to be changed. Sometimes that design is your test design.

I have struggled a bit with this at times. There are certain things that can go from "this implementation fits on a postcard" to "this implementation fits on 3-4 pages" if you want to provide the introspection required to provide useful tests (less true in languages like Haskell that provide nice monadic tricks, granted). I like having tests just to prove the point, but I will feel quite bad ripping up _tiny_ implementations to get tests working.

But test code is also code that should be introspected in a certain way (though the objectives are different). Maybe I'm just doing some things the wrong way.


“Aim for at least half of all commits to be refactorings”.

I feel like this is the end game of scrum and most agile methodologies - endless refactoring on a treadmill with no off button.

I like to be introspective, and I am human so my code is far from perfect. But if I was refactoring half of my time I would go more than a little crazy.

The good systems I have worked on have converged on designs that work for that space. Both developers and users see and value the stability.

The bad ones have had the kind of churn the article mentions. Developers are constantly rewriting, functionality is subtly changing all the time; stability doesn’t exist.


I mostly agree. One thing to add:

Your tests should test the API of the code/module/system you are responsible for. Nothing else.

And the tests should really push your API to the limit and beyond. For example, if your API is a server (with a HTTP API) then have N clients try to use it at the same time, as fast as possible, and see what happens.

And of course measure memory usage, disk usage etc. while running these tests continuously for days.
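A minimal sketch of the "N clients at the same time, as fast as possible" idea, using only the Python standard library. `PingHandler` is a hypothetical stand-in for the real server, and `hammer` / `run_load_test` are illustrative names; a real run would point at your own API, use far more clients, and run for much longer while memory and disk usage are recorded separately.

```python
import threading
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class PingHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the server under test."""

    def do_GET(self):
        body = b"pong"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep test output quiet


def hammer(url, clients, requests_per_client):
    """Have `clients` threads call `url` back to back; return every status code."""

    def one_client(client_id):
        codes = []
        for _ in range(requests_per_client):
            with urllib.request.urlopen(url, timeout=5) as resp:
                codes.append(resp.status)
        return codes

    with ThreadPoolExecutor(max_workers=clients) as pool:
        results = list(pool.map(one_client, range(clients)))
    return [code for codes in results for code in codes]


def run_load_test(clients=4, requests_per_client=10):
    """Start the stand-in server on a free port, hammer it, then shut it down."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), PingHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_address[1]}/ping"
        return hammer(url, clients, requests_per_client)
    finally:
        server.shutdown()
        server.server_close()
```

Only the public HTTP surface is touched here, so anything the server depends on gets exercised indirectly through it.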

This will automatically test everything you depend on. And you will know instantly if any of the dependencies you rely on have changed in a way that impacts your code.

I have had zero (yes zero) bugs in production for years. Only because of tests that really push the servers I am responsible for hard. Way harder than any customers would.

While the tests often reveal that I am very capable of adding bugs to the code :)

The systems I typically work on are large C++ applications used by large international companies you most likely have heard about.


>If a particular function doesn't fit anywhere, create a new module (or class or component) for it and you'll find a home for it later. It's better to create a new independent construct than to jam it into an existing module where you know deep down it doesn't make sense. Worst comes to worst, it lives as an independent module which isn't too bad anyway.

Innocuous and fine I guess but it points to (and then ignores) a deeper and interesting issue around how codebases grow, split, and merge over time. When the same thing happens at several levels of abstraction/zoom, take note. Refactoring to extract a method is similar to splitting a package is similar to splitting a monolith into microservices (and the reverse operations). The creation of a new package/module/whatever is an early signal of a "fault line" around which a future refactoring will occur (or, more often than not, a signal that the dev may not be familiar with where things go - but even in this case I tend to agree with the OP to just put it in a new place and let the code review fix it.)


> [ignore] things that might prevent you from doing stuff later.

This only works if you know what is and is not a potential future blocker. A perfect example is the data model: IME, most devs do not understand RDBMS very well, and so don’t understand how their decisions will affect future changes or growth. Or worse, they recognize that they don’t know, but choose to dump everything into a JSON column to avoid migrations.


Alternative to #10: avoid mocking.


I believe there is nuance to this: how else would any sane person exercise error flows in software, or -- as I have personally implemented -- test against things which are wallet-expensive in real life?

What I oppose is mocking every single dependency of every single injection in the component. It ends up being 50x the code of the system under test and requires throwing it all away when the implementation changes


> how else would any sane person exercise error flows in software

Interesting question. Have you got any specific examples of something hard to test without mocks?

I agree there's nuance, but I find "don't use mocks" a great starting point, and the sweet spot for web services to normally be only mocking/faking/stubbing/simulating/doubling 3rd-party APIs. I'm sure the spot moves dependent on context, e.g. writing hardware firmware might warrant a different approach.

Maybe a clearer expression would be "consider mocks a code smell".


I have two examples at hand: chasing memory leaks when enumerating over 10,000 EBS volumes, and ensuring the 500-class response handlers behave correctly for S3 (which is exceedingly hard to reproduce using the real S3 api)

Another common one is introducing network stalls to ensure timeout code behaves sanely. I'm aware of Comcast and the various nf trickery but I mean something a normal developer could run as part of normal tests, not involving sudo anything
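One way to get a network stall without sudo, sketched with the Python standard library: a local listener that accepts connections and then never answers. `start_stalling_server` and `fetch_with_timeout` are hypothetical names, and the client is a stand-in for whatever timeout code is actually under test.

```python
import socket
import threading


def start_stalling_server():
    """Listen on a free local port, accept connections, and never reply."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    held = []  # keep accepted sockets open so the peer sees a stall, not a reset

    def accept_loop():
        while True:
            try:
                conn, _addr = listener.accept()
            except OSError:
                return  # listener was closed
            held.append(conn)

    threading.Thread(target=accept_loop, daemon=True).start()
    return listener, held


def fetch_with_timeout(port, timeout):
    """Hypothetical client under test: reports 'timeout' instead of hanging."""
    with socket.create_connection(("127.0.0.1", port), timeout=timeout) as sock:
        sock.sendall(b"GET / HTTP/1.0\r\n\r\n")
        try:
            return sock.recv(1024)
        except socket.timeout:
            return "timeout"
```

The connection itself succeeds, so this exercises the read-timeout path rather than the connect-timeout path.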

Even as I write this, I'm aware that "there's more than one way to do it" and I'm sure everyone has their own favorite. But my experience has been that only the most pristine decomposed software components have very clean boundaries for testing just this one aspect. So for the rest of us stuck using the AWS sdk and similar, one can choose to shim the interactions with the SDK just to be able to swap it out for testing (which I violently oppose), or feed the software you do control a pseudo-implementation that will explode in very specific ways
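A rough sketch of what "a pseudo-implementation that will explode in very specific ways" could look like, in plain Python with no AWS SDK involved. Every name here (`TransientServerError`, `ExplodingStore`, `get_with_retry`) is hypothetical; the point is only that the failure is scripted and deterministic.

```python
class TransientServerError(Exception):
    """Hypothetical error type standing in for a 500-class response."""

    def __init__(self, status):
        super().__init__(f"server error {status}")
        self.status = status


class ExplodingStore:
    """Pseudo-implementation of an object store that fails the first N reads."""

    def __init__(self, objects, failures_before_success, status=503):
        self._objects = dict(objects)
        self._remaining_failures = failures_before_success
        self._status = status
        self.calls = 0  # lets the test check how many attempts were made

    def get_object(self, key):
        self.calls += 1
        if self._remaining_failures > 0:
            self._remaining_failures -= 1
            raise TransientServerError(self._status)
        return self._objects[key]


def get_with_retry(store, key, max_attempts=3):
    """Code under test: retry on 5xx errors, re-raise after max_attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return store.get_object(key)
        except TransientServerError as err:
            if err.status < 500 or attempt == max_attempts:
                raise
```

With `failures_before_success=2` the third attempt succeeds; with a larger value the error escapes after exactly three calls.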


> ensuring the 500-class response handlers behave correctly for S3 (which is exceedingly hard to reproduce using the real S3 api)

What did you use for this? I've achieved this previously by abusing minio, combined with very large uploads & downloads. Maybe that qualifies as some kind of verified mock though(?)

I'd be interested to use a cleaner approach which is also realistic.


Thankfully most of the AWS SDK uses interfaces[1] so one can use Mockito if you already have the muscle memory with it, or its InvocationHandler friend[2] if truly customized responses are needed

If one needs to exercise the AWS SDK itself, as part of some repro steps for a support issue, it's similarly glucose-cheap to patch moto to 500 in the necessary circumstances. I've had good luck using their ExecutionInterceptor ServiceLoader mechanism[3] to patch the Client's endpoint URI to point to moto or localstack without having to monkey with every single Client instantiation, which can be especially no-fun for STS AssumeRole or AssumeRoleWithWebIdentity setups (since one doesn't want it to use real STS for anything). That way the actual SDK pathway is still exercised all the way into the caller's code for a more honest-to-goodness bad outcome but without the hope-and-pray of contacting real S3

1: e.g. https://sdk.amazonaws.com/java/api/2.29.16/software/amazon/a...

2: https://docs.oracle.com/en/java/javase/11/docs/api/java.base...

3: https://github.com/aws/aws-sdk-java-v2/blob/2.29.17/core/sdk...


Unfortunately, most "frameworks" in existence today do not follow a simple, functional design, and they tend to make you mock quite a bit.

But the alternative to "mocking" is to use verified fakes (same test passes for both the real implementation and the fake) that actually do something "real" (even if it's simply persisting data in memory).
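To make "same test passes for both the real implementation and the fake" concrete, here is a small sketch. The key-value store interface is invented for illustration, and `SqliteKeyValueStore` merely plays the role of the "real" implementation; both class names and `check_store_contract` are assumptions, not anything from the article.

```python
import sqlite3


class SqliteKeyValueStore:
    """Stand-in for the 'real' implementation (hypothetical; backed by SQLite)."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")

    def put(self, key, value):
        self._db.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))
        self._db.commit()

    def get(self, key):
        row = self._db.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return row[0] if row else None

    def delete(self, key):
        self._db.execute("DELETE FROM kv WHERE k = ?", (key,))
        self._db.commit()


class InMemoryKeyValueStore:
    """The fake: real behaviour, but data only lives in a dict."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)


def check_store_contract(make_store):
    """The shared test: every implementation must pass exactly these assertions."""
    store = make_store()
    assert store.get("missing") is None
    store.put("a", "1")
    store.put("a", "2")  # overwrite an existing key
    assert store.get("a") == "2"
    store.delete("a")
    store.delete("a")  # deleting twice is not an error
    assert store.get("a") is None
    return True


def verify_fake():
    """Run the one contract against both the 'real' store and the fake."""
    factories = (SqliteKeyValueStore, InMemoryKeyValueStore)
    return all(check_store_contract(factory) for factory in factories)
```

The fake is only "verified" for behaviour the contract actually covers, so the contract has to grow whenever a test starts relying on something new.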


My complaint about using "real implementations" (aside from databases, which, sure, knock yourself out with testcontainers or even hsqldb running in compatibility mode[1]) is that managing the state of real systems is incredibly hard. I am aware of aws-nuke and its kin, but tearing everything down and then setting everything up for every test cycle consumes very real wall clock time and the flakes drive up "test fatigue" where folks start merging things with test failures because "oh, you know, it's just kidding" or the deadly enemy "we don't have time to wait for the test cycle, we need the fix out now!"

I am 100% with you on the verified fakes and love moto (and its friend localstack) for that reason. If I had lottery money, I'd even go so far as to create a moto-esque implementation backed by lxc or such and have it actually provision/mutate some running infra that I can snapshot and restore

1: https://www.hsqldb.org/doc/2.0/guide/compatibility-chapt.htm...


It's especially hard for embedded software. You certainly do want hardware-in-the-loop tests, but you also want tests that are independent of the hardware. You have to simulate the hardware interaction, and you definitely want to verify what the code tried to do to the hardware, and when. So for the hardware-interacting layer you want mocks, not just fakes.
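A sketch of "verify what the code tried to do to the hardware, and when", using `unittest.mock` from the Python standard library. The `gpio` interface (`set_direction`, `write`) and the `blink` driver are made up for illustration; real firmware would more likely be C with a different mocking tool, but the shape of the check is the same.

```python
from unittest.mock import Mock, call


def blink(gpio, sleep, pin, times, period_s):
    """Hypothetical driver code: toggle `pin` high then low, `times` times."""
    gpio.set_direction(pin, "out")
    for _ in range(times):
        gpio.write(pin, 1)
        sleep(period_s / 2)
        gpio.write(pin, 0)
        sleep(period_s / 2)


def record_blink():
    """Run blink() against a mock and return the ordered list of recorded calls."""
    hw = Mock()  # one parent mock keeps gpio and sleep calls in a single timeline
    blink(hw.gpio, hw.sleep, pin=13, times=2, period_s=1.0)
    return hw.mock_calls


def expected_calls(pin, times, half_period):
    """The exact sequence of hardware interactions we expect, delays included."""
    cycle = [
        call.gpio.write(pin, 1),
        call.sleep(half_period),
        call.gpio.write(pin, 0),
        call.sleep(half_period),
    ]
    return [call.gpio.set_direction(pin, "out")] + cycle * times
```

A fake would tell you the pin ended up low; the mock additionally pins down the order of writes and the requested delays between them.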


Just imagine a world where a component manufacturer (be it hardware or software) also provides a verified fake/simulated implementation.

Even hardware, they likely did develop it using software simulations: they just need to ship it with their SDK. Another thing hardware has it going for it is that it does not change as much.

Note that a verified fake could still have observability points that allow you to monitor what's going on.


I don't get the part about the small commits. To me a commit could be massive and that's alright, provided it introduces some major feature, while a fix could be a one-liner. It really depends on the situation.


This means that you should look to break up a "major feature" into smaller, iterative steps to delivery.

In general, the biggest hurdle engineers need to overcome is to believe it is possible and then simply start thinking in terms of delivering value with every single branch (hopefully user value, but a refactoring counts too), and what are the small steps that get us there?

The benefits are amazing:

* Changes are likely to be limited to only one "thing", thus making them both lower-risk and easier to review and QA

* With every step shipped to production, you learn if it is providing the benefit you are looking for or if you need to pivot

* You are not developing a feature branch while "main" moves at the same time, and wasting time on keeping up with it

* If the project gets stopped 3 months in, you have still delivered some value, including those in-between refactorings

* Your customers love you since they are seeing improvements regularly

* There is never any high-risk, big "release" where you need to sit around as 24/7 support and wait for bugs to rear their heads

I am happy to give some guidance myself: what is the "major feature" you think can only be done with a single, large change all at once? (I've done huge DB model changes affecting 100Ms of rows with no downtime, merged two "subapps" into one, migrated monoliths to microservices etc, but also built new full-stack complex features with branches with diff size being less than 400 lines for each)


Large commits are (IMO) a symptom - lack of a plan, a plan that doesn’t work out, etc. Which is fine! You have to figure it all out somewhere.

One thing you can do to address them is to stash the large commit to the side, then piece by piece pull it into a new branch as a series of smaller commits. This also gives a good opportunity to refactor before delivery, now that you know what the code is going to do and how.


It makes debugging so much easier to have small, atomic commits. Of course what's viable depends on what you are doing. I've had great success making changes and rolling them out that aren't actually the full feature yet and some or all parts remain hidden. This also can alleviate the race between two large changes coming in and having to deal with merge conflicts.


Having a massive major feature done as a single commit is evil. Merging two branches may conclude combining a unit of work, a major feature, a minor feature with the main branch (of course once the topic branch is merged to the upstream, and never vice versa [rebase in git terminology]). This is logically "a big commit" constructed from a concrete amount of small commits. Additionally, having small atomic commits also makes reverting a commit a trivial operation regardless of the branch the commit was introduced in. Bisecting a range of small commits also makes finding a bad commit easier.


> If you don't know what an API should look like, write the tests first as it'll force you to think of the "customer" which in this case is you.

The other way to do this (or if writing tests isn't helping) is to start with writing examples in the README (or wherever it is you keep docs). If your examples look tortured then your API is torturous. If your examples are understandable then your API is probably laid out reasonably.


Seems like the definition here of software is always “maintenance” of something as is, like replacing the boards on Theseus

Sometimes software is hard and 10x engineers just need to rewrite the whole thing or replace large systems

To subscribe to some world where we have to do that in “small changes” limits us

We shouldn’t make process to the weakest engineers


Even if you're a "10x engineer" the ability to describe how you would fix or replace things using just small changes is extremely valuable. And the inability to put together a moderately-detailed plan for that is a big smell.

If you don't actually understand the full set of changes that will be required in order to get to your desired new end state, how can you evaluate whether "just write the whole thing" is a one month, six month, or longer project? There are going to be nasty edge cases and forgotten requirements buried in that old code, and if you discover them for the first time halfway into your big rewrite... you might suddenly find you're only 10% into your big rewrite.

(Especially if you're a "10x engineer" you should understand what makes big rewrites hard and often fail or go way over schedule/budget. You should've seen it all before.)


I've dealt with both: 1. maintenance coding 2. re-write coding

Re-writes take forever, because a lot of the edge cases and bug fixes are lost [1]. You might think they go away, and some do, but new ones are introduced. QA process is critical. Management becomes critical of excuses, and the longer the project is drawn out, the more they get involved. The final shift to a new system is never one-and-done. Management is paying for two systems, canary deploy.

Smaller re-writes are the ideal practice, and your code base is set up this way already, right?

Maintenance code is cheapest.

[1] https://www.joelonsoftware.com/2000/04/06/things-you-should-...


My experience tells me that it's both faster and higher quality to do things in small steps than leave it with your "10x engineers" (everybody thinks they are the one, right?) to "just" rewrite from scratch — and I've got plenty of proof in my close-to-20-years of career (I've never seen that go smooth; I've been a part of dozens of iterative "replace large systems" that were pretty uneventful).

As for the "weakest" engineers, even the "strongest" engineers are weak sometimes (bad day, something personal, health issues, sleep deprivation...).


I don't think it's a matter of making process for the weakest engineers. It's more likely that we're trying to apply one monolithic process to highly variable work.

You hit on something super important that I don't see discussed often enough: Different phases in the software lifecycle require different approaches. Trying to apply "maintenance mode" to a greenfield project (or vice-versa) can be a disaster for the reason you mentioned - sometimes you just can't break the job into small changes until you have something concrete to change! There is time for principled slow change, and there is a time for rapid prototyping. But most teams use a single process for both.


I think it's misleading to say iteration or full rewrites are the only 2 options. The most impactful, yet successful, projects I've worked on rewrite a part of a system, i.e. replace a custom search index with Solr, but leave the data itself and the UI the same, then once you're happy that went well, improve the data or the UI afterwards.


Why rewrite then? We should have only the strongest engineers, only those able to understand and thrive in any kind of spaghetti.


i do think these are good habits. my favorite is the one about type #3 of tech debt. i wish i could push a button and impart this way of thinking to many of my old coworkers.

(and, there is some room for taste/interpretation/etc. i think the thing about copy-paste and "the third time it's in the code, encapsulate it, and deal with flag params later" is maybe true and maybe not true and may be context or team dependent. i know i have done this a few times and if i am trying to cover that func with tests, the complexity of the test goes up fast with the number of flags. and then sometimes i wonder if it is even worth writing these tests when the logic is so dead simple.)


From the article:

> Copy-paste is OK once. The second time you're introducing duplication (i.e., three copies), don't. You should have enough data points to create a good enough abstraction.

There's already a principle that synthesizes this: Write Everything Twice (WET).

It's a play on words to counter the infamous Don't Repeat Yourself (DRY) principle, which clueless but opinionated developers everywhere have used time and again to justify introducing all kinds of problems involving a combination of tight-coupling unrelated code, abstraction hell, adding three classes and an interface to avoid writing two classes, etc. This nonsense is avoided by tolerating duplicate but uncoupled code until the real abstraction and coupling needs emerge.

I still cringe at a PR that a former clueless junior developer posted, who in the name of DRY added an OnFailure handler which, instead of doing any error-handling and recovery logic, simply invoked OnSuccess, because "it's mostly duplicate code and this keeps the code DRY". Utter nonsense.


Unrelated but does anyone have any recommendations for good resources on learning how to write tests/testable software?


Good software development habit: develop good software.


Good code is an asset.


No.

> Know when you're testing the framework's capability. If you are, don't do it

Except that many frameworks are full of confusing behavior that is easy to misuse. It's funny that the post mentions `useEffect()` because `useEffect()` is so easy to misuse. Writing integration tests that make sure your app does what it is supposed to is totally fine.

> If you don't know what an API should look like, write the tests first as it'll force you to think of the "customer" which in this case is you

This is pointless. It doesn't give you any information, you're just guessing at what the API should look like. You won't actually know until it's integrated into a working application. The idea that you can design in a vacuum like this is wishful thinking.

> Copy-paste is OK once. The second time you're introducing duplication (i.e., three copies), don't. You should have enough data points to create a good enough abstraction.

No you won't, and it will often be with code that is similar in some ways but differs in others. Since the kind of people who write this kind of vague bullshit advice disapprove of things like boolean function parameters and use shitty languages that don't have metaprogramming support, this leads to "abstractions" that create awkward, tight coupling where changing one little thing breaks a million stupid fucking unit tests.

> Testability is correlated with good design. Something not being easily testable hints that the design needs to be changed.

Testability is neither necessary nor sufficient for any particular quality attribute. Depending on the application being written, it can be counterproductive to write out full unit tests for everything.

As always with these stupid "software engineering" posts, there is zero data, zero evidence, zero definitions of terms up front, and zero of anything that is actually real. It's just personal preference, making it dogma.


I challenge you to write code that is "testable" (easy to cover with tests for all the important functionality), but which is generally badly designed and structured.

(FWIW, while naming is probably as important, I am not accepting bad naming as that is too easy)


I present you Uncle Bob's own pretty horrible code: https://qntm.org/clean


Thanks: yeah, that's indeed pretty bad.

FWIW, I don't see any tests for this, nor does it look simple to test, so I don't consider this "testable" code; it looks like this was made to make other code testable, yet it fails to be testable itself.

Also, naming is horrible as well (also noted in the article).


"Technical debt can be classified into three main types" ....

No. You haven't seen real tech debt until you've stared into the abyss and the abyss has stared back.


> 9. Technical debt can be classified into three main types: ...

This isn't _incorrect,_ but I'd say it's insufficient, or at least it lacks a sufficient treatment of what technical debt is and what is important about it.

Technical debt is known technical problems that are affecting or will affect your velocity or the business as a whole. When considering technical debt, you need to know:

- the estimated amount of time required to correct the problem
- the ongoing penalty you're paying by not correcting it, if any
- the hard cutoff by when the problem must be corrected, if any
- the consequences for not correcting the problem by the hard deadline

Three examples to demonstrate:

1) You have a User god-model that is thousands of lines of code long. It is incredibly hard to work with, and any change that interacts with it takes, on average, 5x as long as a change that doesn't. It would take appx. four weeks to refactor sufficient methods out of this model to make it as easy to work with as the rest of the code, but there is no hard cutoff by when this problem must be solved.

2) You're only able to clear your job queues on the weekend, and the job queue time has been growing steadily for the past few months. By mid-week, the average queue time is appx. 10 minutes and by end-of-week, it's nearly 30. If this problem is not solved in one month's time, the end-of-week queue time is likely to be over an hour, and in two months' time, the mid-week queue time is, too. We can add extra capacity to our job runner pool in an hour or so, at a cost of $x/month.

3) The new account creation script is a mess of spaghetti code, a real eyesore. Changing it requires about 10-20x as much effort as any other part of the system. It would take appx. 2 weeks to untangle. However, there is no hard cutoff by when this problem must be solved, and in fact, this code is rarely ever touched anyway, only twice in the last year and only small changes were required.

These three cases fall roughly into the three categories suggested by OP (1 -> preventing from doing stuff now, 2 -> preventing from doing stuff later, 3 -> might prevent you from doing stuff later), but you have sufficient information to make informed, better decisions that the simpler model would miss. For example, under the simple model, the job queue problem would be classified as "try to focus on", but the User god-model takes priority ("minimize" "stuff now" problems). But 2 seems much simpler to fix (provided you can afford it), and the consequences of deprioritizing it in favour of the god-model fix could be catastrophic to user confidence.

And in both systems, we're most likely going to ignore problem #3, but if we know that a larger change to new account creation is coming up, one that you would expect to take 2+ days in any other part of the system, you now can expect that it would instead take 20-40 days in the spaghetti code, but that refactoring it would be appx. 16+2 = 18 days, a net win.
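The arithmetic for problem #3 can be written out as a tiny sketch. `compare_options` is a hypothetical helper; the inputs are the figures from the comment itself: a 2-day baseline change, a 10-20x slowdown in the spaghetti code, and the 16 days of refactoring used in the "16+2" sum.

```python
def compare_options(baseline_days, slowdown_range, refactor_days):
    """Compare 'change the messy code as-is' with 'refactor first, then change'."""
    low, high = slowdown_range
    as_is = (baseline_days * low, baseline_days * high)  # best and worst case
    refactor_first = refactor_days + baseline_days
    return {
        "as_is_days": as_is,
        "refactor_first_days": refactor_first,
        "days_saved": (as_is[0] - refactor_first, as_is[1] - refactor_first),
    }


# Figures taken from the comment: 2-day change, 10-20x slowdown, 16 days to refactor.
result = compare_options(baseline_days=2, slowdown_range=(10, 20), refactor_days=16)
```

With those inputs, refactoring first costs 18 days against 20 to 40 days as-is, so it saves between 2 and 22 days. For a change that would only take half a day elsewhere, the same function gives 5 to 10 days as-is against 16.5 days refactor-first, which is why the size of the upcoming change matters.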


reads like a chatgpt answer



