Software had it way too easy for way too long. You could ship faulty code to bil...

jandrewrogers · on Dec 31, 2024

Software engineering is not like physical engineering. I've done both and they necessarily operate from different assumptions about the world.

In software, a single bit-flip or typo can lead to catastrophic failure, it is sensitive to defects in a way physical systems are not. The behavior of software is dependent on the behavior of the hardware it runs on; unlike physical engineering, the hardware for software is not engineered to the specifications of the design, it has to run on unknown hardware with unknown constraints and work correctly the first time.

Physical systems are inherently resilient in the presence of many small defects, a property that physical engineering relies on greatly. Software is much less tolerant of defects, and there is a limited ability to "over-engineer" software to add safety margin, which is done all the time in physical engineering but is qualitatively more expensive in software engineering. Being broken is often a binary state in software.

I've written software for high-assurance environments. Every detail of the implementation must be perfect to a degree that would make engineers with even the most perfectionist tendencies blush. Physical engineering requires nothing like this degree of perfectionism. In my experience, the vast majority of engineers are not cognitively equipped to engineer to that standard, and in physical engineering they don't have to.

lisper · on Dec 31, 2024

> In software, a single bit-flip or typo can lead to catastrophic failure

That can happen in physical engineering too, if it's not done right. Likewise, if a single bit flip can lead to catastrophic failure, that's an indication that the system was poorly engineered.

The problem with software is not so much that it is fundamentally different in terms of its failure response (though there certainly are differences -- the laws of physics provide important constraints on hardware that are absent in software) but rather that it is so easy to copy, which tends to entrench early design decisions and make them difficult to go back and change because the cost is too high (e.g. syntactically-significant tabs in 'make'). But even that can and does happen in physical design as well. The Millennium Tower in San Francisco is the best example I can think of. The World Trade Center is another example of a design that failed because the failure mode that destroyed it was not part of the original design requirements. When the WTC was designed, no one imagined that someone would fly a jet into them some day. Today's adversarial environment was similarly hard to imagine before the Internet.

It absolutely is possible to engineer software for robustness, reliability, and trustworthiness. The limiting factor is economics, not engineering.

UltraSane · on Dec 31, 2024

I once found a company selling a password server like Thycotic Secret Server that must have been written by a madman. It used secret sharing to split the passwords into 3 of 5 shards and stored them on 5 different servers. He wrote the server in two different languages and it was meant to run on both windows and Linux and BSD to prevent common bugs. I don't remember the name and can't find it anymore.

AWS is using a lot of Formal Verification and Automated theorem proving on their core systems like S3 and TLS to increase reliability.

hn367621 · on Dec 31, 2024

> physical engineering they don't have to

Totally agree with what you wrote but in any well run engineering organization one has to consider the cost/benefit of extra testing and engineering.

If you're making a USB powered LED desk lamp, you get into diminishing returns fairly quickly. If you're making basically anything on JWST, less so.

kevin_thibedeau · on Dec 31, 2024

There is no excuse for making defective products, no matter how low the profit margins are. The key why properly engineered anything works is that safety margins are baked into the design. Every real engineering field uses repeatable processes with known likelihood of success. Software "engineering" doesn't do that.

jandrewrogers · on Dec 31, 2024

The concept of "safety margins" in physical engineering are largely nonsensical in a software context. In physical systems, correctness is a bulk statistical property of the design, an aggregate of probability distributions, which makes safety simple. If you are uncertain, add a bit more steel just in case, it is very cheap insurance. Physical systems are defect tolerant, they aren't even close to defect-free.

In software systems, correctness is binary, so they actually have to be defect-free. Defects don't manifest gradually and gracefully like they often do in physical systems.

hn367621 · on Dec 31, 2024

Defective means a product doesn’t deliver on its specifications. For example if my LED desk lamp doesn’t promise to last any time at all, it’s not defective if it fails inside a month. If you want one that lasts longer, you pay more and can have that. Same for software. But most software basically promises nothing…

datadrivenangel · on Dec 31, 2024

Much better to think about low-quality products instead of defective. Junk can still be useful, and just good enough is definitionally good enough (most of the time). Also, most real engineering fields are full of projects that are done in non-repeatable ways and go horribly over budget. You're correct that for implementation you can get repeatable processes, but for process improvement you don't have anywhere near the level of repeatability.

bdangubic · on Dec 31, 2024

fantastic post. what I will “disagree” on

Every detail of the implementation must be perfect to a degree that would make engineers with even the most perfectionist tendencies blush

the amount of software that fits this description is likely comparable to amount of physical engineering that requires the same perfectionism…

hulitu · on Jan 2, 2025

> In software, a single bit-flip or typo can lead to catastrophic failure

That's why you have tests for this.

The main problem that i see with software, is that the testing and bug finding part are left to the user.

And, when testing is done, it is done as easy as possible. For example a function must return 5 when it has 8 as an input. They only test for 5 as an input value and 2 or -1 are left to the user to test them.

gjsman-1000 · on Dec 31, 2024

There are 2.8 trillion lines of code in this world. If I’m an engineer hired to work on a preexisting project, like 95% of jobs are, do I want to take liability for code I didn’t write? Or for if I make a mistake when interacting with hundreds of thousands of lines of code I also didn’t write?

No.

What you’re suggesting is about as plausible as the Aesop fable about mice saying they should put a bell on the cat. Sounds great, completely impossible.

So what about only new code then? In that case, does old code get grandfathered in? If so, Google gets to take tens of billions of lines with them for free, while startups face the audit burden which would be insurmountable to reach a similar scale. Heck, Google does not have enough skilled labor themselves to audit it all.

Also completely unfeasible.

And even if, even if, some country decided to audit all the code, and even if there was enough talent and labor in this world to get it done by the next decade, what does that mean?

It means all research, development, and investment just moves to China and other countries that don’t require it.

Also completely unfeasible.

> “Their livelihoods should be on the line.”

This fundamentally relies on the subject being so demonstrably knowable and predictable, that only someone guilty of negligence or malice could possibly make a mistake.

This absolutely does not apply to software development, and for the reasons above, probably never will. The moment such a requirement comes into existence, any software developer who isn’t suicidal abandons the field.

whatever1 · on Dec 31, 2024

Let’s say you are a civil engineer and your calculator had a problem spitting out wrong results the day you were calculating the amount of reinforcement for the school you were designing. If the school collapses on the kids, you are going to jail in most countries . It does not matter the calculator had an issue, you chose to use it and not verify the results.

jandrewrogers · on Dec 31, 2024

That is because this is trivial to check and the systems are simple compared to software, so the cost imposition of the requirement to do so is minor. The software engineering equivalent would be a requirement to always check return codes. I don't think anyone believes that would move the needle in the case of software.

There is a lot of literature on civil engineering failures. In fact, my civil engineering education was largely structured as a study of engineering failures. One of the most striking things about forensic analysis of civil engineering failures, a lesson the professors hammered on incessantly, is that they are almost always the result of really basic design flaws that every first-year civil engineering student can immediately recognize. There isn't some elaborate engineering discipline preventing failures in civil engineering. Exotic failures are exceedingly rare.

thfuran · on Dec 31, 2024

So the defense for software is "I helped build a system so complex that even attempting to determine how it might fail was too hard, so it can't be my fault that it failed"?

gjsman-1000 · on Dec 31, 2024

Unless you want to go back to the steam age, it’s not a defense, but all we are humanly capable of.

Never forget as well that it only takes a single cosmic ray to flip a bit. Even if you code perfectly, it can still fail, whether in this way or countless other black swans.

2.8 trillion lines of code aren’t going to rewrite themselves overnight. And as any software developer can tell you, a rewrite would almost certainly just make things worse.

thfuran · on Dec 31, 2024

It would cost an untenable amount of money to rebuild all buildings, but that hasn't stopped us from creating and updating building codes.

theamk · on Jan 1, 2025

The cost plays a very important role in building codes, a lot of changes are either not made at all (because they will be prohibitely expensive), or spread out over many years.

Plus, the bulding codes are safety-focused and often don't cover things that most people would consider defects: for example a huge hole in the interior wall is OK (unless it breaks fire or energy efficiency codes)

jandrewrogers · on Dec 31, 2024

Not at all, you can have that today if you are willing to pay the costs of providing these guarantees. We know how and some organizations do pay that cost.

Outside of those rare cases, everyone is demonstrably unwilling to pay the unavoidable costs of providing these guarantees. The idea that software can be built to a high-assurance standard by regulatory fiat and everyone just gets to freeload on this investment is delusional but that is what is often suggested. Also, open source software could not meet that standard in most cases.

Furthermore, those guarantees can only exist for the narrow set of hardware targets and environments that can actually be validated and verified. No mixing and matching random hardware, firmware, and OS versions. You'll essentially end up with the Apple ecosystem, but for everything.

The vast majority of people who insist they want highly robust software neither write software to these standards nor are willing to pay for software written to these standards. It is a combination of revealed preferences and hypocrisy.

yuvalr1 · on Dec 31, 2024

When handling critical software and hardware, for example automated cars, it should be the case that it's never ever the sole responsibility of a single individual. People make mistakes, and will always do. There should be a lot of safety mechanisms existing to ensure nothing critically bad ever happens due to a bug. If this is not the case, then management is to blame, and even the state, for not ensuring high quality of critical equipment.

When something like that does happen, it is very hard to know the measure of responsibility every entity holds. This will most certainly be decided in court.

gjsman-1000 · on Dec 31, 2024

How many structural parts does a school have that need to be considered? How many iron beams? How many floors? Several thousand at most? Everything else on the BOM doesn’t matter - wallpaper isn’t a structural priority.

In computer code, every last line is possibly structural. It also only takes a single missing = in the 1.2 million line codebase to kill.

Comparing it to school engineering is an oversimplification. You should be comparing it to verifying the structural integrity of every skyscraper ever built; for each project.

aeonik · on Dec 31, 2024

Chemical and elemental properties of the walls and wall paper can matter though.

Leaded paint, Aresenic, flammability, weight (steel walls vs sheet rock).

The complexity is still less than software though, and there are much better established standards of things that work together.

Even if a screw is slightly wrong, it can still work.

In software it's more like, every screw must have monocrystalline design, any grain boundary must be properly accounted for, and if not, that screw can take out an entire section of the building, possibly the whole thing, possibly the entire city.

whatever1 · on Dec 31, 2024

The claim was not that it will be easy, but since the stakes are high the buck has to stop somewhere. You cannot have Devin AI shipping crap and nobody picking up the phone when it hits the fan.

hn367621 · on Dec 31, 2024

More like your engineering organization gets sued into oblivion because it created design processes with a single point of failure. Things happen all the time. That’s why well run organizations have processes in place to catch and deal with them.

In software, when people think it counts, they do too. The problem is not all people agree on “when it counts”

hulitu · on Jan 2, 2025

> do I want to take liability for code I didn’t write?

That's the problem, isn't it ? Nobody wants to assume liability. They all want just the profits.

Think about, if your car will come with a disclaimer like Microsoft's EULA.

thfuran · on Dec 31, 2024

>do I want to take liability

I don't suppose most doctors or civil engineers want to take liability either.

BirAdam · on Dec 31, 2024

I actually think code quality has decreased over time. With the rise of high bandwidth internet, shipping fixes for faulty garbage became trivial and everyone now does so. Heck, some software is shipped before it’s even finished. Just to use Microsoft as an example, I can still install Windows 2000 and it will be rock solid running Office 2K3, and it won’t “need” much of anything. Windows 10 updates made some machines I used fail to boot.

hn367621 · on Dec 31, 2024

> You could ship faulty code to billions without anyone blinking an eye.

Not all software is equivalent and there is plenty of code that gets treated with the precision you're asking for.

But the ability to ship out code to billions in the blink of an eye is both the strength and the weakness of modern software.

It allows few engineers to tackle very complex problems without the huge investment in man hours that more rigor would require.

This keeps costs down enabling many projects that wouldn't see the light of day if the NRE was equivalent to HW.

On the other hand, you get a lot of "preliminary" code in the world, for lack of a better word.

At the end of the day all engineering is about balancing trade-offs and there are no right answers, only wrong ones. :)

hulitu · on Jan 2, 2025

> Software had it way too easy for way too long. You could ship faulty code to billions without anyone blinking an eye. It was just harmless ideas after all.

Like Crowdstrike ? Or Microsoft ?

> The stakes are now higher with data being so important and the advent of algorithms that affect people directly. From health insurance claims, to automated trading, social media drugs and ai companions, bad code today can and does ruin lives.

See above and the UK Post scandal. Nobody gives a shit. There is no acountability for bad software. Software is not a product from a liability point of view.

> Software engineers, like every other engineer have to be held accountable for code they sign off and ship.

They managers more than them. Software engineers are not responsible for the software lifecycle. Somebody must test the code. Somebody must review the code.