
Reliability is not based on a system that cannot fail. It is based on a system that can survive failure.



The canonical paper on handling software failures: https://erlang.org/download/armstrong_thesis_2003.pdf


There's much more to it than the programming language.

Algorithms can be faulty as well.


> There's much more to it than the programming language.

Which was never claimed.

That paper is a little bit about Erlang and a whole lot about OTP and other methodology and design technique.

It is still very much "the paper" for distributed systems, though its applicability to this particular problem is limited.


A system (whole) that can survive the failure of (some of the) individual parts. Up to a limit.
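To make "up to a limit" concrete, here is a minimal sketch in Python of a supervisor that restarts a failing part a bounded number of times and then gives up. It is only loosely inspired by the restart-limit idea in OTP supervisors and is not OTP's API; `supervise`, `flaky_worker`, `TooManyRestarts`, and `max_restarts` are all hypothetical names made up for illustration.

```python
from typing import Callable


class TooManyRestarts(Exception):
    """Raised when the part fails more often than the supervisor tolerates."""


def supervise(worker: Callable[[int], str], max_restarts: int) -> str:
    """Run worker, restarting it after a failure, at most max_restarts times."""
    attempt = 0
    while True:
        try:
            return worker(attempt)
        except Exception as exc:
            # The whole survives a failing part only up to the limit.
            if attempt >= max_restarts:
                raise TooManyRestarts(f"gave up after {attempt} restarts") from exc
            attempt += 1


def flaky_worker(attempt: int) -> str:
    # Hypothetical part that crashes on its first two runs, then succeeds.
    if attempt < 2:
        raise RuntimeError("simulated crash")
    return "ok"
```

With `max_restarts=3` the call `supervise(flaky_worker, 3)` rides out both crashes and returns the worker's result; with `max_restarts=1` the limit is exceeded and the failure propagates upward as `TooManyRestarts`.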


… or cannot fail



