It seems people have a blind spot for unwrap, perhaps because it's so often used in example code. In production code an unwrap or expect should be reviewed exactly like a panic.
It's not necessarily invalid to use unwrap in production code if you would just call panic anyway. But just like every unsafe block needs a SAFETY comment, every unwrap in production code needs an INFALLIBILITY comment. clippy::unwrap_used can enforce this.
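A minimal sketch of that convention (clippy::unwrap_used is the real lint name; the function is illustrative):

    #![deny(clippy::unwrap_used)] // crate-wide, e.g. at the top of lib.rs

    #[allow(clippy::unwrap_used)]
    fn first_word(s: &str) -> &str {
        // INFALLIBILITY: `split` always yields at least one item,
        // even for the empty string, so `next()` is never `None`.
        s.split(' ').next().unwrap()
    }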
Yes? Funnily enough, I don't often use indexed access in Rust. Either I'm looping over elements of a data structure (in which case I use iterators), or I'm using an untrusted index value (in which case I explicitly handle the error case). In the rare case where I'm using an index value that I can guarantee is never invalid (e.g. graph traversal where the indices are never exposed outside the scope of the traversal), then I create a safe wrapper around the unsafe access and document the invariant.
If that's the case then hats off. What you're describing is definitely not what I've seen in practice. In fact, I don't think I've ever seen a crate or production codebase that documents infallibility of every single slice access. Even security-critical cryptography crates that passed audits don't do that. Personally, I found it quite hard to avoid indexing for graph-heavy code, so I'm always on the lookout for interesting ways to enforce access safety. If you have some code to share that would be very interesting.
My rule of thumb is that unchecked access is okay in scenarios where both the array/map and the indices/keys are private implementation details of a function or struct, since an invariant is easy to manually verify when it is tightly scoped as such. I've seen it used in:
* Graph/tree traversal functions that take a visitor function as a parameter
> I don't think I've ever seen a crate or production codebase that documents infallibility of every single slice access.
The smoltcp crate typically uses runtime checks to ensure slice accesses made by the library do not cause a panic. It's not exactly equivalent to GP's assertion, since it doesn't cover "every single slice access", but it at least covers slice accesses triggered by the library's public API. (i.e. none of the public API functions should cause a panic, assuming that the runtime validation after the most recent mutation succeeds).
I think this goes against Rust's goals in terms of performance. Good for safe code, of course, but Rust users usually prefer compile-time safety that makes runtime safety checks unnecessary.
Sure, these days I'm mostly working on a few compilers. Let's say I want to make a fixed-size SSA IR. Each instruction has an opcode and two operands (which are essentially pointers to other instructions). The IR is populated in one phase, and then lowered in the next. During lowering I run a few peephole and code motion optimizations on the IR, and then do regalloc + asm codegen. During that pass the IR is mutated and indices are invalidated/updated. The important thing is that this phase is extremely performance-critical.
One normal "trick" is phantom typing. You create a type representing indices and have a small, well-audited portion of unsafe code handling creation/unpacking, where the rest of the code is completely safe.
The details depend a lot on what you're doing and how you're doing it. Does the graph grow? Shrink? Do you have more than one? Do you care about programmer error types other than panic/UB?
Suppose, e.g., that your graph doesn't change sizes, you only have one, and you only care about panics/UB. Then you can get away with:
1. A dedicated index type, unique to that graph (shadow / strong-typedef / wrap / whatever), corresponding to whichever index type you're natively using to index nodes.
2. Some mechanism for generating such indices. E.g., during graph population phase you have a method which returns the next custom index or None if none exist. You generated the IR with those custom indexes, so you know (assuming that one critical function is correct) that they're able to appropriately index anywhere in your graph.
3. You have some unsafe code somewhere which blindly trusts those indices when you start actually indexing into your array(s) of node information. However, since the very existence of such an index is proof that you're allowed to access the data, that access is safe.
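Here's a minimal Rust sketch of steps 1-3, assuming a single graph that never shrinks (all names are illustrative):

    /// An index proven valid for the one `Graph` that minted it.
    /// The private field means only `Graph` can construct one.
    #[derive(Clone, Copy)]
    pub struct NodeId(usize);

    pub struct Node { /* opcode, operands, ... */ }

    pub struct Graph {
        nodes: Vec<Node>, // never shrinks after the population phase
    }

    impl Graph {
        /// The only place indices are minted, so every live
        /// `NodeId` is in bounds by construction.
        pub fn push(&mut self, node: Node) -> NodeId {
            self.nodes.push(node);
            NodeId(self.nodes.len() - 1)
        }

        pub fn get(&self, id: NodeId) -> &Node {
            // SAFETY: `NodeId`s only come from `push`, and `nodes`
            // never shrinks, so `id.0 < self.nodes.len()` always holds.
            unsafe { self.nodes.get_unchecked(id.0) }
        }
    }

The only code anyone ever has to audit for this invariant is `push` and `get`; everything else stays ordinary safe code.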
Techniques vary from language to language and depending on your exact goals. GhostCell [0] in Rust is one way of relegating literally all of the unsafe code to a well-vetted library, and it uses tagged types (via lifetimes), so you can also do away with the "only one graph" limitation. It's been a while since I've looked at it, but resizes might also be safe pretty trivially (or might not be).
The general principle though is to structure your problem in such a way that a very small amount of code (so that you can more easily prove it correct) can provide promises that are enforceable purely via the type system (so that if the critical code is correct then so is everything else).
That's trivial by itself (e.g., just rely on Option-returning `.get` operators), so the rest of the trick is to find a cheap place in your code which can provide stronger guarantees. For many problems, initialization is the perfect place: you can bounds-check on init and then not worry about it again. If even bounds-checking on initialization is too slow, you can still use the opportunity at initialization to write out a proof of why some invariant holds and then blindly/unsafely assert it to be true, but you then immediately pack that hard-won information into a dedicated type so that the only place you ever have to think about it is initialization.
I do use a combination of newtyped indices + singleton arenas for data structures that only grow (like the AST). But for the IR, being able to remove nodes from the graph is very important. So phantom typing wouldn't work in that case.
Usually you'd want to write almost all your slice or other container iterations with iterators, in a functional style.
For the 5% of cases that are too complex for standard iterators? I never bother justifying why my indexes are correct, but I don't see why not.
You very rarely need SAFETY comments in Rust because almost all the code you write is safe in the first place. The language also gives you the tool to avoid manual iteration (not just for safety, but because it lets the compiler eliminate bounds checks), so it would actually be quite viable to write these comments, since you only need them when you're doing something unusual.
I didn't restate the context from the code we're discussing: it must not panic. If you don't care if the code panics, then go ahead and unwrap/expect/index, because that conforms to your chosen error handling scheme. This is fine for lots of things like CLI tools or isolated subprocesses, and makes review a lot easier.
So: first, identify code that cannot be allowed to panic. Within that code, yes, in the rare case that you use [i], you need to at least try to justify why you think it'll be in bounds. But it would be better not to.
There are a couple of attempts at getting the compiler to prove that code can't panic (e.g., the no-panic crate).
What about memory allocation - how will you stop that from panicking? `Vec::resize` can always panic in Rust (on allocation failure), and that's just one example out of thousands in the Rust stdlib.
Unless the language addresses no-panic in its core design or allows try-catch, I'm not sure how you go about this.
That is slowly being addressed, but meanwhile it’s likely you have a reliable upper bound on how much heap your service needs, so it’s a much smaller worry. There are also techniques like up-front or static allocation if you want to make more certain.
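E.g., `Vec`'s fallible-allocation methods are already stable; a minimal sketch:

    use std::collections::TryReserveError;

    fn make_buffer(n: usize) -> Result<Vec<u8>, TryReserveError> {
        let mut buf = Vec::new();
        buf.try_reserve_exact(n)?; // OOM becomes an Err instead of a process abort
        buf.resize(n, 0);          // capacity is already reserved, so no allocation here
        Ok(buf)
    }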
This is ridiculous. We're probably going to start seeing more of these. This was just the first big, highly visible instance.
We should have a name for this similar to "my code just NPE'd". I suggest "unwrapped", as in, "My Rust app just unwrapped a present."
I think we should start advocating for the deprecation and eventual removal of the unwrap/expect family of methods. There's no reason engineers shouldn't be handling Options and Results gracefully, either passing the state to the caller or turning to a success or fail path. Not doing this is just laziness.
Indexing is comparatively rare given the existence of iterators, IMO. If your goal is to avoid any potential for panicking, I think you'd have a harder time with arithmetic overflow.
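E.g., panic-free code has to spell out its overflow semantics, since the default panics in debug builds and wraps in release. A minimal sketch:

    fn add_quota(used: u64, extra: u64) -> Option<u64> {
        used.checked_add(extra) // None on overflow, never a panic
    }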
Your pair of posts is very interesting to me. Can you share with me: What is your programming environment such that you are "fine with allocation failures"? I'm not doubting you, but for me, if I am doing systems programming with C or C++, my program is doomed if a malloc fails! When I saw your post, I immediately thought: Am I doing it wrong? If I get a NULL back from malloc(), I just terminate with an error message.
I mean, yeah, if I am using a library, then as a user of this library I would like to be able to handle the error myself. Having the library decide to panic, for example, is the opposite of that.
If I can't allocate memory, I'm typically okay with the program terminating.
I don't want dependencies deciding to unwrap() or expect() some bullshit and that causing my entire program to crash because I didn't anticipate or handle the panic.
Code should be written, to the largest extent possible, to mitigate errors using Result<>. This is just laziness.
I want checks in the language to safeguard against lazy Rust developers. I don't want their code in my dependency tree, and I want static guarantees against this.
edit: I just searched unwrap() usage on Github, and I'm now kind of worried/angry:
Something that allows me to annotate a function (or my whole crate) as "no panic", and get a compile error if the function or anything it calls has a reachable panic.
This would allow it to work with many unmodified crates, as long as constant propagation can prove that any panics are unreachable. This approach would also allow crates to provide panicking and non-panicking versions of their API (which many already do).
Yes, I want that. I also want to be able to (1) statically apply a badge on every crate that makes and meets these guarantees (including transitively with that crate's own dependencies) so I can search crates.io for stronger guarantees and (2) annotate my Cargo.toml to not import crates that violate this, so time isn't wasted compiling - we know it'll fail in advance.
On the subject of this, I want more ability to filter out crates in our Cargo.toml. Such as a max dependency depth. Or a frozen set of dependencies that is guaranteed not to change so audits are easier. (Obviously we could vendor the code in and be in charge of our own destiny, but this feels like something we can let crate authors police.)
I think the most common solution at the moment is dtolnay's no_panic [0]. That has a bunch of caveats, though, and the ergonomics leave something to be desired, so a first-party solution would probably be preferable.
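Usage looks roughly like this; the attribute is real, and the function is an illustrative one whose panic-freedom the optimizer can prove (otherwise the crate fails at link time):

    use no_panic::no_panic;

    #[no_panic]
    fn first_byte(s: &str) -> Option<u8> {
        s.as_bytes().first().copied() // no panicking branch survives optimization
    }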
I would be fine just getting rid of unwrap(), expect(), etc. That's still a net win.
Look at how many lazy cases of this there are in Rust code [1].
Some of these are no doubt tested (albeit impossible to statically guarantee), but a lot of it looks like sloppiness or not leaning on the language's strong error handling features.
It's disappointing to see. We've had so much of this creep into the language that eventually it caused a major stop-the-world outage. This is unlikely to be the last time we see it.
I don't write Rust so I don't really know, but from someone else's description here it sounds similar to `fromJust` in Haskell which is a common newbie footgun. I think you're right that this is a case of not using the language properly, though I know I was seduced into the idea that Haskell is safe by default when I was first learning, which isn't quite true — the safety features are opt-in.
A language DX feature I quite like is when dangerous things are labelled as such. IIRC, some examples of this are `accursedUnutterablePerformIO` in Haskell, and `DO_NOT_USE_OR_YOU_WILL_BE_FIRED_EXPERIMENTAL_CREATE_ROOT_CONTAINERS` in React.js.
I would be in favor of renaming unwrap() and its family to `unwrap_do_not_use_or_you_will_break_the_internet()`
I still think we should remove them outright or make production code fail to compile without a flag allowing them. And we also need tools to start cleaning up our dependency tree of this mess.
For iteration, yes. But there's other cases, like any time you have to deal with lots of linked data structures. If you need high performance, chances are that you'll have to use an index+arena strategy. They're also common in mathematical codebases.
Yes, I always thought it was wrong to use unwrap in examples. I know, people want to keep examples simple, but it trains developers to use unwrap() as they see that everywhere.
Yes, there are places where it's ok as that blog post explains so well: https://burntsushi.net/unwrap/
But most devs IMHO don't have the time to make the call correctly most of the time... so it's just better to do something better, like handle the error and try to recover, or if impossible, at least do `expect("damn it, how did this happen")`.
There is a prevailing mentality that LLMs make it easy to become productive in new languages, if you are already proficient in one. That's perhaps true until you suddenly bump up against the need to go beyond your superficial understanding of the new language and its idiosyncrasies. These little collisions with reality occur until one of them sparks an issue of this magnitude.
In theory, experienced human code reviewers can course-correct newer LLM-guided devs' work before it blows up. In practice, reviewers are already stretched thin, and submitters' ability to now rapidly generate more and more code to review makes that exhaustion effect way worse. It becomes less likely they spot something small but obvious amongst the haystack of LLM-generated code barreling their way.
> There is a prevailing mentality that LLMs make it easy to become productive in new languages, if you are already proficient in one.
Yes, and: I've found this to be mostly true, if you make sure you take the time to deeply understand what the code is doing. I asked an LLM to do something for me in Javascript, then said, "What if X happens, wouldn't that cause Y? Would it be better to restructure it like so and so to make it more robust?" The LLM immediately improved it.
Any experienced programmer who was taking the time to review this code, on learning that unwrap() has a "panic" inside, would certainly change it. But as you say, reviewers are already stretched thin.
Dunno, I think the alternatives have their own pretty significant downsides. All would require front loading more in-depth understanding of error handling and some would just be quite a bit more verbose.
IMO making unwrap a clippy lint (or perhaps a warning) would be a decent start. Or maybe renaming unwrap.
This strikes me as a culture issue more than one of language.
A tenet of systems code is that every possible error must be handled explicitly and exhaustively close to the point of occurrence. It doesn’t matter if it is Rust, C, etc. Knowing how to write systems code is unrelated to knowing a systems language. Rust is a systems language but most people coming into Rust have no systems code experience and are “holding it wrong”. It has been a recurring theme I’ve seen with Rust development in a systems context.
C is pretty broken as a language but one of the things going for it is that it has a strong systems code culture surrounding it that remembers e.g. why we do all of this extra error handling work. Rust really needs systems code practice to be more strongly visible in the culture around the language.
Unwrap _is_ explicitly handling an error at the point of occurrence. You have explicitly decided to panic, which is sometimes a valid choice. I use it (on startup only) when server configs are missing or invalid or in CLI tools when the options aren't valid. Crashing a pod on startup before it goes Ready is a valid pattern in k8s and generally won't cause an outage because the previous pod will continue working.
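E.g., a sketch of that startup-only pattern (the config path here is hypothetical):

    fn main() {
        // Panicking here is deliberate: the pod fails before going Ready,
        // and the previous deployment keeps serving traffic.
        let raw = std::fs::read_to_string("/etc/app/config.toml")
            .expect("config file must exist and be readable at startup");
        // parse, validate, then start serving...
        let _config = raw;
    }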
> at least do `expect("damn it, how did this happen")`
That gives you the same behavior as unwrap with a less useful error message, though. In theory you can write useful messages, but in practice (and in your example) expect is rarely better than unwrap in modern Rust.
I disagree with that characterization. Using unwrap() like you suggest in your blog post is an intentional, well-thought-out choice. Using unwrap() the way Cloudflare did it is, with hindsight, a bad choice, that doesn't utilize the language's design features.
Note that they're not criticizing the language. I read "Rust developers" in this context as developers using Rust, not those who develop the language and ecosystem. (In particular they were not criticizing you.)
I think it's reasonable to question the use of unwrap() in this context. Taking a cue from your blog post^ under runtime invariant violations, I don't think this use matches any of your cases. They assumed the size of a config file is small, it wasn't, so the internet crashed.
Echelon's comment was "We shouldn't be using unwrap() or expect() at all. [...] unwrap(), expect(), bad math, etc. - this is all caused by lazy Rust developers". Even in my most generous interpretation I can't see how that is anything except a rejection of all unwraps (and equivalent constructs like expect()).
I fully agree with burntsushi that echelon is taking an extreme and arguably wrong stance. His sentiment becomes more and more correct as Rust continues to evolve ways to avoid unwrap as an ergonomic shortcut, but I don't think we are quite there yet for general use. There absolutely is code that should never panic, but that involves tradeoffs and design choices that aren't true for every project (or even the majority of them)
> We shouldn't be using unwrap() or expect() at all.
So the context of their comment is not some specific nuanced example. They made a blanket statement.
> Note that they're not criticizing the language. I read "Rust developers" in this context as developers using Rust, not those who develop the language and ecosystem.
I have the same interpretation.
> I think it's reasonable to question the use of unwrap() in this context. Taking a cue from your blog post^ under runtime invariant violations, I don't think this use matches any of your cases. They assumed the size of a config file is small, it wasn't, so the internet crashed.
Yes? I didn't say it wasn't reasonable to question the use of unwrap() here. I don't think we really have enough information to know whether it was inappropriate or not.
unwrap() is all about nuance. I hope my blog post conveyed that. Because unwrap() is a manifestation of an assertion on a runtime invariant. A runtime invariant can be arbitrarily complicated. So saying things like, "we shouldn't be using unwrap() or expect() at all" is an extreme position to carve out that is also way too generalized.
I stand by what I said. They are factually mistaken in their characterization of the use of unwrap()/expect() in general.
> So the context of their comment is not some specific nuanced example. They made a blanket statement.
That is their opinion, I disagree with it, but I don't think it's an insulting or invalid opinion to have. There are codebases that ban nulls in other languages too.
> They are factually mistaken in their characterization of the use of unwrap()/expect() in general.
It's an opinion about a stylistic choice. I don't see what fact there is here that could be mistaken.
I'm finding this exchange frustrating, and now we're going in circles. I'll say this one last time in as clear language as I can. They said this:
> unwrap(), expect(), bad math, etc. - this is all caused by lazy Rust developers or Rust developers not utilizing the language's design features.
The factually incorrect part of this is the statement that use of `unwrap()`, `expect()` and so on is caused by X or Y, where X is "lazy Rust developers" and Y is "Rust developers not utilizing the language's design features." But there are, factually, other causes than X or Y for use of `unwrap()`, `expect()` and so on. So stating that it is all caused by X or Y is factually incorrect. Moreover, X is 100% insulting when applied to any one specific individual. Y can be insulting when applied to any one specific individual.
Now this:
> We shouldn't be using unwrap() or expect() at all.
That's an opinion. It isn't factually incorrect. And it isn't insulting.
I'm sorry I'm frustrating you. It was not my intention. For what it's worth, I use ripgrep every day, and it's made my life appreciably better. (Same goes for Astral products.) Thank you for that, and I hope your day improves.
> unwrap(), expect(), bad math, etc. - this is all caused by lazy Rust developers or Rust developers not utilizing the language's design features
I just read that line as shorthand for large outages caused by misuse of unwrap(), expect(), bad math etc. - all caused by...
That's also an opinion, by my reading.
I assumed we were talking specifically about misuses, not all uses of unwrap(), or all bad bugs. Anyway, I think we're ultimately saying the same thing. It's ironic in its own way.
I have to disagree that unwrap is ever OK. If you have to use unwrap, your types do not match your problem. Fix them. You have encoded invariants in your types that do not match reality.
Change your API boundary, surface the discrepancy between your requirements and the potential failing case at the edges where it can be handled.
If you need the value, you need to handle the case that it’s not available explicitly. You need to define your error path(s)
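A minimal sketch of the difference (`Config` and `ConfigError` are illustrative):

    struct Config { port: Option<u16> }

    #[derive(Debug)]
    enum ConfigError { MissingPort }

    // Asserting the invariant at the call site hides the failure path:
    fn port_or_panic(cfg: &Config) -> u16 {
        cfg.port.unwrap() // panics if the assumption is ever wrong
    }

    // Surfacing it in the signature lets the caller define the error path:
    fn port(cfg: &Config) -> Result<u16, ConfigError> {
        cfg.port.ok_or(ConfigError::MissingPort)
    }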
This is a failure caused by lazy Rust programming and not relying on the language's design features.
It's a shame this code can even be written. It is surprising and escapes the expected safety of the language.
I'm terrified of some dependency using unwrap() or expect() and crashing for something entirely outside of my control.
We should have an opt-in strict Cargo.toml declaration that forbids compilation of any crate that uses entirely preventable panics. The only panics I'll accept are those relating to memory allocation.
This is one of the sharpest edges in the language, and it needs to be smoothed away.
`slice[i]` is also a hole in the type system, but at least it’s generally relying on a local invariant, immediate to the surrounding context, that does not require lying about invariants across your API surface.
The blog post doesn’t address the issue, it simply pretends it’s not a real problem.
Also from the post: “If we were to steelman advocates in favor of this style of coding, then I think the argument is probably best limited to certain high reliability domains. I personally don’t have a ton of experience in said domains …”
`slice[i]` is just sugar for `slice.get(i).unwrap()`. And whether it's a "local" invariant or not is orthogonal. And `unwrap()` does not "require lying about invariants across your API surface."
> The blog post doesn’t address the issue, it simply pretends it’s not a real problem.
It very explicitly addresses it! It even gives real examples.
> Also from the post: “If we were to steelman advocates in favor of this style of coding, then I think the argument is probably best limited to certain high reliability domains. I personally don’t have a ton of experience in said domains …”
>
> Enough said.
Ad hominem... I don't have experience working on, e.g., medical devices upon which someone's life depends. So the point of that sentence is to say, "yes, I acknowledge this advice may not apply there." You also cherry picked that quote and left off the context, which is relevant here.
And note that you said:
> I have to disagree that unwrap is ever OK.
That's an extreme position. It isn't caveated to only apply to certain contexts.
> `slice[i]` is just sugar for `slice.get(i).unwrap()`. And whether it's a "local" invariant or not is orthogonal. And `unwrap()` does not "require lying about invariants across your API surface."
It's not orthogonal. `Result` isn't a local invariant, and yes, `.unwrap()` does require lying. If your code depends on an API that can fail, and you cannot handle that failure locally (`.unwrap()` is not handling it), then your type signature needs to express that you can fail -- and you need to raise an error on that failure.
> That's an extreme position. It isn't caveated to only apply to certain contexts.
No, it's a principled position. Correct code doesn't `.unwrap()`, but code that hides failure cases -- or foists invariant enforcement onto programmers remembering not to screw up -- does.
I've built and worked on ridiculously complex code bases without a single instance of `.unwrap()` or the local language equivalent; it's just not necessary. This is just like the unchecked exception debate in Java -- complex explanations in service of avoiding the thought, time, and effort to accurately model a system's invariants.
> No, it's a principled position. Correct code doesn't `.unwrap()`, but code that hides failure cases -- or foists invariant enforcement onto programmers remembering not to screw up -- does.
I don't think you understand what an internal runtime invariant is. Either way, I don't know of any widespread libraries (in any language) that follow this "principled" position. That makes it de facto extreme.
> I've built and worked on ridiculously complex code bases without a single instance of `.unwrap()` or the local language equivalent; it's just not necessary.
Show me. If you're using `slice[i]`, then you're using `unwrap()`. It introduces a panicking branch.
> If your code depends on an API that can fail, and you cannot handle that failure locally (`.unwrap()` is not handling it), then your type signature needs to express that you can fail -- and you need to raise an error on that failure.
You use `unwrap()` when you know the failure cannot happen.
I note you haven't engaged with any of the examples I provided in the blog.
> You use `unwrap()` when you know the failure cannot happen.
That’s an invariant meant to be expressed by your type system — and it is.
You’ve failed to model your invariants in your API — and thus the type system — if you ever reach a point where an engineer has to manually assess and assert whether “cannot” applies.
> If you have to use unwrap, your types do not match your problem
The problem starts with Rust stdlib. It panics on allocation failure. You expect Rust programmers to look at stdlib and not imitate it?
Sure, you can try to taboo unwrap(), but 1) it won't work, and 2) it'll contort program design in places where failure really is a logic bug, not a runtime failure, and for which unwrap() is actually appropriate.
The real solution is to go back in time, bonk the Rust designers over the head with a cluebat, and have them ship a language that makes error propagation the default and syntactically marks infallible cleanup paths --- like C++ with noexcept.
Of course it will. I've built enormous systems, including an entire compiler, without once relying on the local language equivalent of `.unwrap()`.
> 2) it'll contort program design in places where failure really is a logic bug, not a runtime failure, and for which unwrap() is actually appropriate.
That's a failure to model invariants in your API correctly.
> ... have them ship a language that makes error propagation the default and syntactically marks infallible cleanup paths --- like C++ with noexcept.
Unchecked exceptions aren't a solution. They're a way to avoid taking the thought, time, and effort to model failure paths, and instead leave that inherent unaddressed complexity until a runtime failure surprises users. Like just happened to Cloudflare.
It's the same blind spot people have with Java's checked exceptions. People commonly resort to Pokemon exception handling and either blindly ignore them or rethrow them as runtime exceptions. When Rust got popular, I was a bit confused by people talking about how great Result is: it's essentially a checked exception without a stack trace.
"Checked Exceptions Are Actually Good" gang, rise up! :p
I think adoption would have played out very differently if there had only been some more syntactic sugar. For example, an easy syntax for saying: "In this method, any (checked) DeepException e that bubbles up should immediately be replaced by a new (checked) MylayerException(e) that contains the original one as a cause."
We might still get lazy programmers making systems where every damn thing goes into a generic MylayerException, but that mess would still be way easier to fix later than a hundred scattered RuntimeExceptions.
Exception handling would be better than what we're seeing here.
The problem is that any non-trivial software is composition, and encapsulation means most errors aren't recoverable.
We just need easy ways to propagate exceptions out to the appropriate reliability boundary, ie. the transaction/ request/ config loading, and fail it sensibly, with an easily diagnosable message and without crashing the whole process.
C# or unchecked Java exceptions are actually fairly close to ideal for this.
The correct paradigm is "prefer throw to catch" -- requiring devs to check every ret-val just created thousands of opportunities for mistakes to be made.
By contrast, a reliable C# or Java version might have just 3 catch clauses and handle errors arising below sensibly without any developer effort.
I'm with you! Checked exceptions are actually good and the hate for them is super short sighted. The exact same criticisms levied at checked exceptions apply to static typing in general, but people acknowledge the great value static types have for preventing errors at compile time. Checked exceptions have that same value, but are dunked on for some reason.
1. In most cases developers don't want to handle `InterruptedException` or `IOException`, yet need to bubble them up. In that case the code is very verbose.
2. It makes lambdas and functions incompatible. So e.g. if you're passing a function to forEach, you're forced to wrap it in a runtime exception.
3. Due to (1) and (2), most people become lazy and write `throws Exception`, which negates most advantages of having checked exceptions in the first place.
In line-of-business apps (where Java is used the most), an uncaught exception is not a big deal. It will bubble up and gets handled somewhere far up the stack (eg: the server logger) without disrupting other parts of the application. This reduces the utility of having every function throw InterruptedException / IOException when those hardly ever happen.
Java checked exceptions suffer from a lack of generic exception types ("throws T", where T can be e.g. "Exception", "Exception1|Exception2", or "never"). This would also require union types and a bottom type.
Without generics, higher order functions are very hard to use.
In my experience, it actually is a big deal, leaving a wake of indeterminate state behind after stack unwinding. The app then fails with heisenbugs later, raising more exceptions that get ignored, compounding the problem.
People just shrug off that unreliability as an unavoidable cost of doing business.
Yeah, in both cases it's a layering situation, where it's the duty of your code to decide what layers of abstraction need to be bridged, and to execute on that decision. Translating/wrapping exception types from deeper functions is the same as translating/wrapping return types in the same places.
I think it comes down to a psychological or use-case issue: People hate thinking about errors and handling them, because it's that hard stuff that always consumes more time than we'd like to think. Not just digitally, but in physical machines too. It's also easier to put off "for later."
Checked exceptions were good in theory, but Java simply did not add facilities to handle or support them well in many APIs. Even the newer APIs in Java (Streams, etc.) do not support checked exceptions.
There is also the problem that they decided to make all references nullable, so `NullPointerException`s could appear everywhere. This "forced" them to introduce the escape hatch of `RuntimeException`, which of course was way overused immediately, normalizing it.
It's a lot lighter: a stack trace takes a lot of overhead to generate; a result has no overhead for a failure. The overhead (panic) only comes once the failure can't be handled. (Most books on Java/C# don't explain that throwing exceptions has high performance overhead.)
Exceptions force a panic on all errors, which is why they're supposed to be used in "exceptional" situations. To avoid exceptions when an error is expected, (eof, broken socket, file not found,) you either have to use an unnatural return type or accept the performance penalty of the panic that happens when you "throw."
In Rust, the stack trace happens at panic (unwrap), which is when the error isn't handled. IE, it's not when the file isn't found, it's when the error isn't handled.
Exceptions do not force panic at all. In most practical situations, an exception unhandled close to where it was thrown will eventually get logged. It's kind of a "local" panic, if you will, that will terminate the specific function, but the rest of the program will remain unaffected. For example, a web server might throw an exception while processing a specific HTTP request, but other HTTP requests are unaffected.
Throwing an exception does not necessarily mean that your program is suddenly in an unsupported state, and therefore does not require terminating the entire program.
> Throwing an exception does not necessarily mean that your program is suddenly in an unsupported state, and therefore does not require terminating the entire program.
That's not what a panic means. Take a read through Go's panic / recover mechanism; it's similar to exceptions, but the semantics (with multiple return values) make it clear that panic is for exceptional situations. (I.e., panic isn't for "file not found," but for when code isn't written to handle "file not found.")
Sure, but the same is true of any error handling strategy.
When you work with exceptions, the key is to assume that every line can throw unless proven otherwise, which in practice means almost all lines of code can throw. Once you adopt that mental model, things get easier.
Explicit error handling strategies allow you to not worry about all the code paths that explicitly cannot throw -- which is a lot of them. It makes life a lot easier in the non-throwing case, and doesn't complicate life any more in the throwing case as compared to exception-based error handling.
It also makes errors part of the API contract, which is where they belong, because they are.
It can, and that optimization has existed for a while.
Actually it can also just turn off the collection of stack traces entirely for throw sites that are being hit all the time. But most Java code doesn't need this because code only throws exceptions for exceptional situations.
> it's essentially a checked exception without a stack trace
In theory, theory and practice are the same. In practice...
You can't throw a checked exception in a stream, and this fact underlines the key difference between an exception and a Result: a Result is in return position, while exceptions are a sort of side effect with their own control flow. Because of that, once your method throws an exception, or you're writing code in a try block that catches one, you become blind to further exceptions of that type, even if you might be able to (or required to) fix those errors. Results must be handled individually, and you get syntactic sugar to easily back-propagate them.
It is trivial to include a stack trace, but stack traces are really only useful for identifying where something occurred. What is generally superior is attaching context as you back-propagate, which happens trivially with judicious use of custom error types with From impls. Doing this means that the error message uniquely identifies the origin and the paths it passed through, without intermediate unimportant stack noise. With exceptions, you would always need to catch each exception and rethrow a new exception containing the old one to add contextual information; then, to avoid catching too much, you need variables that will be initialized inside the try block but defined outside of it. So stack traces are basically only useful when you are doing Pokemon exception handling.
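For example, a minimal sketch of that From-based context propagation (the error and function names are illustrative):

    use std::{fs, io, num::ParseIntError};

    #[derive(Debug)]
    enum ConfigError {
        Io(io::Error),
        Parse(ParseIntError),
    }

    impl From<io::Error> for ConfigError {
        fn from(e: io::Error) -> Self { ConfigError::Io(e) }
    }
    impl From<ParseIntError> for ConfigError {
        fn from(e: ParseIntError) -> Self { ConfigError::Parse(e) }
    }

    // Each `?` converts the failure through `From`, recording which
    // stage failed without any stack-trace machinery.
    fn read_limit(path: &str) -> Result<u32, ConfigError> {
        let text = fs::read_to_string(path)?; // io::Error -> ConfigError::Io
        Ok(text.trim().parse()?)              // ParseIntError -> ConfigError::Parse
    }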
> When Rust got popular, I was a bit confused by people talking about how great Result it's essentially a checked exception without a stack trace.
It's not a checked exception without a stack trace.
Rust doesn't have Java's checked or unchecked exception semantics at the moment. Panics are more like Java's Errors (e.g. OOM error). Results are just error codes on steroids.
Checked exceptions failed because, when used properly, they fossilize method signatures. They're fine if your code will never be changed, and they're fine when you control 100% of the users of the throwing code. If you're distributing a library... no bueno.
That’s just not true. They required that you use hierarchical exception types and define your own library exception type that you declare at the boundary.
The same is required for any principled error handling.
That's kind of what I'm saying with the blind spot comment. The words "unwrap" and "expect" should be just as much a scary red flag as the word "panic", but for some reason it seems a lot of people don't see them that way.
Even in lowly Java, they later added to Optional the orElseThrow() method since the name of the get() method did not connote the impact of unwrapping an empty Optional.
I've found both methods very useful. I'm using `get()` when I've checked that the value is present and I don't expect any exceptions. I'm using `orElseThrow()` when I actually expect that value can be absent and throwing is fine. Something like
    if (userOpt.isPresent()) {
        var user = userOpt.get(); // presence checked above, so get() cannot throw
        var accountOpt = accountRepository.selectAccountOpt(user.getId());
        var account = accountOpt.orElseThrow(); // absence is possible; throwing is deliberate
    }
IntelliJ IDEA checks this by default and highlights if I've used `get()` without a previous check. It's not enforced at the compiler level, but it's good enough for me.
The `unsafe` keyword means something specific in Rust, and panicking isn't unsafe by Rust's definition. Sometimes avoiding partial functions just isn't feasible, and an unwrap (or whatever you want to call the method) is a way of providing a (runtime-checked) proof to the compiler that the function is actually total.
unwrap() should effectively work as a Result<> where the user must manually invoke a panic in the failure branch. Make special syntax if a match and panic is too much boilerplate.
This is like an implicit null pointer exception that cannot be statically guarded against.
I want a way to statically block any crates doing this from my dependency chain.
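E.g., a sketch of the explicit match-and-panic I mean (names illustrative):

    fn must_have(port: Option<u16>) -> u16 {
        match port {
            Some(p) => p,
            // The panic is spelled out instead of hidden inside a method call:
            None => panic!("invariant violated: port must be set by this point"),
        }
    }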
That would require an effects system[0] like Koka's[1]. Then one could not only express the absence of panics but also allocations, infinite loops and various other undesirable effects within some call-trees.
This is a desirable feature, but an enormous undertaking.
Same thing that would happen if it did a match statement and panicked. The problem is the panic, not the unwrap.
I don’t think you can ever completely eliminate panics, because there are always going to be some assumptions in code that will be surprisingly violated, because bugs exist. What if the heap allocator discovers the heap is corrupted? What if you reference memory that’s paged out and the disk is offline? (That one’s probably not turned into a panic, but it’s the same principle.)
Not sure what you're saying with the "work as a Result<>" part...unwrap is a method on Result. I think you're just saying the unwrap/expect methods should be eliminated?
Then they are going to write `None | Err => yolo()`, which has the same impact. It's not the syntax or the semantics that are the problem here, but the fact that there is no monitoring around elevated error counts after a deployment.
Software engineers tend to get stuck in software problems and thinking that everything should be fixed in code. In reality there are many things outside of the code that you can do to operate unreliable components safely.
Exactly. People are very hung up on "unwrap", but even if it wasn't there at all, you would have devs just manually writing the match. Or, even more likely, using a trivial `unwrap!` macro.
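For instance, such a macro is a three-line sketch:

    macro_rules! unwrap {
        ($opt:expr) => {
            match $opt {
                Some(v) => v,
                None => panic!("called `unwrap!` on a `None` value"),
            }
        };
    }

    fn main() {
        let n: Option<u32> = None;
        let _ = unwrap!(n); // panics exactly like Option::unwrap would
    }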
There's also an assumption here that if the unwrap wasn't there, the caller would have handled the error properly. But if this isn't part of some common library at CF, then chances are the caller is the same person who wrote the panicking function in the first place. So if a new error variant they introduced was returned they'd probably still abort the thread either by panicking at that point or breaking out of the thread's processing loop.
Really not! This is a huge faceplant for writing things in Rust. If they had been writing their code in Java/Kotlin instead of Rust, this outage either wouldn't have happened at all (a failure to load a new config would have been caught by a defensive exception handler), or would have been resolved in minutes instead of hours.
The most useful thing exceptions give you is not static compile time checking, it's the stack trace, error message, causal chain and ability to catch errors at the right level of abstraction. Rust's panics give you none of that.
Look at the error message Cloudflare's engineers were faced with:
    thread fl2_worker_thread panicked: called Result::unwrap() on an Err value
That's useless, barely better than "segmentation fault". No wonder it took so long to track down what was happening.
A proxy stack written in a managed language with exceptions would have given an error message like this:
    com.cloudflare.proxy.botfeatures.TooManyFeaturesException: 200 > 60
        at com.cloudflare.proxy.botfeatures.FeatureLoader(FeatureLoader.java:123)
        at ...
and so on. It'd have been immediately apparent what went wrong. The bad configs could have been rolled back in minutes instead of hours.
In the past I've been able to diagnose production problems based on stack traces so many times that I have been expecting an outage like this ever since the trend away from providing exceptions in new languages in the 2010s. A decade ago I wrote a defense of the feature, and I hope we can now have a proper discussion about adding exceptions back to languages that need them (primarily Go and Rust):
That has nothing to do with exceptions, just the ability to unwind the stack. Rust can certainly give you a backtrace on panics; you don’t even have to write a handler to get it. I would find it hard to believe Cloudflare’s services aren’t configured to do it. I suspect they just didn’t put the entire message in the post.
tl;dr: Capturing a backtrace can be quite an expensive runtime operation, so the environment variables allow either forcibly disabling this runtime performance hit or selectively enabling it in some programs.
It's one of the problems with using result types. You don't distinguish between genuinely exceptional events and things that are expected to happen often on hot paths, so the runtime doesn't know how much data to collect.
panic is the exceptional event. It so happens that Rust doesn't print a stack trace in release builds unless configured to do so.
Similarly, capturing a stack trace in an error type (within a Result, for example) is perfectly possible. But this is a choice left to the programmer, because capturing a trace is not cheap.
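E.g., a sketch using std's `Backtrace` (stable since 1.65) to opt in, with an illustrative error type:

    use std::backtrace::Backtrace;

    #[derive(Debug)]
    struct LoadError {
        msg: String,
        // Only actually captured when RUST_BACKTRACE / RUST_LIB_BACKTRACE enable it.
        backtrace: Backtrace,
    }

    impl LoadError {
        fn new(msg: impl Into<String>) -> Self {
            LoadError { msg: msg.into(), backtrace: Backtrace::capture() }
        }
    }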
There's clearly a big gap in how things are done in practice. You wouldn't see anyone call System.exit in a managed language if a data file was bigger than expected. You'd always get an exception.
I used to be an SRE at Google. Back then we also had big outages caused by bad data files pushed to prod. It's a common enough issue so I really sympathize with Cloudflare, it's not nice to be on call for issues like that. But Google's prod environments always generated stack traces for every kind of failure, including CHECK failures (panics) in C++. You could also reflect the stack traces of every thread via HTTP. I used to diagnose bugs in production under time pressure quite regularly using just these tools. You always need detailed diagnostics.
Languages shouldn't have panics, tbh, it's a primitive concept. It so rarely makes sense to handle errors that way. I know there's a whole body of Rust/Go lore claiming panics are fine, but it's not a good move and is one of the reasons I've stayed away from Go over the years and wouldn't use Rust for anything higher than low level embedded components or operating system code that has to export a C ABI. You always want diagnostics and recoverable errors; this kind of micro-optimization doesn't make sense outside of extremely constrained embedded environments that very few of us work in.
An uncaught exception in C++ or an uncaught panic in Rust terminates the program. The unwinding is the same mechanism. I think the implementation is what comes with LLVM, but I haven't checked.
I was also a Google SRE, and I liked the stacktrace facilities so much that I got permission to open source a library inspired from it: https://github.com/bombela/backward-cpp (I know I am not doing a great job maintaining it)
At Uber I implemented similar stack-trace introspection for RPC tasks via HTTP for Go services.
You can also catch a Go panic. Which we did in our RPC library at Uber.
It would be great for all of that to somehow come ready made though. A sort of flag "this program is a service, turn on all the good diagnostics, here is my main loop".
Alternatively you can look at actually innovative programming languages to peek at the next 20 years of innovation.
I am not sure that watching the trendy forefront successfully reach the 1990s and discuss how unwrapping an Option is potentially dangerous really warms my heart. I can't wait for the complete meltdown when they discover effect systems in 2040.
To be more serious, this kind of incident is yet another reminder that software development remains miles away from proper engineering, and that even key providers like Cloudflare utterly fail at proper risk management.
Celebrating because there is now one popular language using static analysis for memory safety feels to me like being happy we now teach people to swim before a transatlantic boat crossing while we refuse to actually install life boats.
To me the situation has barely changed. The industry has been refusing to put in place strong reliability practices for decades, keeps significantly under investing in tools mitigating errors outside of a few fields where safety was already taken seriously before software was a thing and keeps hiding behind the excuse that we need to move fast and safety is too complex and costly while regulation remains extremely lenient.
I mean, this Cloudflare outage probably cost millions of dollars of damage in aggregate between lost revenue and lost productivity. How much of that will they actually have to pay?
Let's try to make effect systems happen quicker than that.
> I mean, this Cloudflare outage probably cost millions of dollars of damage in aggregate between lost revenue and lost productivity. How much of that will they actually have to pay?
Probably nothing, because most paying customers of cloudflare are probably signing away their rights to sue Cloudflare for damages by being down for a while when they purchase Cloudflare's services (maybe some customers have SLAs with monetary values attached, I dunno). I honestly have a hard time suggesting that those customers are individually wrong to do so - Cloudflare isn't down that often, and whatever amount it cost any individual customer by being down today might be more than offset by the DDOS protection they're buying.
Anyway if you want Cloudflare regulated to prevent this, name the specific regulations you want to see. Should it be illegal under US law to use `unwrap` in Rust code? Should it be illegal for any single internet services company to have more than X number of customers? A lot of the internet also breaks when AWS goes down because many people like to use AWS, so maybe they should be included in this regulatory framework too.
> I honestly have a hard time suggesting that those customers are individually wrong to do so - Cloudflare isn't down that often, and whatever amount it cost any individual customer by being down today might be more than offset by the DDOS protection they're buying.
We have collectively agreed to a world where software service providers have no incentive to be reliable, as they are shielded from the consequences of their mistakes, and somehow we see it as acceptable that software has a ton of issues and defects. The side effect is that research on actually lowering the cost of safety has little return on investment. It doesn't have to be so.
> Anyway if you want Cloudflare regulated to prevent this, name the specific regulations you want to see.
I want software providers to be liable for the damage they cause, and minimum quality regulation on par with an actual engineering discipline. I have always been astounded that nearly all software licences start with extremely broad limitation-of-liability provisions, and people somehow feel fine with it. Try to extend that to any other product you regularly use in your life and see how that makes you feel.
How to do proper testing, formal methods, and resilient design has been known for decades. I would personally be more than okay with "let's move less fast and stop breaking things."
> I want software providers to be liable for the damage they cause, and minimum quality regulation on par with an actual engineering discipline. I have always been astounded that nearly all software licences start with extremely broad limitation-of-liability provisions, and people somehow feel fine with it. Try to extend that to any other product you regularly use in your life and see how that makes you feel.
So do you want to make it illegal to publish GNU GPL licensed software, because that license has a warranty disclaimer? Do you want to make it illegal for a company like Cloudflare to use open source licensed software with similar warranty disclaimers, or for the SLA agreements and penalties for violating them that they make with their own paying customers to be legally unenforceable? What if I just have a personal website and I break the javascript on it because I was careless - how should that be legally treated?
I'm not against research into more reliable software or using better engineering techniques that result in more reliable software. What I'm concerned about is the regulatory regime - in other words, what software it is or is not legal to write or sell for money - and how to properly incentivize software service providers to use techniques that result in more reliable software without causing a bunch of bad second order effects.
You can't go out in the middle of your city, build a shoddy bridge, say you waive all responsibility, and then wash your hands of the consequences when it predictably breaks. Why can you do that with pieces of software?
Limiting the scope of liability waivers is not the same thing as censoring what software can be produced. It's just ensuring that everyone actually takes responsibility for the things they distribute.
As I said previously, the current situation doesn't make sense to me. People have been brainwashed into believing that the way software is released currently, half finished and riddled with bugs, is somehow normal and acceptable. It absolutely doesn't have to be this way.
It's beyond shameful that the average developer today is blissfully unaware of anything related to producing actually secure software. I am pretty sure I could walk into more than 90% of development shops today and no one there would know what formal methods are. With some luck, they might have some static analysers running, probably from a random provider, and be happy with the crappy percentages they output.
It's not about research. It's about a field which entirely refuses to become mature despite being pivotal to the modern economy. And why would it? Software products somehow get a free pass for the shit they push on everyone.
We are in the classic "market for lemons" trap, where negative externalities are not priced in and investing in security will just make you lose against companies that don't care. Every major incident reminds us we need out. The market has already shown it won't self-correct. It's a classic case where regulatory intervention is necessary and legitimate.
The shift is already happening, by the way. The EU Product Liability Directive was adopted in 2024 and the transition period ends in December 2026. The US "National Cybersecurity Strategy" signals intent to review the status quo. It's coming faster than people realise.
I find myself in the odd position of agreeing with you both.
That we’re even having this discussion is a major step forward. That we’re still having this discussion is a depressing testament to how slowly the mainstream has adopted better ideas.
> I can’t wait for the complete meltdown when they discover effect systems in 2040
Zig is undergoing this meltdown. Shame it's not memory safe. You can only get so far in developing programming wisdom before Eternal September kicks in and we're back to re-learning all the lessons of history as punishment for the youthful hubris that plagues this profession.
It's not about whether you should ban unwrap() in production. You shouldn't. Some errors are logic bugs beyond which a program can't reasonably continue. The problem is that the language makes it too easy for junior developers (and AI!) to ignore non-logic-bug problems with unwrap().
Programmers early in their careers will do practically anything to avoid having to think about errors and they get angry when you tell them about it.
> In production code an unwrap or expect should be reviewed exactly like a panic.
An unwrap should never make it to production, IMHO. It's fine while prototyping, but once the project gets closer to production it's necessary to just grep for `unwrap` in your code, replace those that can fail with proper error management, and replace those that cannot fail with `expect`, with a clear justification of why they cannot fail unless there's a bug somewhere else.
I would say, sure, if you feel the same way about panic calls making it to production. In other words, review all of them the same way, because writing unwrap/expect is exactly the same as writing "if error, panic".
I don't understand your point: panic! is akin to expect. You think about it consciously, use it explicitly, and you write down a panic message explaining its rationale.
It should be. If you aren’t treating it exactly the same as panic and expect, that’s what I’m calling the “blind spot”. And why should you have to make up a message every time when the backtrace is going to tell you what was wrong?
> And why should you have to make up a message every time when the backtrace is going to tell you what was wrong?
The message isn't really here to be displayed during a crash (since the crash should never happen in the first place), it's here to communicate the invariant in the code, to the developer reading and modifying it later on.