It is interesting to read how the compiler behaves when using exceptions, but the number of generated instructions is only one metric. As always, there is nothing better than measuring the runtime of your actual implementation.
I'm writing this because I had a tight-running loop around a call that returned a bool, and in 1 out of 100k or so cases it would return false. It turned out that instead of checking that bool in each loop iteration, it was actually significantly faster to check nothing and throw an exception in that very rare case. Something like this:
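Roughly along these lines (foo, process() and EndOfFile here are just stand-ins, not the actual code):

    // Exception variant: no per-iteration check; read() throws when the input runs out.
    try {
        while (true) {
            process(foo.read());        // throws EndOfFile in the rare case
        }
    } catch (const EndOfFile&) {
        // reached maybe once per ~100k iterations
    }

instead of this:

    // Check variant: eof() is evaluated on every single iteration of the hot loop.
    while (!foo.eof()) {
        process(foo.read());
    }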
This might not be the most C++-like code, but it saved me something like 50% of runtime in one case. Afaik, as long as exceptions are not thrown, there is zero overhead, which would explain this. Then again, the call to eof might have been too complex and long-running; maybe I could have optimized that one instead. But there are abstractions that might hinder such optimizations.
You are making two function calls in the second loop vs. one in the first. If the result of read() could be checked for eof, then you would remove one call. Also, since the branch is so easily predicted, I would be surprised if there were much of a performance hit once the second call is removed.
If you see a 50% speed-up in the above code, I question what read is doing and the implementation of eof(). I would expect read() to dominate the if statement unless eof is extremely expensive. I have written dozens of read()/eof() functions over the years and read() always dominates because the read involves copying memory and/or making OS calls, etc., while eof() is a simple comparison. So if read() dominates runtime, example 1 and 2 above should not be anywhere close to 50% different.
That particular code came from a bit reader class. So, if you are reading only a few bits per call, then the read call becomes very cheap. It might be as "simple" as this pseudocode:
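(Reconstructed sketch; the member and helper names below are illustrative, not the real implementation.)

    // Cheap bit-reader read(): usually just a few shifts and masks.
    uint32_t read(uint8_t bitsWanted) {
        if (bitBufferSize < bitsWanted) {
            refillBitBuffer();          // pulls from the byte buffer or the file;
                                        // this is also where EOF is actually noticed
        }
        const uint32_t result = static_cast<uint32_t>(bitBuffer) & nLowestBitsSet(bitsWanted);
        bitBuffer >>= bitsWanted;
        bitBufferSize -= bitsWanted;
        return result;
    }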
And yes, I think the eof call might have been too complex especially in contrast to the short read call. It probably should only check a flag that might be set by the read call automatically.
At the very least, you are redundantly executing logic in the version without the exception. The check for eof has to be done implicitly inside read anyway, because it has to fill the bit buffer with data from the byte buffer, or the byte buffer with data from the file. And if both fail, then we already know the result of eof, so there is no need to duplicate the eof check in the outer loop that calls read.
Here is the full commit with ad-hoc benchmark results in the commit message:
In this commit, I even replace std::optional with an exception. But I didn't document benchmark results, and the commit message begins with "try", so they probably weren't significantly better, but also not worse, or else I wouldn't have committed it:
Without seeing the assembly it is difficult to say, but for such a large difference the culprit often seems to be the loop µop cache. If you check the alignment of the top of the loop instructions (assuming it inlined away the calls), sometimes you can see an alignment change for some trivial bullshit that has vast performance impacts because it causes a series of other missed optimizations.
That damn loop cache is great for performance when you hit it, but sucks horribly when you don't; you don't even think about it and spend days trying to figure out some performance quandary.
In the exception code, a foo.eof() check needs to be inside foo.read().
So I guess the runtime in theory should be the same, and the implementation just happened to be slow. Maybe eof() did some syscall each time instead of caching an EOF int like in C.
An underrated topic which I haven't seen discussed much is what you should actually do in the real world when you encounter certain exceptions. Too many examples just kinda gloss over that by saying "oh you should handle any errors or exceptions here".
If my memory allocation fails or if some file-related operation fails, what are the chances that I can actually do anything about it? Sometimes the best thing to do is just let it crash and log the event.
It would be great to see a book about common strategies for handling exceptions and other forms of failure in real world applications. I'd imagine that certain strategies for handling exceptions would also require you to write your application in such a way as to make it possible to recover from failures as well.
If something goes wrong, there is not that much you can do. AFAIK the options are 1) retry 2) fall back to something else 3) give up and complain to someone. The options are simple, even if none of them are particularly nice.
There are a lot of variations on these, of course: log it, and to where? keep unfinished work around somewhere or crash and burn? what if even logging the failure fails?
A guy I once worked with had a really good approach to exceptions: don't.
Basically, if your code would throw, then instead of throwing, do something about it. Roll back the filesystem or the transaction so things can retry or the process can exit.
Ideally, do not rely on someone reading a log message.
Instead, do what that person reading the log would need to do.
If that's an escalation to support staff or the CEO, send the email. If the developer reading the log would just press a retry button, press the button for them.
Basically, giving up was just not an option, and it worked pretty well tbh. Sure, we had logs, and they were crucial for debugging failures, but weekend callouts were not a thing. We never had to babysit processes, and never had to worry about filesystem corruption if you wanted to kill -9 a process mid-run.
Sure that works for application code; for library code, you are raising an exception so that the application code can decide the domain-specific thing to do that the person reading the log message would do. Exceptions are raised by code that can be used in multiple contexts, where the person reading the log message would be doing entirely different things.
I agree with part of this. When possible, code should either succeed or revert back to the state before the operation, before throwing. Avoid half-completed work if at all possible. This is in general quite easy: first do the work with the higher chance of failure, and only then connect that work to the rest of the system state. E.g. don't add an item to a list and then do something with it; do it the other way around and only add items to the list once something has already been done with them.
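A sketch of what I mean (Item, Config, prepareItem and connectToBackend are placeholders, defined elsewhere):

    #include <utility>
    #include <vector>

    struct Config { /* ... */ };
    struct Item {
        void connectToBackend();            // may throw
        /* ... */
    };
    Item prepareItem(const Config&);        // may throw

    // Do the failure-prone work first, only then touch shared state.
    void addItem(std::vector<Item>& items, const Config& config) {
        Item item = prepareItem(config);    // if this throws, nothing has been modified yet
        item.connectToBackend();            // if this throws, still nothing to roll back
        items.push_back(std::move(item));   // only now does shared state change; if
                                            // push_back throws, the vector is unchanged
    }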
If you are in a known failure scenario, you can try known resolutions. If some necessary resource like a DB or file system disappeared, try to self-heal as soon as the resource reappears. But if you're in unknown territory, better stop working and complain. Writing a message in a log is good enough, IF some monitoring system will pick up that log and alert the right channel.
I've seen systems trying to self heal, but working on wrong assumptions. Operators now have 2 problems: Fix the system AND stop it from damaging itself. Self-healing can turn to self-damaging quickly.
I’ve written a few systems that were aggressively self-healing and operated them in production. The benefit is as you say. When done well, the systems kind of run themselves and require much less attention than systems that are not designed this way. From an operations perspective it was great. From a software development perspective, not so much, and this largely explains why it is uncommon.
In all typical software architectures, many places in the code do not have enough context to handle exceptional conditions. Single errors may have multiple possible root causes that have to be determined by inference or deduction in the code so that the handling is appropriate. Evaluating some causes requires complex code far outside the purview of the software’s main purpose and possibly skill set of the developers. Appropriate resolution of an exception at a single call site can be context dependent — not only do you have to determine the root cause at runtime, you also have to determine the correct resolution at runtime. A single resolution may need to implement multiple strategies to take into account real-time environmental context that change how that resolution is handled.
Making this logic maintainable requires an architecture that pretty heavily revolves around the software infrastructure required to make this type of exception handling scalable. You’re replacing all of the error handling idioms every software engineer knows with something alien that colors the entire code base. Also, there is little in the way of robust frameworks that do a lot of this grunt work for you so you are usually left writing your own.
The tl;dr is that implementation is quite expensive and difficult in practice, even though it usually has no performance overhead and is great from an operations perspective. While people like the idea, the software development overhead is usually considered too high to justify making operations’ life easier.
I've certainly had success with an application with a very aggressive transactional approach; every button press would commit-or-crash, and in the event that it did crash you could simply resume from another terminal by logging in with a quick key fob tap.
Exceptions only make sense if there are failures to handle. If you don't use exceptions for failure, something else like an 'if not ok goto handle' needs to be in there. This is what C does explicitly, and Rust implicitly if you look through Result.
So a better comparison would be: let construct() return a bool, and do manual error handling if it returns false. I'm pretty sure the compiler will generate much worse code in this case.
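E.g. something along these lines (a sketch modeled on the article's construct()/destruct() example, not taken from it):

    // Error-code style: construct() reports failure via its return value.
    extern bool construct();   // returns false on failure
    extern void destruct();

    bool f() {
        if (!construct()) {                  // a
            return false;
        }
        if (!construct()) {                  // b failed: undo a
            destruct();
            return false;
        }
        if (!construct()) {                  // c failed: undo b, then a
            destruct();
            destruct();
            return false;
        }
        destruct();                          // success path: tear down c, b, a
        destruct();
        destruct();
        return true;
    }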
Yes, I didn't originally know that (I thought constructors and exceptions would have been co-implemented and didn't consider that fallible ctors would have been outright impossible for a decade), and found out after pjc50's reply: https://news.ycombinator.com/item?id=33572348
People used to use a flag in objects to say 'invalid because constructor failed', or make the constructor infallible and have a separate initialise method which could return an error code.
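E.g. a sketch of that old-school pattern (using a POSIX file handle purely as an illustration):

    #include <cerrno>
    #include <fcntl.h>
    #include <unistd.h>

    // Pre-exception style: infallible constructor plus a separate, fallible init().
    class File {
    public:
        File() : fd_(-1) {}                       // cannot fail
        ~File() { if (fd_ >= 0) { close(fd_); } }

        int init(const char* path) {              // 0 on success, errno value on failure
            fd_ = open(path, O_RDONLY);
            return fd_ >= 0 ? 0 : errno;
        }
        bool valid() const { return fd_ >= 0; }   // the "constructor/init failed" flag

    private:
        int fd_;
    };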
I mean... yes, but easy multiple-value-return unpacking is also a fairly new feature. Back in the 90s you had one (1) lvalue, making the "return (thing, error)" pattern miserable to use.
I was going to say that such an alternative history C++ could have had MRVs earlier, or even sum types... but apparently exceptions were added a decade after constructors (if we draw that back to the original "C with classes").
So yeah, you're right that constructors led to exceptions, unless the designers of C++ had been willing to go back to the drawing board and entirely redo constructors.
Indeed. It doesn't make sense to compare code designed to handle errors with code that simply ignores them - yet I see it all the time when people talk about C++ exception handling.
Is this refuting a claim that is actually made? Do people claim exceptions never have overhead on the happy path in C++?
As an analogy, forget exceptions for a moment—just imagine plain function calls in C++ vs. Python. In Python, function calls always have overhead. In C++, though, we say function calls don't add overhead, because function calls can be inlined. But when we make that claim, we obviously don't mean function calls never have overhead in C++—obviously, the statement can't apply if you can't inline the call. Rather, the meaning of the claim is that function calls don't inherently introduce overhead—i.e. if they do, it's because of some other fact (like inability to inline), not the mere existence of the function call itself.
The case seems similar to me here. When people say exceptions don't slow down the happy path, they don't mean this is impossible. They just mean that if this does occur, it's due to additional inherent obstacles beyond the mere enablement of exceptions (e.g., opaque function calls).
A great post. Note that the post is specific to C++ and does not apply to, say, Java. Java is effectively always paying the cost of exception handling, so you might as well use it.
A better way to put it: Java doesn't have destructors (mostly because it has a proper GC, so it doesn't have to spend time piddling around and freeing objects), so you never have to pay most of the costs that C++ pays.
Freeing heap objects in C++ is an expensive operation. You may want to look into how that is implemented in the common allocators.
In Java, an allocation is literally a single instruction (adding a value to the heap pointer), and freeing is zero cost (since you just drop the pointer after you're done with it).
By contrast, in C++ both allocation and freeing of heap memory are costly operations (which is there is so much emphasis on stack allocation).
As it turns out, the time taken by the garbage collector in Java (which doesn't exist in C++) is still going to be less than what is spent on managing the same number of heap objects in C++.
The only reason why C++ can be faster than the Java memory management is thanks to all the stack allocation.
> In Java, an allocation is literally a single instruction (adding a value to the heap pointer), and freeing is zero cost (since you just drop the pointer after you're done with it).
There are other hidden costs in Java GC allocation. While most of the time an allocation will be just two instructions (it cannot just increment a heap pointer; it also has to check whether there's enough free space and jump to a slow path if there isn't, which needs a second instruction), there is also some overhead in the GC tracking of modified references (card marking), and either a short pause in each thread while the GC reads its registers (to get any pointers stored there) or the extra overhead of having to store references on the stack instead of just in registers, and probably some missed optimization opportunities due to having to do things in an order the GC expects.
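For reference, the fast path described above looks roughly like this (a generic bump-allocator sketch, not actual HotSpot code; tlab_top, tlab_end and allocate_slow are made up):

    #include <cstddef>

    extern char* tlab_top;                    // current bump pointer
    extern char* tlab_end;                    // end of the thread-local allocation buffer
    void* allocate_slow(std::size_t size);    // refill the TLAB or trigger a GC

    void* allocate(std::size_t size) {
        char* result = tlab_top;
        if (size > static_cast<std::size_t>(tlab_end - result)) {
            return allocate_slow(size);       // the extra check-and-branch mentioned above
        }
        tlab_top = result + size;             // the "single instruction" bump
        return result;
    }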
The garbage collector itself has some annoying overheads: it can waste cache and memory bandwidth by scanning cold objects, it can cause unnecessary swapping in of swapped-out pages, and most importantly, it uses more memory. In fact, that's the main trick of GC: it trades extra memory usage for allocation/deallocation speed.
> The only reason why C++ can be faster than the Java memory management is thanks to all the stack allocation.
No, the main reason C++ can be faster than Java memory management is that it has value objects, which can avoid a lot of heap allocation (and pointer chasing) even without replacing it with stack allocation. Consider for instance an array of objects: in C++ it can be a single allocation, while Java requires one allocation for each object, plus another allocation for the array itself. Which can also be considered a hidden cost of Java GC allocation: to be able to track objects correctly, the GC requires that references to an object point to the beginning of its allocation, so an object cannot be inlined within another and still be tracked correctly by the Java GC.
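Concretely (sketch):

    #include <vector>

    struct Point { double x, y; };     // a value type

    // C++: one contiguous allocation holding all the Points, no per-element objects.
    std::vector<Point> points(1'000'000);

    // The Java equivalent (a Point[] filled with new Point(...) per element) needs one
    // allocation for the array of references plus one per Point, and every element
    // access chases a pointer.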
By Java 5 the average cost of an allocation in Java was already lower than heap allocation in C++, but greater than the cost of stack allocation. With escape analysis in Java 8 that got even better.
The biggest problem Java had was that 90’s era thinking on GC algorithms (most GC languages in the 90’s were using 80’s era GC algorithms) ran out of steam sometime around 2005, when memory availability created situations where GC pause times were impossible to hide. Cliff Click created a JVM that actually ran as a VM, in order to create a cheaper mark phase via page faulting to implement mark on write. Now we have other options.
Pauseless [0]? It originally used hardware support on Azul machines to trap upon reading an unmarked pointer, but now (in the C4 generational incarnation, as well as in ZGC) this is done in software. But the other collectors use snapshot-at-the-beginning and a write barrier for concurrent marking, which originated in 1990 [1].
And, if one wants to be picky, Pauseless is a variation of Henry Baker's "real-time" copying collector from 1978 [2], although Pauseless uses tags corresponding to various interesting states (marked/unmarked, forwarded/unforwarded) rather than just from-space/to-space.
So, I went to replicate the bottom case, and particularly wanted to do it more "fairly" by actually isolating the resulting code rather than staring at a ton of complex intermediate assembly :/. I'm thereby building a real binary--after carefully separating the compilation stages using -c and avoiding any linker optimization--instead of using -S, and then using objdump -d, and then--because this is the whole point of talking about the happy path--I'm only providing the code of the actual functions and not any subsequent exception landing pads.
Now, first off, I hope people can see that this is a much more fair comparison. I am not sure why the author of this article wanted to use -S, but it seems more than a bit disingenuous to be complaining about how "tidy" the happy path is after pasting a million assembler directives at us as if those actually have a runtime performance cost instead of actually looking at the resulting execution paths through the assembly that results.
(And I want to be very clear here: this isn't because I somehow got a different result. If you carefully read through the assembly presented in the article and mentally skip all the assembler directives, you will see they got something similar, with a call to the constructor followed by three calls to destruct immediately one after the other. That said, I am not sure if the author actually understood what they were looking at, as they didn't provide the code for __ZN1AILj5EvEC2Ev, which is clearly part of the happy path.)
But... yeah: it added the cost of a single extra call indirection for some reason. I'm not entirely sure why, but this frankly doesn't feel like a ridiculous cost. I will also note that it actually generates identical code to the case without exceptions if you merely add a single "inline" keyword to the code on the constructor for A (note: I mean the one on line 9 that is ostensibly empty, not the one on line 16). That said, you also probably wouldn't actually WRITE such a no-op constructor in the first place, and it turns out deleting it also fixes the resulting code to be exactly the same with and without exceptions, so I'm not sure of the applicability here... this just feels like such a corner case :/.
(The actual reasonable complaint to make about "zero cost" exceptions--which this article didn't make, because it seems that it didn't occur to the author to do an actual timing analysis, which is where I was hoping this article would go--is that the exception tables tend to get mixed in with the rest of the normal functions, which massively pads out the code space, which will lead to many more TLB misses and page fetches while executing the happy path. There should be a way to move these landing pads all to the end of the executable to be loaded into memory only if absolutely necessary; but, if that's already possible, I personally don't know how to do that :(.)
In an ELF binary the exception tables are not intermingled with executable code but are coalesced into the .eh_frame section loaded in a separate read-only non-executable segment. The happy path is truly just the happy path standing alone.
Sorry, I don't know all of the correct terminology. Is it the "landing pads"? Here is the code that is generated for main (after removing that constructor so it ends up without the extra call), and this is all in the text segment... and this definitely isn't "the happy path standing alone".
The issue isn't the non-executable code: it is that it generates a lot more executable code that it intermixes with all of the rest of the code next to the code that relies on it, which is great for the performance of quickly being able to jump into the exceptional case but heavily dilutes the code cache and implies many more page faults and TLB misses.
The happy path is 401160 through 401185. The unwinding code no more dilutes the code cache than any other subsequent and possibly unrelated functions in memory.
The unwinding code is not there to make it faster since it's generally reached after walking the stack twice. It's moved out of the happy path to make the happy path faster since it doesn't dilute the instruction cache and doesn't hit the pipeline.
There is much more to this than the superficial code analysis. Especially on Windows, there is a whole framework for how exceptions are handled, etc. To me, exceptions are super expensive; we should avoid them if we can.
The OS should normally have no involvement in C++ exceptions, and that certainly isn't the case on any operating system except Windows, where you can request that exceptions are bridged with structured exception handling; but I honestly don't believe that is common anymore, and even if you do opt into that (which I will argue you should probably never do, though I've sometimes been shocked that people still give a shit about the MSVC ABI :/), I also do not believe it is going to add a direct runtime cost to the happy path on 64-bit Windows (which is all that we are analyzing here anyway, as the legacy ABIs usually used for 32-bit Intel systems definitely had a non-zero cost), as I am quite confident it is also implemented with tables.
For a while, C++ exceptions on MSVC were implemented on top of the OS SEH. People still have scars from catch(...) catching segmentation faults. This changed more than a decade ago though, and has never been the case for 64-bit code.
I think the move to FP cements my view that exception handling is more trouble than it's worth. Golang is good too in that it doesn't use exceptions except in the extreme. Java with checked exceptions was kinda fashionable for a while but is now unloved. Down with exceptions!
I haven't been watching the benchmarks lately, but for quite some time OCaml was notable for being nearly as fast as C. No other functional-friendly language was even close.
I don't see how wrapping everything in Try is significantly different than wrapping things in catch blocks at the end of the day. I see more people complaining about checked exceptions than I actually see them being used anyways.
> Let us now assume that we hide the function definitions:
    extern void construct(), destruct();
> The generated code in that case becomes noticeably longer
This was hard for me to follow, but I think that when the function definitions are removed, the compiler no longer knows that the constructor can't throw an exception, so now it has to add exception handling code?
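Roughly, I think this is the situation (my own sketch, not the article's exact code):

    // The compiler only sees declarations, so it must assume construct() can throw.
    extern void construct();
    extern void destruct();

    struct A {
        A()  { construct(); }   // potentially throwing, as far as the compiler knows
        ~A() { destruct(); }
    };

    void f() {
        A a, b, c;
        // If c's constructor throws, a and b must still be destroyed, so the
        // compiler has to emit the cleanup/landing-pad code it could previously
        // prove unnecessary when the (non-throwing) definitions were visible.
    }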
Even in the first, "most optimised" case, the compiler is still generating more instructions than necessary. It should only need to generate one mov and one ret.
Usermode exception handling performs no system calls. Perhaps you're thinking of user handling of system exceptions on Windows, which are pretty much unrelated to C++ exceptions?