The author suggests that the text following the definition of "undefined behavior", listing the permitted or possible range of undefined behavior, should be read to restrict the consequences.
But the first possibility listed is "ignoring the situation completely with unpredictable results". Surely that covers any possible consequences.
The author also says:
> Returning a pointer to indeterminate value data, surely a “use”, is not undefined behavior because the standard mandates that malloc will do that.
Returning a pointer to data is not a use of that data. The fact that its value is indeterminate isn't relevant until you attempt to read it (without first writing it).
It may be worthwhile to reduce the number of constructs whose behavior is undefined, making them implementation-defined or unspecified instead. For example, if signed integer overflow yielded an unspecified result rather than causing undefined behavior, I wonder if any implementations would be adversely affected. (But it would remove the possibility of aborting a program that computes INT_MAX+1.)
I don't think reinterpreting "undefined behavior" as anything other than "the Standard imposes no requirements" is practical. If a program writes through a dangling pointer and, for example, clobbers a function's return address, what constraints could be imposed on what the program might do next?
> For example, if signed integer overflow yielded an unspecified result rather than causing undefined behavior, I wonder if any implementations would be adversely affected.
I suspect so - makes it harder to reason about loop counts because the compiler can't necessarily guarantee that an incremented loop counter won't become negative and thus the loop needs to iterate more.
E.g. something like for (int i=param; i < param + 16; i++) has a guaranteed loop count with the current rules, but not with yours?
That's not an excuse for not having any way to do proper overflowing operations on signed integers, though.
That's the exact reason why this rule was introduced into the standard: it was so C compilers could compete with Fortran compilers (Fortran has similar rules and at the time they were beating C compilers on equivalent scientific codes by 2-3x).
Fortran has even more restrictive aliasing rules than C: a function is allowed to assume that any two array arguments passed as arguments do not overlap. If they do, the behavior is undefined.
Exactly - it was done for meaningless benchmarking reasons. C programmers would be happy to use "restrict" as an opt-in for those, but this argument about FORTRAN goes back to the initial days of the standard when Dennis Ritchie had to push "noalias" out of the proposed standard.
> I suspect so - makes it harder to reason about loop counts because the compiler can't necessarily guarantee that an incremented loop counter won't become negative and thus the loop needs to iterate more.
This is a favourite example that gets thrown around, but for all practical loops GCC and clang seem to have no problem even when you compile with -fwrapv
I don't know if there exists a C compiler that leverages this feature but there are ISAs (for instance MIPS) that can trap on signed overflow.
The fact that it's UB in C means that you can tell the compiler to generate these exception-generating instructions, which could make some overflow bugs easier to track down without any performance implications. And your compiler would still be 100% compliant with the standard.
That being said I just tried and at least by default GCC emits the non-trapping "ADDU" even for signed adds, so maybe nobody actually uses that feature in practice.
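For what it's worth, GCC and Clang both expose trapping as an opt-in flag (-ftrapv) rather than a default. A minimal sketch of what that buys you, assuming the flag behaves as documented for your compiler and target:

    #include <limits.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        (void)argv;
        int x = INT_MAX;
        int y = x + argc;     /* argc is normally >= 1, so this overflows: UB in standard C */
        printf("%d\n", y);    /* built with `cc -ftrapv`, the addition is expected to
                                 trap/abort before this line runs */
        return 0;
    }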
That doesn't really help with the compiler optimization aspect: a typical use of the range information would be to unroll the loop - in which case there's no addition to trap on anymore.
To be fair, if you want to make sure that loop is unrolled even in the presence of -fwrapv, writing it as for (int i=0; i < 16; i++) {/* use i+param */} is a very simple change for you to make even today. You'll have to make much uglier changes to code if you're at the level of optimization where loop unrolling really matters for your code on a modern processor.
GCC is optimised for performing well on benchmarks at the expense of anything else. Vendor compilers for those architectures traditionally had more programmer-friendly features like trapping instead of creating an exploitable security vulnerability.
> GCC is optimised for performing well on benchmarks at the expense of anything else.
This is very wrong, and I don't know why you would come to this conclusion.
> Vendor compilers for those architectures traditionally had more programmer-friendly features like trapping instead of creating an exploitable security vulnerability.
Assuming you defined signed integer overflow to follow two’s complement rules (the only reasonable interpretation other than UB), it would still be a guaranteed loop count of 16. (EDIT: i’m a dumbass, this is obvs not true. disregard this paragraph)
There’s an interesting thing to note with that example though: even if you did make signed integer overflow defined, that code is still obviously incorrect if param + 16 overflows. Like, the fact that signed integer overflow is UB is totally fine in this example: making it defined behavior doesn’t fix the code, and if making it UB allows the compiler to optimize, then why not?
Arguably, this is the case with the vast majority of signed integer overflow examples: the UB isn't really the issue; the issue is that the programmer didn't consider overflow, and if overflow happens the code is incorrect regardless. Why cripple the compiler's ability to optimize to protect cases which are almost certainly incorrect anyway?
The real problem is that, in a better world, 'int' would be replaced by types that actually exhibit the correct behavior.
For a loop counter you want an index type that will segfault on overflow. If you think not having that check is worth it, the programmer would need to tag it with unsafe.
It's also problematic because its size is defined as at least 16 bits, which means you should never use it to store a constant that needs more than 16 bits. But people do that all the time.
I’m not sure I agree. If signed overflow is UB, loops like this can be optimized the hell out of. The most obvious way would be to unroll it and eliminate the loop (and loop variable) entirely, but you can also do things like vectorize it, maybe turn it in to just a small number of SIMD instructions. The performance gains are potentially enormous if this is in a hot loop.
With your magic int that traps on overflow, you couldn’t do that if the compiler was forced to rely on that behaviour. This is exactly why signed overflow is UB in C, and I don’t think that’s an unreasonable case for a language like C.
To be clear, my point is that this program is incorrect if overflow happens regardless of whether overflow is UB or not. So you might as well make it UB and optimize the hell out of it.
The broader argument is that signedness of the integer type used for indexing is a non-obvious gotcha affecting vectorizability. It makes sense once you understand C integer semantics, but putting on a language designer hat, I'd go with something more explicit.
"for (int i=param; i < param + 16; i++)" does not have a guaranteed loop count with the current rules. The loop body will execute 16 times if param <= INT_MAX-16, but if the expression "param + 16" can overflow, the behavior is undefined. (I'm assuming param is of type int.)
> "for (int i=param; i < param + 16; i++)" does not have a guaranteed loop count with the current rules. The loop body will execute 16 times if param <= INT_MAX-16, but if the expression "param + 16" can overflow, the behavior is undefined. (I'm assuming param is of type int.)
And the standard permits us (among other responses) to ignore undefined behaviour, so it does have a guaranteed loop count under a reading of the standard which the standard specifically and explicitly allows.
No, the standard permits the implementation to ignore the behavior "with unpredictable results".
If the value of param is INT_MAX, the behavior of evaluating param + 16 is undefined. It doesn't become defined behavior because a particular implementation makes a particular choice. And the implementation doesn't have to tell you what choice it makes.
What the standard means by "ignoring the situation completely" is that the implementation doesn't have to be aware that the behavior is undefined. In this particular case:
for (int i=param; i < param + 16; i++)
that means the compiler can assume there's no overflow and generate code that always executes the loop body exactly 16 times, or it can generate naive code that computes param + 16 and uses whatever result the hardware gives it. And the implementation is under no obligation to tell you how it decides that.
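A rough sketch of what those two choices could amount to, written as C rather than generated code (f_assume_no_overflow and f_naive are invented names; real compilers do this on the original loop, not by rewriting your source):

    int f_assume_no_overflow(int param) {
        int n = 0;
        for (int k = 0; k < 16; k++)   /* trip count fixed at 16 */
            n++;
        (void)param;
        return n;
    }

    int f_naive(int param) {
        int n = 0;
        int limit = param + 16;        /* may wrap to a negative value at run time
                                          (itself UB in standard C; this only
                                          illustrates the machine-level outcome) */
        for (int i = param; i < limit; i++)
            n++;                       /* 16 iterations, or 0 if limit wrapped */
        return n;
    }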
> that means the compiler can assume there's no overflow and generate code that always executes the loop body exactly 16 times
Right. That's what I said.
And just to be super-precise about the wording, the standard doesn't say "ignore the behavior 'with unpredictable results'" it says "Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results". Nitpicky, but the former wording could be taken to imply that ignoring behavior is only permissible if the behavior is unpredictable, when what the standard actually says is that you can ignore the behavior, even if the results of ignoring it are unpredictable.
And my point is that as far as the language is concerned, there is no guaranteed loop count under any circumstances. (An implementation is allowed, but not required, to define the behavior for that implementation.)
The two of you are not disagreeing except insofar as you're both using the word "guaranteed" to mean completely different things. _kst_, you're using it to mean "the programmer can rely on it". msbarnett, you're using it to mean "the compiler can rely on it".
> If the value of param is INT_MAX, the behavior of evaluating param + 16 is undefined. It doesn't become defined behavior because a particular implementation makes a particular choice. And the implementation doesn't have to tell you what choice it makes.
The compiler writer argument is as follows:
The program is either UB (when param is INT_MAX - 15 or higher) or has exactly 16 iterations. Since we are free to give any semantics to a UB program, it is standard-compliant to always execute 16 times regardless of param's value.
in which case the overflow will cause the loop to change some random memory, but it's OK since removing a single-instruction test that is easy to pipeline is worth incorrect results!
Either the limit on param is guaranteed in some way by the rest of the program, or it is not. If it is, then the loop count is guaranteed in both cases. If it is not, the loop count is not guaranteed in either case.
You are mistaken, the C standard is quite clear that it does not make any guarantees regarding the behavior of programs that exhibit undefined behavior, and that signed integer overflow is undefined behavior.
"for (int i=param; i < param + 16; i++) does not have a guaranteed loop count in the presence of undefined behavior" is true, but it's equally true that the C standard is quite clear that undefined behavior can be ignored, so we can validly treat "for (int i=param; i < param + 16; i++)" as if it were guaranteed to loop 16 times in all cases.
No, the C standard doesn't say that "undefined behavior can be ignored" (which would mean what, making it defined?).
It says, "NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, ...".
It doesn't say that the behavior can be ignored. It says that the undefinedness can be ignored. The implementation doesn't have to take notice of the fact that the behavior is undefined.
Let's take a simpler example:
printf("%d\n", INT_MAX + 1);
The behavior is undefined. The standard does not guarantee anything about it. A conforming implementation can reject it at compile time, or it can generate code that crashes, or it can generate code that executes an ADD instruction and prints whatever the hardware returns, or it can play rogue at compile time. (The traditional joke is that it can make demons fly out of your nose. Of course it can't, but an implementation that did so would be physically impossible, not non-conforming.)
An implementation might define the behavior, but it's still "undefined behavior" as that term is defined by the ISO C standard.
"undefined behavior can be ignored" (meaning: the case where this could overflow need not be considered and can be treated as though it does not exist) vs "The implementation doesn't have to take notice of the fact that the behavior is undefined" strikes me as a distinction without a difference given that we land in exactly the same spot: the standard allows us to treat "for (int i=param; i < param + 16; i++)" as if it were guaranteed to loop 16 times in all cases.
> An implementation might define the behavior, but it's still "undefined behavior" as that term is defined by the ISO C standard.
The point where we seem to disagree (and the pedantry here is getting tiresome so I don't know that there's any value in continuing to go back and forth on it) is that yes, it's undefined behavior by the ISO C standard. BUT, the ISO C standard also defines the allowable interpretations of and responses to undefined behaviour. Those responses don't exist "outside" the standard – they flow directly from it.
So it's simultaneously true that the standard does not define it and that the standard gives us a framework in which to give its undefinedness some treatment and response, even if that response is "launch angband" or, in this case, "act as if it loops 16 times in all cases".
Of course an implementation can do anything it likes, including defining the behavior. That's one of the infinitely many ways of handling it -- precisely because it's undefined behavior.
I'm not using "undefined behavior" as the English two-word phrase. I'm using the technical term as it's defined by the ISO C standard. "The construct has undefined behavior" and "this implementation defines the behavior of the construct" are not contradictory statements.
And "ignoring the situation completely" does not imply any particular behavior. You seemed to be suggesting that "ignoring the situation completely" would result in the loop iterating exactly 16 tyimes.
> Of course an implementation can do anything it likes, including defining the behavior. That's one of the infinitely many ways of handling it -- precisely because it's undefined behavior.
An implementation can do whatever it likes within the prescribed bounds the standard provides for reacting to "undefined behavior", and conversely whatever the implementation chooses to do within those bounds is consistent with the standard.
Which, again, is the entire point of this: "the loop iterates exactly 16 times" is a standards-conforming interpretation of the code in question. There's nothing outside the standard or non-standard about that. That is, in fact, exactly what the standard says that it is allowed to mean.
> I'm not using "undefined behavior" as the English two-word phrase. I'm using the technical term as it's defined by the ISO C standard.
So am I. Unlike you, I'm merely taking into account the part of the standard that says "NOTE: Possible undefined behavior ranges from ignoring the situation completely with unpredictable results..." and acknowledging that things that do so are standards-conforming.
> You seemed to be suggesting that "ignoring the situation completely" would result in the loop iterating exactly 16 times.
I'm merely reiterating what the standard says: that the case in which the loop guard overflows can be ignored, allowing an implementation to conclude that the loop iterates exactly sixteen times in all scenarios it is required to consider.
All you seem to be doing here is reiterating, over and over again, "the standard says the behavior of the loop is undefined" to argue that the loop has no meaning, while ignoring that a different page of the same standard actually gives an allowable range of meanings to what it means for "behavior to be undefined", and that therefore any one of those meanings is, in fact, precisely within the bounds of the standard.
We can validly say that the standard says "for (int i=param; i < param + 16; i++)" means "iterate 16 times always". We can validly say that the standard says "for (int i=param; i < param + 16; i++)" means "launch angband when param + 16 exceeds INT_MAX". Both are true statements.
> the standard allows us to treat "for (int i=param; i < param + 16; i++)" as if it were guaranteed to loop 16 times in all cases.
The standard allows this, but the standard also allows iterating less than 16 times, or turning it into an infinite loop, or doing things that a programmer can’t actually do intentionally inside the language’s rules. Undefined means “nothing is defined.” It doesn’t mean “nothing is defined, but in an intuitive way.”
They're not mistaken. What compilers will do is assume that UB doesn't happen. If no UB happens, that means `param + 16` never overflowed, therefore there are always exactly 16 iterations.
Or they assume "param + 16" will never overflow, so they emit an ADD instruction and use whatever result it yields.
Saying that a compiler "assumes" anything is anthropomorphic. A compiler may behave (generate code) in a manner that does not take the presence or absence of undefined behavior into account. If you just say it assumes something, that doesn't tell you what it will do based on that assumption.
Generating code that yields exactly 16 iterations is one of infinitely many possible consequences of undefined behavior.
If the mathematical value of `param + 16` exceeds `INT_MAX`, then the code has undefined behavior. The C standard says nothing at all about how the program will behave. A conforming compiler can generate code that iterates 42 times and then whistles Dixie. The non-normative note under the definition of "undefined behavior" does not constrain what a conforming implementation is allowed to do.
"imposes no requirements" means "imposes no requirements".
Perhaps there's an implicit quantifier here: "for all valid implementations of the C standard, the loop count is guaranteed to be 16" versus "there exists a valid implementation of the C standard in which...".
(This line of thought inspired by RankNTypes, "who chooses the type", etc.)
> Perhaps there's an implicit quantifier here: "for all valid implementations of the C standard, the loop count is guaranteed to be 16" versus "there exists a valid implementation of the C standard in which...".
That's precisely my point? Because the overflow case is undefined, the compiler can assume it doesn't happen and optimize based on the fixed loop count.
The overflow case is not UB. param can be unsigned, or fwrapv may be in effect. Or the compiler may choose to enable fwrapv by default. In no case is the compiler allowed to declare the overflow away, unless it knows beforehand that param cannot overflow. The optimization on loop count 16 can still happen with a runtime guard.
The loop counter is signed even if param is not, so i++ could overflow. fwrapv is a compiler flag, it is not part of the standard: it is a flag that mandates a certain behaviour in this case, but in standard C, the loop variable overflowing is definitely UB. No runtime guard needed, C compilers are just allowed to assume a fixed length. This is the whole reason signed overflow is UB in C, for exactly cases like this.
If param is unsigned, then "param + 16" cannot overflow; rather, the value wraps around in a language-defined manner. I've been assuming that param is of type int (and I stated that assumption).
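A tiny illustration of that difference (a sketch; the unsigned result is 15 because unsigned arithmetic is defined to reduce modulo UINT_MAX + 1):

    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        unsigned u = UINT_MAX;
        printf("%u\n", u + 16u);      /* defined: wraps modulo UINT_MAX + 1, prints 15 */

        int s = INT_MAX;
        (void)s;
        /* printf("%d\n", s + 16); */ /* signed overflow: undefined behavior */
        return 0;
    }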
It’s not useless.
The assumption is not false if the program doesn’t have undefined behavior.
The assumption allows the code to be a few times faster.
To disallow this assumption would inhibit these optimizations.
> For example, if signed integer overflow yielded an unspecified result rather than causing undefined behavior, I wonder if any implementations would be adversely affected.
You don't need to wonder. You can use -fwrapv to make signed integer overflow defined behavior.
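Something like this shows the difference in practice (a sketch; it relies only on -fwrapv as documented by GCC and Clang, and the invented names are just for illustration):

    /* Build twice, e.g. `cc -O2 demo.c` and `cc -O2 -fwrapv demo.c`.
       With -fwrapv, signed overflow is defined to wrap (two's complement),
       so the check below must be honored and the program prints 1.
       Without it, the overflow is UB and a compiler is free to fold
       `a + 1 < a` to 0. */
    #include <limits.h>
    #include <stdio.h>

    int will_wrap(int a) {
        return a + 1 < a;
    }

    int main(void) {
        printf("%d\n", will_wrap(INT_MAX));
        return 0;
    }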
C++20 introduced the guarantee that signed integers are two's complement. The original version of that proposal also defined the behavior on overflow; but that part was rejected (signed integer overflow remains UB):
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p090...
So at least the committee seems to think that the performance advantages are worth it.
> For example, if signed integer overflow yielded an unspecified result rather than causing undefined behavior, I wonder if any implementations would be adversely affected.
Yes.
There are several architectures where signed integer overflow traps, just like division by 0 on x86 (which is why division by 0 is UB). If a C compiler for those architectures was required to yield an unspecified result instead of trapping, it would need to install a trap handler before every signed integer addition/subtraction and restore it afterward, so that overflow returned an unspecified value instead of invoking the normal trap handler.
> The author suggests that the text following the definition of "undefined behavior", listing the permitted or possible range of undefined behavior, should be read to restrict the consequences.
> But the first possibility listed is "ignoring the situation completely with unpredictable results". Surely that covers any possible consequences.
Absolutely not. In the C89 standard, undefined behavior becomes undefined *UPON USE OF* the thing that is undefined. In current compilers, the existence of undefined behavior anywhere in your program is an excuse to do anything that the compiler wants to with all of the rest of your program. Even if the undefined behavior is never executed. Even if the undefined behavior comes after the code that gets miscompiled.
So, for example, undefined behavior that can be encountered within a loop makes it allowable to simply remove the loop. Even if the undefined behavior is inside of an if that does not happen to evaluate to true with your inputs.
This is actually desired though, at least by some programs. For example, say you have a function with a very expensive loop that repeatedly performs a null check and then executes some extra code if it's null, but never sets the value. This is called from another function which uses the checked value without a null check (proving it's not null) before and after the loop ends. The first function is inlined. Do you want to tell the compiler not to optimize out the null check and extra code in the loop? Or that it can't reuse the value from its first use? If so, what is the compiler allowed to optimize out or reorder?
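A rough sketch of the shape being described, with invented names (not anyone's real code):

    #include <stddef.h>

    static void expensive_loop(int *p, int n) {
        for (int i = 0; i < n; i++) {
            if (p == NULL) {
                /* rarely taken fallback path */
                continue;
            }
            p[0] += i;
        }
    }

    int caller(int *p, int n) {
        int before = *p;        /* dereference: p must be non-null here        */
        expensive_loop(p, n);   /* once inlined, the NULL check above can be
                                   hoisted out or deleted as provably false    */
        int after = *p;         /* dereferenced again after the loop           */
        return after - before;
    }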
Now, to see why this might actually produce a bug in working code--say some other thread has access to the not-null value and sets it racily (non-atomically) to null. Or (since most compilers are super conservative about checks of values that escape a function because they can't do proper alias analysis), some code accidentally buffer overflows and updates the pointer to null while intending to do something else. Suddenly, this obvious optimization becomes invalid!
Arguments to the effect of "the compiler shouldn't optimize out that loop due to assuming absence of undefined behavior" are basically arguments for compilers to leave tons of performance on the table, due to the fact that sometimes C programs don't follow the standard (e.g. forgetting to use atomics, or indexing out of bounds). While it's a legitimate argument, I don't think people would be too happy to find their C programs losing to Java in benchmarks on -O3, either.
There may be programs that desire such behavior. But I've never intentionally written one. Which is why I personally avoid C, and wish that I didn't have to work in environments coded in C.
I seriously would accept everything running at half speed for the certainty of not being subject to the problems of C level bugs. But as Rust grows in popularity, it looks like I won't need to worry about that.
Well, any code that triggers undefined behavior is already buggy by definition. I think it would be a lot more fruitful if, instead of blaming compilers for doing their job (trying to optimize code in a language that allows all sorts of potentially unsafe behavior), people enumerated the specific UB they had issues with. For example, a lot of people don't consider integer overflow, too-large bitshift, nonterminating loops, type punning without union, or "benign" data races automatic bugs in themselves. Some people don't even consider a null pointer dereference an automatic bug (but what about a null pointer field access, or array index that happens to land on a non-null page? Is the compiler allowed to optimize field accesses to pointer arithmetic, or not?).
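To make that last question concrete, a sketch (the struct and names are invented):

    struct widget {
        char pad[4096];
        int flag;
    };

    int read_flag(struct widget *w) {
        /* Is the compiler allowed to lower this to "load from w + 4096"
           even when w is NULL, so the access lands on whatever (possibly
           mapped) page sits at that address? Under the usual reading, yes,
           because a NULL w already makes this undefined behavior. */
        return w->flag;
    }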
Anyway this is all fine, but as you can imagine you lose a lot of optimizations that are facilitated by all that UB, so the compiler authors should then counter with some way to signal that you want the original undefined semantics (for instance, references in C++ and restrict pointers in C), or provide compile-time checking to prevent misuse that messes up optimizations (e.g. Rust's Send+Sync for avoiding data races, or UnsafeCell for signaling lack of restrict semantics / raw pointers for lack of non-nullability).
> So, for example, undefined behavior that can be encountered within a loop makes it allowable to simply remove the loop. Even if the undefined behavior is inside of an if that does not happen to evaluate to true with your inputs.
The last sentence is not true. If there is UB inside the if, the compiler may assume that the if condition never evaluates to true (and hence delete that branch of the if), but it may certainly not remove the surrounding loop (unless it can also prove that the condition must be true).
> In current compilers, the existence of undefined behavior anywhere in your program is an excuse to do anything that the compiler wants to with all of the rest of your program. Even if the undefined behavior is never executed.
This is…complicated. Let's say you have an array of ten numbers, and then you take user input and use that to index into the array. This program is well-formed…as long as the user never inputs an index outside the array. If they do, then the program is invalid. In general, the presence of undefined behavior is an attribute of the running program, not the source code itself. For any execution that encounters only defined behavior, the compiler may not deviate from the standard. However, what you probably meant is behavior in the face of the existence of runtime undefined behavior, in which case you are correct that a compiler could emit clairvoyant code that refuses to execute the first instruction if it knows that UB will happen at some point in the program.
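A minimal version of that example (a made-up program, just to pin down "undefined behavior is a property of the execution"):

    #include <stdio.h>

    int main(void) {
        int table[10] = {0};
        int i;
        if (scanf("%d", &i) != 1)
            return 1;
        /* Well-defined whenever 0 <= i <= 9. Any other input makes this
           particular execution undefined; the source itself isn't "invalid". */
        printf("%d\n", table[i]);
        return 0;
    }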
I don't have the C11 standard. But that part of the passage remained unchanged in C99.
In C89 there was a list of PERMISSIBLE things that compilers could do upon encountering undefined behavior. In C99 that was changed to a list of POSSIBLE things. And compilers have taken full advantage of that.
Ah. That sounds like the argument made in "One Word Broke C" [0, 1]. I can't say I agree with that argument, though.
As pointed out here and in the HN comments on that article, the phrase "ignoring the situation completely with unpredictable results" is present in both those standards, and is arguably what allows aggressive compiler optimizations to be made, since to a first approximation those optimizations rely on ignoring control flow that encounters UB.
e.g. removing a check for overflow is definitely NOT ignoring the behavior. Deleting a write because it would be undefined behavior for a pointer to point at some location is also NOT ignoring the behavior. Ignoring the behavior is exactly what the rationale is describing when it says UB allows compilers to not detect certain kinds of errors.
Returning a pointer is certainly a use. In any event, the prevailing interpretation makes it impossible to write a defined memory allocator in C.
If a program writes through a dangling pointer and clobbers a return address, the programmer made an error and unpredictable results follow. C is inherently memory unsafe. No UB-based labyrinth of optimizations can change that. It is not designed to be memory safe: it has other design goals.
> e.g. removing a check for for overflow is definitely NOT ignoring the behavior. Deleting write because it would be undefined behavior for a pointer to point at some location is also NOT ignoring the behavior.
Depending on how you look at it, this is ignoring the behavior.
For example, say you have this:
    int f(int a) {
        if (a + 1 < a) {
            // Handle error
        }
        // Do work
        return 0;  // placeholder return
    }
You have 2 situations:
1. a + 1 overflows
2. a + 1 does not overflow
Situation 1 contains undefined behavior. If the compiler decides to "ignor[e] the situation completely", then Situation 1 can be dropped from consideration, leaving Situation 2. Since this is the only situation left, the compiler can then deduce that the condition is always false, and a later dead code elimination pass would result in the removal of the error handling code.
So the compiler is ignoring the behavior, but makes the decision to do so by not ignoring the behavior. It's slightly convoluted, but not unreasonable.
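In other words, after those two steps the function may effectively be compiled as if it had been written like this (a sketch, not a claim about any particular compiler's exact output):

    int f(int a) {
        // `a + 1 < a` folded to 0; the error-handling branch is dead code
        // Do work
        return 0;  // placeholder return
    }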
More than slightly convoluted. The obvious intention is that the compiler ignores overflow and lets the processor architecture make the decision. Assuming that overflow doesn't happen is assuming something false. There's no excuse for that and it doesn't "optimize" anything.
> The obvious intention is that the compiler ignores overflow and lets the processor architecture make the decision.
If that were the case, wouldn't signed overflow be implementation-defined or unspecified behavior, instead of undefined behavior?
> Assuming that overflow doesn't happen is assuming something false.
It's "false" in the same way that assuming two restrict pointers don't alias is "false". It may not be universally true for every single program and/or execution, but the compiler is explicitly allowed to disregard cases where the assumption may not hold (i.e., the compiler is allowed to "ignor[e] the situation completely").
And again, the compiler is allowed to make this assumption because undefined behavior has no defined semantics. If the compiler assumes that no undefined behavior occurs, and undefined behavior does occur, whatever happens at that point is still conforming, since the Standard says that it imposes no requirements on said program.
> it doesn't "optimize" anything.
...But it does allow for optimizations? For example, treating signed overflow as UB can allow the compiler to unroll/vectorize loops when the loop index is not the size of a machine word [0]. Godbolt example at [1].
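A common shape of the loops in question, as a sketch (function and names invented; the transformation in the comment is something compilers are allowed to do, not a guarantee that any given one will):

    void add_arrays(float *dst, const float *src, long n) {
        /* On a 64-bit target, `int i` is narrower than a pointer. Because
           signed overflow is UB, the compiler may assume i never wraps,
           turn dst[i]/src[i] into simple pointer increments, and vectorize.
           With defined wrapping (e.g. an unsigned 32-bit index), it would
           also have to preserve the case where the index wraps back to
           zero partway through, which gets in the way. */
        for (int i = 0; i < n; i++)
            dst[i] += src[i];
    }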
> If that were the case, wouldn't signed overflow be implementation-defined or unspecified behavior, instead of undefined behavior?
No, because (among other reasons) the processor architecture might decide to trap or not trap depending on the run-time values of configuration registers that the compiler doesn't know and can't control or document.
> the processor architecture might decide to trap or not trap depending the run-time values of configuration registers that the compiler doesn't know and can't control
I'm not certain that that would fall outside implementation-defined behavior. Would something like "Program behavior on overflow is determined by processor model and configuration" not work?
> or document.
And even if the behavior couldn't be documented, that could be covered by unspecified behavior (assuming the language in the C standard is the same as in the C++ standard in this case)
> Would something like "Program behavior on overflow is determined by processor model and configuration" not work?
Not sure; if nothing else, that seems like it would allow the implementation to avoid documenting any implementation-defined behaviour with a blanket "all implementation-defined behaviour is whatever the hardware happens to do when executing the relevant code".
I mean, that works? It's not great by any means, but it at least eliminates the ability to make the assumptions underlying more aggressive optimizations, which seems like it'd address one of the bigger concerns around said optimizations.
Perhaps I should have phrased it as "all implementation-defined behaviour is whatever the hardware happens to do when executing whatever code the compiler happens to generate".
The point of implementation-defined behaviour is that the implementation should be required to actually define the behaviour. Whereas undefined behaviour doesn't impose any requirements; the implementation can do whatever seems reasonable on a given hardware architecture. That doesn't mean that backdoor-injection malware pretending to be an implementation is a conforming implementation.
> Perhaps I should have phrased it as "all implementation-defined behaviour is whatever the hardware happens to do when executing whatever code the compiler happens to generate".
Even with this definition, the important part is that compilers would no longer be able to ignore control flow paths that invoke undefined behavior. Signed integer overflow/null pointer dereference/etc. may be documented to produce arbitrary results, and that documentation may be so vague as to be useless, but those overflow/null pointer checks are staying put.
Err, that's not a definition, that's an example of pathologically useless 'documentation' that a perverse implementation might provide if it were allowed to 'define' implementation-defined behaviour by deferring to the hardware. Deferring to the hardware is what undefined behaviour is; the point of implementation-defined behaviour is to be less vague than that.
> may be documented to produce arbitrary results, and that documentation may be so vague as to be useless, but those overflow/null pointer checks are staying put. [emphasis added]
Yes, exactly; that is what undefined behaviour is. That is what "the standard imposes no requirements" means.
> Deferring to the hardware is what undefined behaviour is
If that were the case, the Standard would say so. The entire reason people argue over this in the first place is because the Standard's definition of undefined behavior allows for multiple interpretations.
In any case, you're still missing the point. It doesn't matter how good or bad the documentation of implementation-defined behavior may or may not be; the important part is that compilers cannot optimize under the assumption that control flow paths containing implementation-defined behavior are never reached. Null-pointer checks, overflow checks, etc. would remain in place.
> Yes, exactly; that is what undefined behaviour is. That is what "the standard imposes no requirements" means.
I think you're mixing standardese-undefined-behavior with colloquial-undefined-behavior here. For example, if reading an uninitialized variable were implementation-defined behavior, and an implementation said the result of reading an uninitialized variable was "whatever the hardware returns", you're going to get some arbitrary value/number, but your program is still going to be well-defined in the eyes of the Standard.
When I said implementation-defined, I meant implementation-defined. This is because the applicability of UB-based optimization to implementation-defined behavior - namely, the lack thereof - is wholly uncontroversial. Thus, the diversion into the quality of the documentation of implementation-defined behavior is not directly relevant here; the mere act of changing something from undefined behavior to implementation-defined behavior neatly renders irrelevant any argument about whether any particular UB-based optimization is valid.
> Compilers cannot assume that, because (in the general case) it is not true.
This is not necessarily true. For example, consider the semantics of the restrict keyword. The guarantees promised by a restrict-qualified pointer aren't true in the general case, but preventing optimizations because of that rather defeats the entire purpose of restricting a pointer in the first place.
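For comparison, the restrict case looks something like this (a sketch):

    void scale_into(float *restrict dst, const float *restrict src, int n) {
        /* `restrict` promises the caller that dst and src don't overlap.
           The promise isn't checked, and breaking it is undefined behavior;
           in exchange, the compiler may keep src values in registers and
           vectorize without re-checking memory between iterations. */
        for (int i = 0; i < n; i++)
            dst[i] = 2.0f * src[i];
    }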
More generally, the entire discussion about UB-based optimizations exists precisely because the Standard permits a reading such that compilers can make optimizations that don't hold true in the general case, precisely because the Standard imposes no requirements on programs that violate those assumptions.
> I think the author of that blog was correct: the preferred path is for the compiler to provide data to the programmer to simplify the loop.
Requiring the equivalent of PGO is a rather unfortunate bar, though to be fair if you're that interested in performance it's probably something worth looking into anyways.
I'm curious how noisy an always-on warning for undersized loop variables would be, or how much code would have broken if int were changed to 64 bits on 64-bit platforms...
> For your godbolt example, use the C compiler not c++
Sorry; that was a mistake on my end. The same phenomenon occurs when compiling in C mode, in any case [0].
How so? The implementation can, and perhaps should, define that it errors. Whatever behaviour you're worried about a compiler doing for implementation-defined behaviour, it could do exactly the same thing if the behaviour was undefined.
Implementation defined behavior can only ever produce compiler warnings, which you can choose to be commit blockers if you want. But if a compiler can prove that UB can happen then it can completely prevent you from building that program.
> But if a compiler can prove that UB can happen then it can completely prevent you from building that program.
Not really; the C standard requires implementations to have particular behaviour for executions which do not encounter undefined behaviour, so an implementation still has to do the right thing for valid cases. So if there's even one possible set of user input etc. for which the program has defined behaviour then a compiler has to produce an executable.
> UB means the compiler will trust me and concentrate on generate the fastest code ever.
In reality, UB means the compiler will assume it doesn't happen and work from there.
Of course, a more expressive language could just make it so the compiler doesn't have to assume this; e.g. a C compiler will consider a dereference as meaning the pointer is non-null, both backwards and forwards.
But if the language had non-null pointers, it would not need to bother with that, it would have a non-null pointer in the first place. It could still optimise nullable pointers (aka lower nullable pointers to non-nullable if they're provably non-nullable, usually after a few rounds of inlining), but that would be a much lower priority.
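As a sketch of that inference in today's C (invented function):

    int first_or_default(int *p) {
        int v = *p;        /* dereference: the compiler may infer p != NULL */
        if (p == NULL)     /* ...so this check can be deleted as dead code  */
            return -1;
        return v;
    }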
Expecting programmers to evaluate their own cleverness does not work. Every nontrivial C program has undefined behaviour, making it a security flaw waiting to happen - I've been in these kind of debates where a C advocate will claim that program X is correct, and literally every time it turns out that program X has undefined behaviour somewhere.
It's not so much about cleverness, but knowledge and vigilance. You first have to be aware of all the footguns, and then be careful not to let any of them slip through...
Then use such a tool, but don't call it C; call it -std=gnuc-opt11, which always knows better than the author, without any warning.
Call it randomC, unsuitable for professional programmers, but extremely suitable for benchmark games and managers, who prefer to ignore pesky overflows, underflows, memset, memcpy, dereferencing NULL pointers and other rare cases.