I would do what the standard tells me to do, which is to *ignore the undefined b...

cokernel_hacker · on Sept 5, 2024

So let's change it up a bit.

  typedef int (*pfn)(void);
  int g(void);
  int h(void);

  pfn f(double x) {
    switch ((long long)x) {
    case 0:
      return g;
    case 17:
      return h;
    }
  }

If I understand your perspective correctly, `f` should return whatever happens to be in rax if the caller does not pass in a number which truncates to 0 or 17?

torstenvl · on Sept 5, 2024

More or less, yes.

I quibble with "should return" because I don't think it's accurate to say it "should" do anything in any specific set of circumstances. In fact, I'm saying the opposite: it should generate the generic, semantic code translation of what is actually written in source, and if it happens to "return whatever happens to be in rax" (as is likely on x64), then so be it.

In my view, that's what "ignoring the situation completely with unpredictable results" means.

nlewycky · on Sept 5, 2024

Why isn't that fine? The compiler ignored the undefined behavior it didn't detect.

torstenvl · on Sept 5, 2024

No. No honest person can claim that making a decision predicated on the existence of X is the same as "ignoring" X.

nlewycky · on Sept 5, 2024

This is the most normal case though, isn't it? Suppose a very simple compiler, one that sees a function so it writes out the prologue, it sees the switch so it writes out the jump tables, it sees each return statement so it writes out the code that returns the values, then it sees the function closing brace and writes out a function epilogue. The problem is that the epilogue is wrong because there is no return statement returning a value, the epilogue is only correct if the function has void return type. Depending on ABI, the function returns to a random address.

Most of the time people accuse compilers of finding and exploiting UB and say they wish it would just emit the straight-forward code, as close to writing out assembly matching the input C code expression by expression as possible. Here you have an example where the compiler never checked for UB let alone proved presence of UB in any sense, it trusted the user, it acted like a high-level assembler, yet this compiler is still not ignoring UB for you? What does it take? Adding runtime checks for the UB case is ignoring? Having the compiler find the UB paths to insert safety code is ignoring?

torstenvl · on Sept 6, 2024

> the epilogue is only correct if the function has void return type

That's a lie.

> Adding runtime checks for the UB case is ignoring? Having the compiler find the UB paths to insert safety code is ignoring?

Don't come onto HN with the intent of engaging in bad faith.

nlewycky · on Sept 8, 2024

> > the epilogue is only correct if the function has void return type

> That's a lie.

All C functions return via a return statement with expression (only for non-void functions), a return statement without an expression (only for void functions) or by the closing of function scope (only for void functions). True?

The simple "spit out a block of assembly for each thing in the C code" compiler spits out the epilogue that works for void-returning functions, because we reach the end of the function with no return statement. That epilog might happen to work for non-void functions too, but unless we specify an ABI and examine that case, it isn't guaranteed to work for them. So it's not correct to emit it. True?

Where's the lie?

> > Adding runtime checks for the UB case is ignoring? Having the compiler find the UB paths to insert safety code is ignoring?

> Don't come onto HN with the intent of engaging in bad faith.

Always! You too!

The text you quoted was referring to how real compilers handle falling off the end of a non-void function today with -fsanitize=return from UBSan. If I understand you correctly, in your reading a compiler with UBSan enabled is non-conforming because it fails to ignore the situation. That's not an argument as to whether your reading is right or wrong, but I do think UBSan compilation ought to be standard conforming, even if that means we need to add it to the Standard.

To the larger point, because the Standard doesn't define what "ignore" means, the user and implementer can't use it to pin down whether a given UB was ignored or not, and thus whether a given program was miscompiled or not. A compiler rewrites the code into its intermediate IR -- could be Z3 SMT solver or raw Turing Machine or anything -- then writes code back out. Can ignoring be done at any stage in the middle? Once the code has been converted and processed, how can you tell from the assembly output what's been ignored and what hasn't? If you demand certain assembly or semantics, isn't that just defining undefined behaviour? If you don't demand them, and leave the interpretation of "ignore" to the particular implementation of a compiler, yet any output could be valid for some potential design of compiler, why not allow any compiler emit whatever it wants?