
Personally, I think that arguing that those who define and implement a standard don’t understand one of the most fundamental aspects of said standard is going to be an uphill battle.

You could argue that they’ve lost their way, and the article flirts with this, but the path forward is the hard part, and IMHO it rings a bit hollow: the article asserts that these rules aren’t needed for performance, but offers no evidence, and what similar evidence we do have (compiling at lower optimization levels) doesn’t seem to support the thesis. You could argue that the kernel, which turns off strict aliasing, is plenty performant without it, and that’s a decent argument, but it’s not clear that it wouldn’t be even faster with it, and that’s much harder to test empirically than just removing the flag, since re-enabling it would miscompile the kernel.



Different code depends on different optimizations. A loop on an int** might benefit a lot from aliasing optimizations, because the compiler will assume that a[i] will remain the same after writing to a[i][j]. Other code may not benefit at all.
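A hedged sketch of that pattern (names are illustrative, not from any real codebase):

```c
#include <stddef.h>

/* Sketch: under strict aliasing, the compiler may assume the int store
   through a[i][j] cannot modify the int* object a[i], so the load of
   a[i] can be hoisted out of the inner loop. With -fno-strict-aliasing
   it has to reload a[i] on every iteration, since the write could in
   principle alias it. */
void zero_rows(int **a, size_t rows, size_t cols) {
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            a[i][j] = 0;
}
```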

Likewise, that loop may not benefit from signed-overflow assumptions; instead, an initialization loop that, by way of multiple levels of macros, ends up doing a[i]=b[i]*1000/100 might become twice as fast if the signed-overflow rules let the compiler rewrite the assignment as a[i]=b[i]*10.
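To make that concrete, a hedged sketch (the actual speedup, if any, depends entirely on the target and surrounding code):

```c
/* Sketch: because signed overflow is undefined behavior, the compiler
   may fold x * 1000 / 100 into x * 10, skipping the intermediate
   product. Under -fwrapv it must compute x * 1000 wrapping mod 2^32
   first, and the two expressions can give different results for
   values of x near INT_MAX / 1000. */
int scale(int x) {
    return x * 1000 / 100;
}
```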


Wang et al. tried this as an experiment and found no serious wins.


For reference, [0] appears to be the referenced paper. The relevant passage:

> To understand how disabling these optimizations may impact performance, we ran SPECint 2006 with GCC and Clang, respectively, and measured the slowdown when compiling the programs with all the three -fno-* [-fno-strict-overflow, -fno-delete-null-pointer-checks, and -fno-strict-aliasing] options shown in Figure 9. The experiments were conducted on a 64-bit Ubuntu Linux machine with an Intel Core i7-980 3.3 GHz CPU and 24 GB of memory. We noticed slowdown for 2 out of the 12 programs, as detailed next.

> 456.hmmer slows down 7.2% with GCC and 9.0% with Clang. The first reason is that the code uses an int array index, which is 32 bits on x86-64, as shown below.

    int k;
    int *ic, *is;
    ...
    for (k = 1; k <= M; k++) {
        ...
        ic[k] += is[k];
        ...
    }
> As allowed by the C standard, the compiler assumes that the signed addition k++ cannot overflow, and rewrites the loop using a 64-bit loop variable. Without the optimization, however, the compiler has to keep k as 32 bits and generate extra instructions to sign-extend the index k to 64 bits for array access. This is also observed by LLVM developers [14].

> Surprisingly, by running OProfile we found that the most time-consuming instruction was not the sign extension but loading the array base address is[] from the stack in each iteration. We suspect that the reason is that the generated code consumes one more register for loop variables (i.e., both 32 and 64 bits) due to sign extension, and thus spills is[] on the stack.
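A sketch of the two indexing variants on an LP64 target (illustrative names, not the actual hmmer code):

```c
#include <stddef.h>

/* With a 32-bit signed index, the compiler may only widen k to 64 bits
   (avoiding a sign extension at each is[k]/ic[k] access) if it can
   assume k++ never overflows. */
void accumulate_int(int *ic, int *is, int M) {
    for (int k = 1; k <= M; k++)
        ic[k] += is[k];
}

/* With a size_t index, k is already 64 bits on LP64 targets, so no
   sign extension is needed regardless of overflow assumptions. */
void accumulate_sz(int *ic, int *is, size_t M) {
    for (size_t k = 1; k <= M; k++)
        ic[k] += is[k];
}
```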

> If we change the type of k to size_t, then we no longer observe any slowdown with the workaround options.

> 462.libquantum slows down 6.3% with GCC and 11.8% with Clang. The core loop is shown below.

    quantum_reg *reg;
    ...
    // reg->size: int
    // reg->node[i].state: unsigned long long
    for (i = 0; i < reg->size; i++)
        reg->node[i].state = ...;
> With strict aliasing, the compiler is able to conclude that updating reg->node[i].state does not change reg->size, since they have different types, and thus moves the load of reg->size out of the loop. Without the optimization, however, the compiler has to generate code that reloads reg->size in each iteration. If we add a variable to hold reg->size before entering the loop, then we no longer observe any slowdown with the workaround options.
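A sketch of that workaround, with illustrative stand-ins for the libquantum types (field names follow the quoted loop; the struct definitions here are assumptions):

```c
/* Assumed stand-ins for the libquantum types described above. */
typedef struct { unsigned long long state; } quantum_reg_node;
typedef struct { int size; quantum_reg_node *node; } quantum_reg;

/* Without strict aliasing, reg->size must be reloaded every iteration:
   the unsigned long long store might, as far as the compiler knows,
   alias the int field. */
void clear_states(quantum_reg *reg) {
    for (int i = 0; i < reg->size; i++)
        reg->node[i].state = 0;
}

/* The workaround the paper describes: hoist reg->size into a local so
   the loop bound is visibly invariant even without type-based alias
   analysis. */
void clear_states_hoisted(quantum_reg *reg) {
    int n = reg->size;
    for (int i = 0; i < n; i++)
        reg->node[i].state = 0;
}
```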

> While we observed only moderate performance degradation on two SPECint programs with these workaround options, some previous reports suggest that using them would lead to a nearly 50% drop [6], and that re-enabling strict aliasing would bring a noticeable speed-up [24].

[0]: https://pdos.csail.mit.edu/papers/ub:apsys12.pdf

[6]: https://lists.gnu.org/archive/html/autoconf-patches/2006-12/...

[24]: (dead link, doesn't appear to be available on the Wayback Machine) https://www.linaro.org/blog/compiler-flags-used-to-speed-up-...


> If we change the type of k to size_t, then we no longer observe any slowdown with the workaround options

Basically, in that case the benefits of the optimization disappear once you fix the code.


I mean, that's kind of a tautological statement; if you change the code to either eliminate the need for the optimization or manually implement it, of course the benefits of the optimization disappear. That would apply to most, if not all, optimizations.



