
Personally, I think that arguing that those who define and implement a standard don’t understand one of the most fundamental aspects of said standard is going to be an uphill battle.

You could argue that they’ve lost their way, and the article flirts with this, but the path forward is the hard part, and IMHO it rings a bit hollow: the article asserts that these rules aren’t needed for performance, but offers no evidence, and what similar evidence we do have (compiling at lower optimization levels) doesn’t seem to support the thesis. You could argue that the kernel, which turns off strict aliasing, is plenty performant without it, and that’s a decent argument, but it’s not clear that it wouldn’t be even faster with it, and that’s much harder to test empirically than just removing the flag, since re-enabling it would miscompile the kernel.



Different code depends on different optimizations. A loop on an int** might benefit a lot from aliasing optimizations, because the compiler will assume that a[i] will remain the same after writing to a[i][j]. Other code may not benefit at all.
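A hedged sketch of that pattern (names are illustrative, not from any real codebase):

```c
#include <stddef.h>

/* Sketch: under strict aliasing, the compiler may assume the int store
   through a[i][j] cannot modify the int* object a[i], so the load of
   a[i] can be hoisted out of the inner loop. With -fno-strict-aliasing
   it has to reload a[i] on every iteration, since the write could in
   principle alias it. */
void zero_rows(int **a, size_t rows, size_t cols) {
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            a[i][j] = 0;
}
```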

Likewise, that loop may not benefit from signed-overflow assumptions; instead, an initialization loop that, by way of multiple levels of macros, ends up doing a[i]=b[i]*1000/100 might become twice as fast if the signed-overflow rules let the compiler rewrite the assignment as a[i]=b[i]*10.
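To make that concrete, a hedged sketch (the actual speedup, if any, depends entirely on the target and surrounding code):

```c
/* Sketch: because signed overflow is undefined behavior, the compiler
   may fold x * 1000 / 100 into x * 10, skipping the intermediate
   product. Under -fwrapv it must compute x * 1000 wrapping mod 2^32
   first, and the two expressions can give different results for
   values of x near INT_MAX / 1000. */
int scale(int x) {
    return x * 1000 / 100;
}
```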


Wang et al. tried this as an experiment and found no serious wins.


For reference, [0] appears to be the referenced paper. The relevant passage:

> To understand how disabling these optimizations may impact performance, we ran SPECint 2006 with GCC and Clang, respectively, and measured the slowdown when compiling the programs with all the three -fno-* [-fno-strict-overflow, -fno-delete-null-pointer-checks, and -fno-strict-aliasing] options shown in Figure 9. The experiments were conducted on a 64-bit Ubuntu Linux machine with an Intel Core i7-980 3.3 GHz CPU and 24 GB of memory. We noticed slowdown for 2 out of the 12 programs, as detailed next.

> 456.hmmer slows down 7.2% with GCC and 9.0% with Clang. The first reason is that the code uses an int array index, which is 32 bits on x86-64, as shown below.

    int k;
    int *ic, *is;
    ...
    for (k = 1; k <= M; k++) {
        ...
        ic[k] += is[k];
        ...
    }
> As allowed by the C standard, the compiler assumes that the signed addition k++ cannot overflow, and rewrites the loop using a 64-bit loop variable. Without the optimization, however, the compiler has to keep k as 32 bits and generate extra instructions to sign-extend the index k to 64 bits for array access. This is also observed by LLVM developers [14].

> Surprisingly, by running OProfile we found that the most time-consuming instruction was not the sign extension but loading the array base address is[] from the stack in each iteration. We suspect that the reason is that the generated code consumes one more register for loop variables (i.e., both 32 and 64 bits) due to sign extension, and thus spills is[] on the stack.
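A sketch of the two indexing variants on an LP64 target (illustrative names, not the actual hmmer code):

```c
#include <stddef.h>

/* With a 32-bit signed index, the compiler may only widen k to 64 bits
   (avoiding a sign extension at each is[k]/ic[k] access) if it can
   assume k++ never overflows. */
void accumulate_int(int *ic, int *is, int M) {
    for (int k = 1; k <= M; k++)
        ic[k] += is[k];
}

/* With a size_t index, k is already 64 bits on LP64 targets, so no
   sign extension is needed regardless of overflow assumptions. */
void accumulate_sz(int *ic, int *is, size_t M) {
    for (size_t k = 1; k <= M; k++)
        ic[k] += is[k];
}
```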

> If we change the type of k to size_t, then we no longer observe any slowdown with the workaround options.

> 462.libquantum slows down 6.3% with GCC and 11.8% with Clang. The core loop is shown below.

    quantum_reg *reg;
    ...
    // reg->size: int
    // reg->node[i].state: unsigned long long
    for (i = 0; i < reg->size; i++)
        reg->node[i].state = ...;
> With strict aliasing, the compiler is able to conclude that updating reg->node[i].state does not change reg->size, since they have different types, and thus moves the load of reg->size out of the loop. Without the optimization, however, the compiler has to generate code that reloads reg->size in each iteration. If we add a variable to hold reg->size before entering the loop, then we no longer observe any slowdown with the workaround options.
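A sketch of that workaround, with illustrative stand-ins for the libquantum types (field names follow the quoted loop; the struct definitions here are assumptions):

```c
/* Assumed stand-ins for the libquantum types described above. */
typedef struct { unsigned long long state; } quantum_reg_node;
typedef struct { int size; quantum_reg_node *node; } quantum_reg;

/* Without strict aliasing, reg->size must be reloaded every iteration:
   the unsigned long long store might, as far as the compiler knows,
   alias the int field. */
void clear_states(quantum_reg *reg) {
    for (int i = 0; i < reg->size; i++)
        reg->node[i].state = 0;
}

/* The workaround the paper describes: hoist reg->size into a local so
   the loop bound is visibly invariant even without type-based alias
   analysis. */
void clear_states_hoisted(quantum_reg *reg) {
    int n = reg->size;
    for (int i = 0; i < n; i++)
        reg->node[i].state = 0;
}
```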

> While we observed only moderate performance degradation on two SPECint programs with these workaround options, some previous reports suggest that using them would lead to a nearly 50% drop [6], and that re-enabling strict aliasing would bring a noticeable speed-up [24].

[0]: https://pdos.csail.mit.edu/papers/ub:apsys12.pdf

[6]: https://lists.gnu.org/archive/html/autoconf-patches/2006-12/...

[24]: (dead link, doesn't appear to be available on the Wayback Machine) https://www.linaro.org/blog/compiler-flags-used-to-speed-up-...


> If we change the type of k to size_t, then we no longer observe any slowdown with the workaround options

Basically, in that case the benefits of the optimization disappear once you fix the code.


I mean, that's kind of a tautological statement; if you change the code to either eliminate the need for the optimization or manually implement it, of course the benefits of the optimization disappear. That would apply to most, if not all, optimizations.



