
Question: why is a union memory unsafe?

My meager understanding of unions is that they allow data of different types to be overlaid in the same area of memory, with the typical use case being data structures that may contain different types of data (and the union typically being embedded in a struct that identifies the data type). This certainly presents problems with the interpretation of data stored in the union, but it also strikes me that the union object would have a clearly defined size, and the compiler would be able to flag any memory accesses outside the bounds of the union. While this is clearly problematic, especially if at least one of the members is a pointer, it also seems like the sort of problem a compiler can catch (which is the benefit of Rust on this front).

Please correct me if I'm wrong. This sort of software development is a hobby for me (anything that I do for work is done in languages like Python).



A trivial example of this would be a tagged union that represents variants with control structures of different sizes; if the attacker can induce a confusion between the tag and the union member at runtime, they can (typically) perform a controlled read of memory outside of the intended range.

Rust avoids this by having sum types, as well as preventing the user from constructing a tag that’s inconsistent with the union member. So it’s not that a union is inherently unsafe, but that the language’s design needs to control the construction and invariants of a union.


Canonical example:

    union {
        char* p;
        long i;
    };
Then say that the attacker can write arbitrary integers into `i` and then trigger dereferences on `p`.


The standard does not assign meaning to this sequence of execution, so an implementation can detect this and abort. This is not just hypothetical: existing implementations with pointer capabilities (Fil-C, CHERI targets, possibly even compilers for IBM i) already do this. Of course, such C implementations are not widely used.

The union example is not particularly problematic in this regard. Much more challenging is pointer arithmetic through uintptr_t, because it's quite common. It's probably still solvable, but at a certain point changing the sources becomes easier, even at scale (say, if something uses the %p format specifier with sprintf/sscanf).


> The standard does not assign meaning to this sequence of execution, so an implementation can detect this and abort.

Real C programs use these kinds of unions, and real C compilers ascribe bitcast semantics to this union. LLVM has a lot of heavy machinery to make sure that the programmer gets exactly what they expected here.

The spec is brain damage. You should ignore it if you want to be able to reason about C.

> This is not just hypothetical: existing implementations with pointer capabilities (Fil-C, CHERI targets, possibly even compilers for IBM i) already do this

Fil-C does not abort when you use this union. You get memory safe semantics:

- you can use `i` to change the pointer’s intval. But the capability can’t be changed that way. So if you make a mistake you’ll end up with an OOB pointer.

- you can use `i` to read the pointer’s current intval, just as if you had done a ptrtoint cast.

I think CHERI also does not abort on the union itself. I think storing to `i` removes the capability bit so `p` crashes on deref.

> The union example is not particularly problematic in this regard. Much more challenging is pointer arithmetic through uintptr_t because it's quite common.

The union problem is one of the reasons why C is not memory safe, because C compilers give unions the expected structured assembly semantics, not whatever nonsense is in the spec.




