Note that writing 64 bits and reading 32 (or viceversa) is not a way to get around fences on x86. It is explicitly documented as begin undefined. In most cases it will fail to store-forward that will stall and act as an implicit fence, but in some cases the CPU can do partial store forwarding, breaking it.
I don't think the parent was talking about this, though; they were just talking about using a single large physical location, which logically contains multiple smaller values. Accesses to a single location happen order, so there is indeed no need for fencing between accesses to it. Usually you get a full 128 bits (at least amd64/aarch64/ppc64; not riscv yet but I expect they will get there).
That said—mixed-size can be useful despite the lack of semantics (I think linux uses them in a few places?). sooo
Ah, right, it was about guaranteed total order on all stores in a single memory location.
Re colocation and x86, IIRC the intel memory model has wordings regarding read and writes to a a memory location having to be of the same size to take advantage of the memory model guarantees.
total order on all accesses to a given location—loads from a single location can't be reordered w.r.t. each other either
i don't remember seeing any wording relating to mixed-size accesses in the intel manual (not withstanding that the official models are ... ambiguous, to say the least, compared with what 3rd-party researchers have done)
> i don't remember seeing any wording relating to mixed-size accesses in the intel manual (not withstanding that the official models are ... ambiguous, to say the least, compared with what 3rd-party researchers have done)
I was probably misremembering the details. The manual has to say this regarding #LOCK prefixed operations:
"Software should access semaphores (shared memory used for signalling between multiple processors) using identical addresses and operand lengths. For example, if one processor accesses a semaphore using a word access, other processors should not access the semaphore using a byte access"
which is already vague enough, but regarding general atomic load and stores I couldn't find anything.
For a total store order to be meaningful of course it implies that loads are also non visibly reordered. If a store falls in the forest but nobody is around to load it, was it really ordered :)
Tbf you could say stores happen in order, and loads can happen out of order unless you fence. Personally I don't understand why we need such strong ordering constraints for weakly ordered reads—istm you can go much weaker and maintain sanity.
AFAIK this trick does work on SPARC though.