Note that writing 64 bits and reading 32 (or vice versa) is not a way to get arou...

moonchild · on May 12, 2024

It's not documented as being undefined; it's simply not documented at all.

Intel's latest uarch does partial store forwarding.

There was a paper from a few years ago trying to define semantics for mixed-size accesses (not for x86 though) https://www.cl.cam.ac.uk/~pes20/popl17/mixed-size.pdf

I don't think the parent was talking about this, though; they were just talking about using a single large physical location that logically contains multiple smaller values. Accesses to a single location happen in order, so there is indeed no need for fencing between accesses to it. Usually you get a full 128 bits (at least on amd64/aarch64/ppc64; not riscv yet, but I expect they will get there).

That said—mixed-size can be useful despite the lack of semantics (I think linux uses them in a few places?). sooo
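
A minimal sketch of the "one large location holding several smaller values" idea, written against the C++ memory model rather than any particular ISA. The `PackedPair` type and its member names are hypothetical, made up for illustration; it uses 64 bits instead of 128 only to stay portable. Every access is full-width and same-sized, so nothing here relies on mixed-size semantics.

```cpp
#include <atomic>
#include <cstdint>
#include <utility>

// Hypothetical helper: two logical 32-bit fields share one 64-bit atomic word.
struct PackedPair {
    std::atomic<std::uint64_t> word{0};

    static std::uint64_t pack(std::uint32_t lo, std::uint32_t hi) {
        return (static_cast<std::uint64_t>(hi) << 32) | lo;
    }

    // One full-width store publishes both fields together.
    void store(std::uint32_t lo, std::uint32_t hi) {
        word.store(pack(lo, hi), std::memory_order_relaxed);
    }

    // One full-width load returns a pair that was written as a unit,
    // never a mix of two different updates.
    std::pair<std::uint32_t, std::uint32_t> load() const {
        std::uint64_t w = word.load(std::memory_order_relaxed);
        return {static_cast<std::uint32_t>(w),
                static_cast<std::uint32_t>(w >> 32)};
    }

    // Update only the low field with a CAS loop on the whole word,
    // so the high field is carried over unchanged.
    void set_lo(std::uint32_t lo) {
        std::uint64_t old = word.load(std::memory_order_relaxed);
        while (!word.compare_exchange_weak(
            old, (old & 0xFFFFFFFF00000000ull) | lo,
            std::memory_order_relaxed)) {
            // On failure `old` is refreshed with the current value; retry.
        }
    }
};
```

Relaxed ordering is enough for the two fields to stay consistent with each other, because they live in the same atomic object. Ordering against other memory locations would still need acquire/release or fences.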

gpderetta · on May 12, 2024

Ah, right, it was about guaranteed total order on all stores in a single memory location.

Re colocation and x86, IIRC the Intel memory model has wording about reads and writes to a memory location having to be the same size to take advantage of the memory model guarantees.

moonchild · on May 12, 2024

total order on all accesses to a given location—loads from a single location can't be reordered w.r.t. each other either

i don't remember seeing any wording relating to mixed-size accesses in the intel manual (notwithstanding that the official models are ... ambiguous, to say the least, compared with what 3rd-party researchers have done)
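
To illustrate the point that loads from a single location can't be reordered with respect to each other, here is a rough sketch in C++ terms, where the same property is called read-read coherence. The function name `reads_stay_monotonic` is hypothetical. It shows the language-level guarantee only, and says nothing about what a given CPU manual documents.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical check: returns true if the reader never sees the counter go
// backwards, even though every access is relaxed and there are no fences.
bool reads_stay_monotonic(std::uint32_t writes) {
    std::atomic<std::uint32_t> counter{0};
    bool monotonic = true;  // written only by the reader, read after join()

    std::thread writer([&] {
        for (std::uint32_t i = 1; i <= writes; ++i)
            counter.store(i, std::memory_order_relaxed);
    });

    std::thread reader([&] {
        std::uint32_t prev = 0;
        while (prev < writes) {
            std::uint32_t cur = counter.load(std::memory_order_relaxed);
            if (cur < prev) monotonic = false;  // would break coherence
            prev = cur;
        }
    });

    writer.join();
    reader.join();
    return monotonic;
}
```

The reader may skip values, but under the C++ rules a later load of the same atomic object cannot return an older value than an earlier load in the same thread.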

gpderetta · on May 12, 2024

> i don't remember seeing any wording relating to mixed-size accesses in the intel manual (notwithstanding that the official models are ... ambiguous, to say the least, compared with what 3rd-party researchers have done)

I was probably misremembering the details. The manual has this to say regarding #LOCK prefixed operations:

"Software should access semaphores (shared memory used for signalling between multiple processors) using identical addresses and operand lengths. For example, if one processor accesses a semaphore using a word access, other processors should not access the semaphore using a byte access"

which is already vague enough, but regarding general atomic loads and stores I couldn't find anything.

gpderetta · on May 12, 2024

For a total store order to be meaningful, of course, it implies that loads are also not visibly reordered. If a store falls in the forest but nobody is around to load it, was it really ordered? :)

moonchild · on May 13, 2024

Tbf you could say stores happen in order, and loads can happen out of order unless you fence. Personally I don't understand why we need such strong ordering constraints for weakly ordered reads—istm you can go much weaker and maintain sanity.
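
For what "loads can happen out of order unless you fence" looks like in practice, here is a small sketch using C++ relaxed atomics plus explicit fences. It is only an analogy for the model described above, not a description of any hardware; the function name `fenced_message_pass` is hypothetical. Note that C++ does not promise store ordering for relaxed stores either, so the producer side needs its own release fence.

```cpp
#include <atomic>
#include <thread>

// Hypothetical example: returns the payload value the consumer observed.
int fenced_message_pass() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // flag guarding the payload
    int seen = -1;                   // written by consumer, read after join()

    std::thread producer([&] {
        payload = 42;
        // Keep the payload write ordered before the flag store.
        std::atomic_thread_fence(std::memory_order_release);
        ready.store(true, std::memory_order_relaxed);
    });

    std::thread consumer([&] {
        while (!ready.load(std::memory_order_relaxed)) {
            // Spin until the flag becomes visible.
        }
        // Without this fence, the payload read below would be a data race
        // and could be ordered before the flag load.
        std::atomic_thread_fence(std::memory_order_acquire);
        seen = payload;
    });

    producer.join();
    consumer.join();
    return seen;
}
```

With both fences in place the consumer is guaranteed to read 42; dropping the acquire fence is exactly the case where the reads are no longer ordered.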