It's still a trade off, then all your pointer arithmetic has to be adjusted inst...

dougall · on Aug 12, 2023

It is a trade off, but a lot of processors have free offsets from loads, so pointer chasing is almost always free.

On ARM, "ldr x0, [x1]" just becomes "ldur x0, [x1, #-1]" - same size, same performance (at least on the Apple M1). If you have an array in a structure "add x0, x1, #16 ; ldr x0, [x0, x2, lsl #3]" becomes "add x0, x1, #15 ; ldr x0, [x0, x2, lsl #3]".

The only places I can think of with penalties are pointers directly to arrays: "ldr x0, [x0, x1, lsl #3]" becomes "sub x0, x0, #1 ; ldr x0, [x0, x1, lsl #3]". Or loads from large offsets: ldur's immediate only covers -256 to 255 bytes, whereas the aligned ldr can reach 32760 bytes. In either case the penalty is only one SUB operation.
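To make the offset ranges concrete, here is a rough sketch in C of when a tagged load would need the extra SUB. It assumes 8-byte loads and a tag value of 1 in the low bit; the helper names are made up for illustration and are not part of any toolchain.

```c
#include <stdbool.h>
#include <stdint.h>

/* AArch64 64-bit LDR (unsigned offset form): the byte offset must be a
   multiple of 8 in the range 0..32760. */
static bool fits_ldr_x(int64_t off) {
    return off >= 0 && off <= 32760 && (off % 8) == 0;
}

/* LDUR: signed 9-bit unscaled byte offset, -256..255. */
static bool fits_ldur(int64_t off) {
    return off >= -256 && off <= 255;
}

/* Hypothetical helper: with a tag of 1 in the low bit, a field at
   field_off is loaded at field_off - 1. Returns true when neither
   immediate form can encode that, so a separate SUB is needed. */
static bool tagged_load_needs_sub(int64_t field_off) {
    int64_t adjusted = field_off - 1;
    return !fits_ldr_x(adjusted) && !fits_ldur(adjusted);
}
```

Under these assumptions, a field at offset 256 still folds into a single ldur (adjusted offset 255), while a field at offset 264 does not.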

rollcat · on Aug 12, 2023

By the way, at least for pointer arithmetic, this makes a lot of sense: adding two valid addresses never makes sense, so ptr+i always results in a valid pointer (assuming i is a multiple of sizeof(ptr)).

anttihaapala · on Aug 12, 2023

With pointers having the lowest bit set, you'd check if the LSB is set and then do, for example, (uint32_t *)((char *)p - 1) + 2, which the compiler would optimize to register + 7 instead of register + 8, i.e. the penalty would be exactly zero for any offset besides 0.
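A minimal sketch of that pattern in C, assuming a tagging scheme where bit 0 is set on a pointer to a 4-byte-aligned uint32_t array. The names tagged_t, tag_ptr, is_tagged and tagged_load are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tagged-pointer representation: the address with bit 0 set. */
typedef uintptr_t tagged_t;

/* Tag an aligned pointer by setting its lowest bit. */
static inline tagged_t tag_ptr(uint32_t *p) {
    return (uintptr_t)p | 1u;
}

/* Check whether the low tag bit is set. */
static inline int is_tagged(tagged_t t) {
    return (t & 1u) != 0;
}

/* Load element `index` through a tagged pointer. The "- 1" removes the
   tag; for a constant index a compiler can typically fold it into the
   load's displacement (index * 4 - 1) rather than emit a subtraction. */
static inline uint32_t tagged_load(tagged_t t, size_t index) {
    const uint32_t *base = (const uint32_t *)((const char *)t - 1);
    return base[index];
}
```

With index 2 the byte displacement from the tagged value is 2 * 4 - 1 = 7, matching the register + 7 mentioned above.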

Joker_vD · on Aug 12, 2023

Honestly, it's a shame that the appropriate lower bits aren't automatically masked off by the CPU before the actual load/store.

gpderetta · on Aug 12, 2023

That's by design. Implicit masking used to be the norm in older architectures, but programs would store data there with impunity and it would make it harder to extend the address space, so AMD64 requires most significant bits of an address to be either all 1 or all 0.

Joker_vD · on Aug 12, 2023

Yeah, but I'm talking about lower bits, not the upper bits.

Off-tangent: virtual pointers on AMD64 are effectively signed numbers and yet most low-level intro-level programming books keep saying that "pointers are unsigned ints because it doesn't make sense for pointers to be negative, bla-bla". Hell, I believe xv6 book has this passage and the passage about pointers' upper-bits on AMD64 being either all 1 or 0 on the same page!
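For reference, the "all 1 or all 0" rule can be sketched as a check in C. This assumes 48-bit virtual addresses (4-level paging); with 5-level paging the boundary bit would be 56 instead of 47. The function name is made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumes 48-bit virtual addresses: bits 63..47 must all equal bit 47,
   i.e. the address is the sign extension of its low 48 bits. */
static inline bool is_canonical48(uint64_t addr) {
    uint64_t top = addr >> 47;          /* the 17 bits 63..47 */
    return top == 0 || top == 0x1FFFF;  /* all zeros or all ones */
}
```

Read as a signed 64-bit integer, every canonical address under this assumption lies in the range -2^47 to 2^47 - 1, which is the sense in which the pointers behave like signed numbers.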

gpderetta · on Aug 12, 2023

Ah, ok, but now you can't do subword-aligned reads/writes. That was a costly mistake that Alpha originally made and most architectures have avoided since.

Joker_vD · on Aug 12, 2023

That depends on the ISA's details, honestly. For example, Knuth's MMIX has 64-bit words and has separate instructions to load/store 8-, 16-, 32-, or 64-bit wide chunks of data to/from memory, which are forced to be naturally aligned because 0, 1, 2, or 3 lower bits of the address are masked, depending on the data access's width. So there you go, you have aligned access for all "natural" subword sizes.
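A small sketch of that masking rule in C, just to illustrate the address computation described above. The function name and the width_log2 parameter are assumptions of this example, not MMIX notation.

```c
#include <stdint.h>

/* width_log2: 0 = 8-bit, 1 = 16-bit, 2 = 32-bit, 3 = 64-bit access.
   Clears that many low address bits, so every access is forced onto
   its natural alignment instead of faulting or straddling words. */
static inline uint64_t masked_address(uint64_t addr, unsigned width_log2) {
    uint64_t mask = ~((UINT64_C(1) << width_log2) - 1);
    return addr & mask;
}
```

For an address of 0x1007, a 64-bit access would use 0x1000, a 32-bit access 0x1004, a 16-bit access 0x1006, and a byte access 0x1007 unchanged.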

sweetjuly · on Aug 12, 2023

Don't most contemporary high performance processors support unaligned memory operations? An eight byte load with non-zero vaddr[2:0] from an all zero one