Input/output unions. Basically an input struct to an algorithm, laid out in such a way that the algorithm's output only ever overwrites input that is no longer needed. If done well, this allows for hot loops that go over a single array for reads and write-backs. After all, it's all just memory, and using what you have in situ is the fastest way to go.
No pointers to dereference and wait on, just the input, computed and stored back into the same cache line, to be used for the next part of the hot loop.
If the hot loop is multi-staged and the output continuously decreases in size, one can even "compress" the data by packing an array of orgMaxSize/outputSize subunits into the unit.
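To make the packing idea concrete, here is a minimal sketch, assuming a stage that shrinks each 32-bit input to an 8-bit result. The names reduce and compact_in_place are made up for illustration; the point is only that the write position trails the read position, so results can be packed into the front of the same buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stage: each 32-bit input shrinks to an 8-bit result. */
static uint8_t reduce(uint32_t x)
{
    return (uint8_t)(x >> 24); /* keep only the top byte */
}

/* Packs result i into byte i of the same buffer and returns the number of
   result bytes. The write offset (i) never passes the read offset (4 * i),
   so no unread input is overwritten. */
static size_t compact_in_place(uint32_t *buf, size_t n)
{
    assert(buf != NULL);
    unsigned char *out = (unsigned char *)buf; /* char access may alias any object */
    for (size_t i = 0; i < n; i++) {
        uint32_t in = buf[i]; /* read the whole input first */
        out[i] = reduce(in);  /* then reuse bytes that were already consumed */
    }
    return n;
}
```

After the call, the first n bytes of the buffer hold the results and the rest is free for whatever the next stage needs.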
This is unsafe C code by nature, but test-driven development on the target platform can make it "safe".
Now you start with a struct in the union:
{inputA, inputB, inputC}
and pad the intermediate struct:
{padding against overlap, resultB, resultC}
and end up with the result struct in the same union:
{outputA, outputB, outputC}
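A rough sketch of how those three views could look in C. Everything here (the Unit type, the field names, the arithmetic in stage1 and stage2) is invented for illustration. It also leans on C allowing a read through a different union member than the one last written; C++ is stricter about that.

```c
#include <assert.h>
#include <stdint.h>

/* All three views share the same 12 bytes. Field names are illustrative. */
typedef union {
    struct { uint32_t a, b, c; } in;                  /* raw input */
    struct { uint32_t keep_a, sum_bc, diff_bc; } mid; /* keep_a pads over in.a, still needed later */
    struct { uint32_t a, b, c; } out;                 /* final results */
} Unit;

static_assert(sizeof(Unit) == 3 * sizeof(uint32_t), "views must overlay exactly");

/* Stage 1 consumes b and c; a stays untouched behind the padding slot. */
static void stage1(Unit *u)
{
    uint32_t b = u->in.b, c = u->in.c; /* load inputs before overwriting them */
    u->mid.sum_bc = b + c;
    u->mid.diff_bc = b - c;
}

/* Stage 2 consumes all three slots and writes the output view. */
static void stage2(Unit *u)
{
    uint32_t a = u->mid.keep_a, s = u->mid.sum_bc, d = u->mid.diff_bc;
    u->out.a = a + s;
    u->out.b = a * s;
    u->out.c = a ^ d;
}
```

Each stage loads what it needs into locals first and only then stores, so the order of the stores inside a stage does not matter.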
The trick is to keep track of the state and validate the "purity" via automated tests.
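One possible way to do both, sketched under assumptions: a state tag next to the union that each stage checks, and a test helper that compares the in-place result with a plain out-of-place reference. The names State, Tracked, run_in_place and is_pure are hypothetical. In a real hot loop the state is more likely implied by the loop position, with the tag compiled in for test builds only.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { ST_INPUT, ST_OUTPUT } State; /* hypothetical state tag */

typedef struct {
    State state; /* which view of data is currently valid */
    union {
        struct { uint16_t x, y; } in;
        struct { uint32_t product; } out;
    } data;
} Tracked;

/* Pure reference version, used only by tests. */
static uint32_t reference(uint16_t x, uint16_t y)
{
    return (uint32_t)x * y;
}

/* In-place version: refuses to run on a unit in the wrong state. */
static void run_in_place(Tracked *t)
{
    assert(t->state == ST_INPUT);
    uint16_t x = t->data.in.x, y = t->data.in.y; /* read before overwrite */
    t->data.out.product = (uint32_t)x * y;
    t->state = ST_OUTPUT;
}

/* Purity check: the in-place result must match the reference for the same input. */
static int is_pure(uint16_t x, uint16_t y)
{
    Tracked t = { .state = ST_INPUT, .data = { .in = { x, y } } };
    run_in_place(&t);
    return t.state == ST_OUTPUT && t.data.out.product == reference(x, y);
}
```

A test suite would then call is_pure over edge cases and random inputs on the target platform.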
Then you have it all in one L1 cache line, pumping through an algo: no dereferences, no huge stack structures. It's all there, as long as the sizes of input and output do not differ wildly.
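As a sketch of the cache-line part, assuming 64-byte L1 lines (common, but check your platform): the unit is sized and aligned to exactly one line, and the hot loop walks a single array, storing each result into the slot it was loaded from. The Line type and the arithmetic are placeholders.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64 /* assumption: 64-byte L1 lines */

typedef union {
    alignas(CACHE_LINE) uint32_t in[16]; /* 16 inputs fill one line */
    uint32_t out[16];                    /* results overwrite the same slots */
} Line;

static_assert(sizeof(Line) == CACHE_LINE, "one unit per cache line");

/* One pass over one array: each slot is loaded, transformed, and stored back. */
static void hot_loop(Line *lines, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < 16; k++)
            lines[i].out[k] = lines[i].in[k] * 2u + 1u;
}
```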
Remember, down there it's all just bytes accessed by code. There is no such concept as objects or even variables.
All those mental pots to grab things out of and put things back into are artificial constructs needed by us.
The machine down there can go to work like a line-cooking Shiva in a trailer. Working that way, with this picture in mind, makes it faster.
PS: Pointers within the struct nullify the advantages gained here, because every pointer dereference is an extra memory load and thus "relatively" slow.