
If all of the latencies to main memory and the L1/L2/L3 caches have only been increasing over the years, is picking the right data memory layout more or less important than before?


> If all of the latencies to main memory and the L1/L2/L3 caches have only been increasing over the years

That's actually not the case at all: both L1 and L2 (and to a lesser extent L3) have improved dramatically over the past years. L1/L2 effectively run at a clock very close to the CPU's, e.g. L1 nowadays could be 2 CPU cycles. As for L3, it does depend on the architecture and on where cache coherency is handled, yet generally the latency has not increased.

The main memory (DRAM) latency has stayed more or less the same, though.

> is picking the right data memory layout more or less important than before?

It has always been important; it's not about the latencies per se but about locality, and this has been the case for more than two decades. The general rules are: 1st, if your data set is small, i.e. N is tiny, it's effectively O(1) (as N is a constant); 2nd, pick the big-O that works best (usually O(1) or O(log N)); 3rd, pick the data structure with the best locality.
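As an illustration of rules 2 and 3, here is a minimal C++ sketch (container choice and sizes are mine, not from the thread): std::map and a sorted std::vector both give O(log N) lookups, but the vector's contiguous storage touches far fewer cache lines than chasing the map's pointer-linked tree nodes.

    // Same big-O, different locality: node-based tree vs contiguous sorted vector.
    #include <algorithm>
    #include <cstdio>
    #include <map>
    #include <utility>
    #include <vector>

    int main() {
        constexpr int N = 1'000'000;

        std::map<int, int> tree;                    // node-based, poor locality
        std::vector<std::pair<int, int>> flat;      // contiguous, good locality
        flat.reserve(N);
        for (int i = 0; i < N; ++i) {
            tree.emplace(i, 2 * i);
            flat.emplace_back(i, 2 * i);
        }

        const int key = 123456;
        auto it1 = tree.find(key);                  // O(log N), pointer chasing
        auto it2 = std::lower_bound(                // O(log N), cache-friendly
            flat.begin(), flat.end(), key,
            [](const std::pair<int, int>& p, int k) { return p.first < k; });

        std::printf("%d %d\n", it1->second, it2->second);
        return 0;
    }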


> That's actually not the case

I don't know where you get the data, but the latencies are surely not decreasing, and 2 cycles for L1 is totally wrong. I should perhaps have been more specific: by latency I mean latency in terms of CPU cycles, not the actual figure in ns, which obviously depends on the CPU frequency.

Examining the past few generations of both Intel and AMD microarchitectures, this is what I concluded: L1 is between 4 and 6 cycles; L2 on Intel is between 12 and 16 cycles, whereas AMD is in roughly the same range but 1-2 cycles faster; L3 on AMD is ~40 cycles, whereas L3 on Intel goes anywhere from 60 up to even 120 cycles.

It is an engineering feat to increase the capacity of a cache without sacrificing at least a little latency. Sometimes the latency increase is masked by other effects, but generally speaking this is what happens.
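For what it's worth, numbers like these are usually obtained with a pointer-chasing microbenchmark; here is a rough sketch (the constants are my own, and you would shrink the working set to target L1/L2/L3 instead of DRAM):

    // Chase a random cycle of dependent loads so the prefetcher can't help;
    // total time divided by hop count approximates the average load latency
    // for whatever level of the hierarchy the working set fits in.
    #include <algorithm>
    #include <chrono>
    #include <cstdio>
    #include <numeric>
    #include <random>
    #include <vector>

    int main() {
        constexpr std::size_t kElems = std::size_t{1} << 22;   // ~32 MiB: DRAM-bound
        std::vector<std::size_t> order(kElems);
        std::iota(order.begin(), order.end(), std::size_t{0});
        std::shuffle(order.begin(), order.end(), std::mt19937_64{42});

        // Link the shuffled indices into a single cycle covering every element.
        std::vector<std::size_t> next(kElems);
        for (std::size_t i = 0; i + 1 < kElems; ++i) next[order[i]] = order[i + 1];
        next[order[kElems - 1]] = order[0];

        constexpr std::size_t kHops = 50'000'000;
        std::size_t idx = 0;
        auto t0 = std::chrono::steady_clock::now();
        for (std::size_t i = 0; i < kHops; ++i)
            idx = next[idx];                        // each load depends on the previous
        auto t1 = std::chrono::steady_clock::now();

        double ns = std::chrono::duration<double, std::nano>(t1 - t0).count() / kHops;
        std::printf("avg load latency: %.1f ns (idx=%zu)\n", ns, idx);
        return 0;
    }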

> The main memory (DRAM) latency has stayed more or less the same, though.

~70 ns of DRAM latency looks different to a CPU core running at 3 GHz than to one running at 5 GHz: the former is ~210 cycles while the latter is ~350 cycles. So the higher the core clock, the higher the latency in terms of CPU cycles stalled.
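The conversion is simply latency_ns × clock_GHz; a trivial sketch of the arithmetic above (70 ns is an assumed round figure, not a measurement):

    #include <cstdio>
    #include <initializer_list>

    int main() {
        const double dram_ns = 70.0;                // assumed fixed DRAM latency
        for (double ghz : {3.0, 5.0})               // stalled cycles grow with clock
            std::printf("%.0f GHz -> ~%.0f cycles stalled\n", ghz, dram_ns * ghz);
        return 0;
    }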

> It has always been important

Yes, I am aware of that; I am only trying to make the case for the parent comment, since it was asking for clarification on why it would be more important today than it had been (at all) in the past.

> it's not about the latencies per se but the locality

I think it's the other way around: the actual issue we are trying to mitigate is the high DRAM latency, and data locality just happens to be the means by which we minimize its effects.
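To connect this back to the original question about memory layout, a minimal sketch (the Particle type and its fields are hypothetical, purely for illustration): summing one field in an array-of-structs layout fetches every field's bytes, while a struct-of-arrays layout lets the same O(N) loop stream over one densely packed array, so far fewer cache lines have to come from L3/DRAM.

    #include <cstdio>
    #include <vector>

    // Array-of-structs: one field per element is useful, the rest is dragged along.
    struct ParticleAoS { float x, y, z, vx, vy, vz, mass, pad; };  // 32 bytes

    // Struct-of-arrays: each field lives in its own contiguous array.
    struct ParticlesSoA { std::vector<float> x, y, z, vx, vy, vz, mass; };

    float sum_x_aos(const std::vector<ParticleAoS>& ps) {
        float s = 0.f;
        for (const auto& p : ps) s += p.x;          // 4 useful bytes out of 32 fetched
        return s;
    }

    float sum_x_soa(const ParticlesSoA& ps) {
        float s = 0.f;
        for (float v : ps.x) s += v;                // every fetched byte is useful
        return s;
    }

    int main() {
        const std::size_t n = 1'000'000;
        std::vector<ParticleAoS> aos(n);
        ParticlesSoA soa;
        soa.x.resize(n);
        std::printf("%f %f\n", sum_x_aos(aos), sum_x_soa(soa));
        return 0;
    }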


The answer is obvious, but I don't really consider that evidence of systems programming and complexity analysis diverging. When approaching any given problem, you would still start with a basic complexity analysis to get into the right order of magnitude, and then select among the cache-friendly approaches that fall within that range. You almost never go the other way around.



