> Caching is all about maximizing the hit ratio

A thing I worry about a lot is discontinuities in cache behaviour. Simple example: a client polls a list of entries and downloads each entry from the list one at a time to see if it has changed. Obviously this is a bit of a silly way for a client to behave, but if you have a small LRU cache (e.g. maybe it is partitioned such that partitions are small and all the requests from this client go to the same partition), then there is some threshold size where the client transitions from ~all requests hitting the cache to ~none of them hitting it.
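To make the threshold concrete, here is a toy, self-contained simulation (a plain LinkedHashMap used as a strict LRU, with made-up numbers): a capacity of 100 gives a near-perfect hit ratio on a cyclic scan over 100 keys and essentially zero on 101 keys.

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Toy simulation of the cliff: a strict LRU of capacity 100 and a client
    // cycling through either 100 or 101 keys. One extra key means every access
    // evicts the key that will be requested next, so hits collapse to ~zero.
    class LruCliff {
        public static void main(String[] args) {
            final int capacity = 100;
            for (int keys : new int[] {100, 101}) {
                Map<Integer, Integer> lru =
                    new LinkedHashMap<Integer, Integer>(capacity, 0.75f, true) {
                        protected boolean removeEldestEntry(Map.Entry<Integer, Integer> eldest) {
                            return size() > capacity;
                        }
                    };
                int hits = 0, total = 0;
                for (int round = 0; round < 100; round++) {
                    for (int k = 0; k < keys; k++, total++) {
                        if (lru.get(k) != null) { hits++; } else { lru.put(k, k); }
                    }
                }
                System.out.printf("%d keys -> hit ratio %.2f%n", keys, (double) hits / total);
            }
        }
    }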
This is a bit different from behaviours that are always bad for the cache (e.g. a search crawler fetching lots of entries once).
Am I wrong to worry about these kinds of ‘phase transitions’? Should the focus just be on optimising hit rate in the average case?
As the article mentions, Caffeine's approach is to monitor the workload and adapt to these phase changes. This stress test [1] demonstrates shifting back and forth between LRU and MRU request patterns, with the cache reconfiguring itself to maximize the hit rate. Unfortunately, most policies are either not adaptive or adapt poorly.
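Not the linked stress test itself, just a rough sketch of how you can watch this with Caffeine's built-in stats; the two-phase workload here is made up:

    import com.github.benmanes.caffeine.cache.Cache;
    import com.github.benmanes.caffeine.cache.Caffeine;
    import java.util.concurrent.ThreadLocalRandom;

    // Rough sketch: record stats while the access pattern shifts from a cyclic
    // scan (LRU-hostile) to a small hot set, then print the overall hit ratio.
    class HitRatioSketch {
        public static void main(String[] args) {
            Cache<Integer, Integer> cache = Caffeine.newBuilder()
                .maximumSize(512)
                .recordStats()
                .build();
            for (int i = 0; i < 200_000; i++) {
                int key = (i < 100_000)
                    ? i % 1_000                                  // phase 1: scan over 1000 keys
                    : ThreadLocalRandom.current().nextInt(256);  // phase 2: 256 hot keys
                if (cache.getIfPresent(key) == null) {
                    cache.put(key, key);  // stand-in for loading the entry on a miss
                }
            }
            System.out.println("hit ratio: " + cache.stats().hitRate());
        }
    }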
Thankfully, most workloads follow a relatively consistent pattern, so it is an atypical worry. Algorithm designers usually have a target scenario, like a CDN or a database, so they generally skip reporting the low-performing workloads. That may work for a research paper, but when providing a library we cannot know what our users' workloads are, nor should we expect engineers to invest in selecting the optimal algorithm. Caffeine's adaptivity removes this burden and broadens its applicability, and other language ecosystems have been slowly adopting similar ideas in their caching libraries.
I had a team that just did not get my explanations that they had created such a scenario. I had to show them the bus-sized “corner case” they had created before they agreed to a more sophisticated cache.
That project was the beginning of the end of my affection for caches. Without very careful discipline, which few teams have, once a cache is added all organic attempts at optimization are greatly complicated. It’s global shared state, with all the problems that brings. And if you use it instead of the call stack to pass arguments around (e.g. passing an ID instead of a User and making everyone look it up ten times), then your goose really is cooked.
Interesting. I hadn’t really thought of global state as being a problem (I mostly think of caches as affecting performance but not semantics, though I guess I didn’t really think about cache invalidation/poisoning either). My main worry would be more something like making a cold start very difficult, or making things harder to change.
When you design a call tree so that any data used later is passed explicitly down the tree instead of looked up by ID over and over, you can be sure that all of the decisions about that data are made on a consistent copy of it.
When you look up the same value 10 times, you not only pollute the flame graphs and call counts, which makes it much harder to prove that a better algorithm is necessary or has any effect, but, more importantly, you could get 3 different states back and try to make a bunch of updates based on what should be mutually exclusive states in the system. That's the global shared state problem.
When you look up a value once and remember it throughout a calculation, it may not be the current state, but at least you have a clean snapshot of the data. In situations such as cancelling an account immediately after getting one last discount on a purchase, well, we know which scenario the customer probably meant.
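A contrived sketch of the difference; User, findUser, applyDiscount, and refund are all made-up names:

    // Contrived sketch; User, findUser, applyDiscount, and refund are made-up names.
    record User(long id, boolean active, boolean cancelled) {}

    class Checkout {
        // Risky: every decision re-reads the record (possibly through a cache), so
        // each check may see a different state of the same account.
        void checkoutById(long userId) {
            if (findUser(userId).active()) { applyDiscount(findUser(userId)); }
            if (findUser(userId).cancelled()) { refund(findUser(userId)); }
        }

        // Safer: look the user up once and pass the snapshot down the call tree, so
        // every decision is made on one consistent copy of the data.
        void checkout(long userId) {
            User user = findUser(userId);
            if (user.active()) { applyDiscount(user); }
            if (user.cancelled()) { refund(user); }
        }

        User findUser(long id) { return new User(id, true, false); }  // stand-in lookup
        void applyDiscount(User user) { /* ... */ }
        void refund(User user) { /* ... */ }
    }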
Tell me more
What if other values are looked up deep in the call stack? Would that cause actual inconsistency, since different values were looked up at different times?
It can, but it’s very hard to catch. It’s like running your database at the wrong isolation level: by the time the bug happens, the system is under heavy load and too noisy to catch the real problem. So you have glitches nobody can explain, and they just deal with the cleanup.
For this and other reasons I think that, in addition to Functional Core, Imperative Shell, you want a “square” call tree in your code. Avoid functions with no fanout and functions with very high fanout. Rearrange code that uses the same data to happen as close together as you can, to improve local reasoning. When functions get unwieldy, or deleted code makes them too small, use the B-tree algorithm as inspiration to rebalance the tree.
Refactor when new features change the coupling of the code.
These are exactly the things to worry about in an application that has enough scale for it. My usual approach is to have a wiki page or document describing these limitations and roughly the order of magnitude where you will encounter them. Then do nothing and let them be until that scale is on the horizon.
There is no point fixing a "this could be slow if we have more than 65535 users" if you currently have 100 users.
I usually add a few pointers to the document on how to increase the scaling limit a bit without major rebuilding (e.g. make this cache 2x larger). Those are useful as a short-term solution while the real next version is being built.
Caching itself introduces a discontinuity, because a request that hits the cache and one that misses have vastly different performance profiles (and if they don’t, the cache may be a bit useless).
I think the only way to approach this problem is statistically, but the average is a bad metric; you’d probably care about some high percentile instead.
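A toy illustration with made-up numbers: 1 ms hits and 80 ms misses at a ~90% hit ratio give a mean that looks healthy, while the p99 is the full miss latency.

    import java.util.Arrays;

    // Toy illustration: a bimodal latency profile (1 ms hits, 80 ms misses, ~90%
    // hit ratio). The mean hides the miss path; the high percentile exposes it.
    class PercentileSketch {
        public static void main(String[] args) {
            double[] latencies = new double[10_000];
            for (int i = 0; i < latencies.length; i++) {
                latencies[i] = (i % 10 == 0) ? 80.0 : 1.0;  // every 10th request misses
            }
            Arrays.sort(latencies);
            double mean = Arrays.stream(latencies).average().orElse(0);
            double p99 = latencies[(int) (latencies.length * 0.99)];
            System.out.printf("mean=%.1f ms, p99=%.1f ms%n", mean, p99);
        }
    }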