> Caching is all about maximizing the hit ratio

A thing I worry about a lot is discontinuities in cache behaviour. Simple example: a client polls a list of entries and downloads each entry from the list one at a time to see if it has changed. Obviously this is a bit of a silly way for a client to behave, but if you have a small LRU cache (e.g. maybe it is partitioned such that partitions are small and all the requests from this client go to the same partition), then there is some threshold size where the client transitions from ~all requests hitting the cache to ~none of them hitting it.
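To make the threshold concrete, here is a toy, self-contained simulation (a plain LinkedHashMap used as a strict LRU, with made-up numbers): a capacity of 100 gives a near-perfect hit ratio on a cyclic scan over 100 keys and essentially zero on 101 keys.

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Toy simulation of the cliff: a strict LRU of capacity 100 and a client
    // cycling through either 100 or 101 keys. One extra key means every access
    // evicts the key that will be requested next, so hits collapse to ~zero.
    class LruCliff {
        public static void main(String[] args) {
            final int capacity = 100;
            for (int keys : new int[] {100, 101}) {
                Map<Integer, Integer> lru =
                    new LinkedHashMap<Integer, Integer>(capacity, 0.75f, true) {
                        protected boolean removeEldestEntry(Map.Entry<Integer, Integer> eldest) {
                            return size() > capacity;
                        }
                    };
                int hits = 0, total = 0;
                for (int round = 0; round < 100; round++) {
                    for (int k = 0; k < keys; k++, total++) {
                        if (lru.get(k) != null) { hits++; } else { lru.put(k, k); }
                    }
                }
                System.out.printf("%d keys -> hit ratio %.2f%n", keys, (double) hits / total);
            }
        }
    }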
This is a bit different from behaviours that are always bad for the cache (e.g. a search crawler fetching lots of entries once).
Am I wrong to worry about these kinds of ‘phase transitions’? Should the focus just be on optimising hit rate in the average case?
As the article mentions, Caffeine's approach is to monitor the workload and adapt to these phase changes. This stress test [1] demonstrates shifting back and forth between LRU and MRU request patterns, with the cache reconfiguring itself to maximize the hit rate. Unfortunately, most policies are either not adaptive or adapt poorly.
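Not the linked stress test itself, just a rough sketch of how you can watch this with Caffeine's built-in stats; the two-phase workload here is made up:

    import com.github.benmanes.caffeine.cache.Cache;
    import com.github.benmanes.caffeine.cache.Caffeine;
    import java.util.concurrent.ThreadLocalRandom;

    // Rough sketch: record stats while the access pattern shifts from a cyclic
    // scan (LRU-hostile) to a small hot set, then print the overall hit ratio.
    class HitRatioSketch {
        public static void main(String[] args) {
            Cache<Integer, Integer> cache = Caffeine.newBuilder()
                .maximumSize(512)
                .recordStats()
                .build();
            for (int i = 0; i < 200_000; i++) {
                int key = (i < 100_000)
                    ? i % 1_000                                  // phase 1: scan over 1000 keys
                    : ThreadLocalRandom.current().nextInt(256);  // phase 2: 256 hot keys
                if (cache.getIfPresent(key) == null) {
                    cache.put(key, key);  // stand-in for loading the entry on a miss
                }
            }
            System.out.println("hit ratio: " + cache.stats().hitRate());
        }
    }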
Thankfully, most workloads follow a relatively consistent pattern, so it is an atypical worry. Algorithm designers usually have a target scenario, like a CDN or a database, so they generally skip reporting the low-performing workloads. That may work for a research paper, but when providing a library we cannot know what our users' workloads are, nor should we expect engineers to invest in selecting the optimal algorithm. Caffeine's adaptivity removes this burden and broadens its applicability, and other language ecosystems have been slowly adopting similar ideas in their caching libraries.
I had a team that just did not get my explanations that they had created such a scenario. I had to show them the bus-sized “corner case” they had created before they agreed to a more sophisticated cache.
That project was the beginning of the end of my affection for caches. Without very careful discipline, which few teams have, once a cache is added all organic attempts at optimization are greatly complicated. It’s global shared state, with all the problems that brings. And if you use it instead of the call stack to pass arguments around (e.g. passing an ID instead of a User and making everyone look it up ten times), then your goose really is cooked.
Interesting. I hadn’t really thought of global state as being a problem (I mostly think of caches as affecting performance but not semantics, though I guess I didn’t really think about cache invalidation/poisoning either). My main worry would be more something like making a cold start very difficult, or making things harder to change.
When you design a call tree so that any data used later is passed explicitly down the tree instead of looked up by ID over and over, you can be sure that all of the decisions about that data are made on a consistent copy of it.
When you look up the same value 10 times, you not only pollute the flame graphs and call counts, which makes it much harder to prove that a better algorithm is necessary or has any effect, but, more importantly, you could get 3 different states back and try to make a bunch of updates based on what should be mutually exclusive states in the system. That's the global shared state problem.
When you look up a value once and remember it throughout a calculation, it may not be the current state, but at least you have a clean snapshot of the data. In situations such as cancelling an account immediately after getting one last discount on a purchase, well, we know which scenario the customer probably meant.
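A contrived sketch of the difference; User, findUser, applyDiscount, and refund are all made-up names:

    // Contrived sketch; User, findUser, applyDiscount, and refund are made-up names.
    record User(long id, boolean active, boolean cancelled) {}

    class Checkout {
        // Risky: every decision re-reads the record (possibly through a cache), so
        // each check may see a different state of the same account.
        void checkoutById(long userId) {
            if (findUser(userId).active()) { applyDiscount(findUser(userId)); }
            if (findUser(userId).cancelled()) { refund(findUser(userId)); }
        }

        // Safer: look the user up once and pass the snapshot down the call tree, so
        // every decision is made on one consistent copy of the data.
        void checkout(long userId) {
            User user = findUser(userId);
            if (user.active()) { applyDiscount(user); }
            if (user.cancelled()) { refund(user); }
        }

        User findUser(long id) { return new User(id, true, false); }  // stand-in lookup
        void applyDiscount(User user) { /* ... */ }
        void refund(User user) { /* ... */ }
    }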
Tell me more
What if other values are looked up deep in the call stack? Would that cause actual inconsistency, since different values were looked up at different times?
It can, but it’s very hard to catch. It’s like running your database at the wrong isolation level: by the time the bug happens, the system is under heavy load and too noisy to catch the real problem. So you have glitches nobody can explain, and they just deal with the cleanup.
For this and other reasons I think that, in addition to Functional Core, Imperative Shell, you want a “square” call tree in your code. Avoid functions with no fanout and functions with very high fanout. Rearrange code that uses the same data to happen as close together as you can, to improve local reasoning. When functions get unwieldy, or deleted code makes them too small, use the B-tree algorithm as inspiration to rebalance the tree.
Refactor when new features change the coupling of the code.
These are exactly the things to worry about in an application that has enough scale for it. My usual approach is to have a wiki page or document describing these limitations and roughly the order of magnitude where you will encounter them. Then do nothing and let them be until that scale is on the horizon.
There is no point fixing a "this could be slow if we have more than 65535 users" if you currently have 100 users.
I usually add a few pointers to the document on how to increase the scaling limit a bit without major rebuilding (e.g. make this cache 2x larger). Those are useful as a short-term solution while the real next version is being built.
Caching itself introduces a discontinuity, because a request that hits the cache and one that misses have vastly different performance profiles (and if they don’t, the cache may be a bit useless).
I think the only way to approach this problem is statistically, but the average is a bad metric; you’d probably care about some high percentile instead.
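A toy illustration with made-up numbers: 1 ms hits and 80 ms misses at a ~90% hit ratio give a mean that looks healthy, while the p99 is the full miss latency.

    import java.util.Arrays;

    // Toy illustration: a bimodal latency profile (1 ms hits, 80 ms misses, ~90%
    // hit ratio). The mean hides the miss path; the high percentile exposes it.
    class PercentileSketch {
        public static void main(String[] args) {
            double[] latencies = new double[10_000];
            for (int i = 0; i < latencies.length; i++) {
                latencies[i] = (i % 10 == 0) ? 80.0 : 1.0;  // every 10th request misses
            }
            Arrays.sort(latencies);
            double mean = Arrays.stream(latencies).average().orElse(0);
            double p99 = latencies[(int) (latencies.length * 0.99)];
            System.out.printf("mean=%.1f ms, p99=%.1f ms%n", mean, p99);
        }
    }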