Thinking about the high cost of training large AI models, and how much duplicated computation there must be when company A and company B each train such a model (say, a large language model trained on the whole internet), I've been wondering if it makes sense for humanity to create a central, global database that stores every computation performed by any connected computer (whether on CPU or GPU). This would record the inputs of these computations at a low level (in machine code, for instance) and their outputs (like the result of a summation operation). It would be continually updated with new computations over time.
The primary function of this database would be to act as a sort of "global cache." When a member computer is about to perform a computation, it would first check this database. If the computation has already been done, the computer would simply fetch the pre-computed result instead of redoing the computation. The underlying goal is to save on compute resources globally.
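To make the idea concrete, here's a minimal sketch of the lookup-before-compute step. It's only an illustration: the names (`GLOBAL_CACHE`, `cached_compute`) are made up, and an in-process dict stands in for what would really have to be a distributed service.

```python
import hashlib
import pickle

# Hypothetical shared key-value store standing in for the "global database".
# A plain dict keeps the sketch runnable; the real thing would be distributed.
GLOBAL_CACHE: dict[str, bytes] = {}

def cache_key(op_name: str, *inputs) -> str:
    """Content-address a computation by hashing the operation name and its inputs."""
    payload = pickle.dumps((op_name, inputs))
    return hashlib.sha256(payload).hexdigest()

def cached_compute(op_name: str, fn, *inputs):
    """Check the global cache before computing; store the result on a miss."""
    key = cache_key(op_name, *inputs)
    if key in GLOBAL_CACHE:                      # cache hit: skip the computation
        return pickle.loads(GLOBAL_CACHE[key])
    result = fn(*inputs)                         # cache miss: do the work once
    GLOBAL_CACHE[key] = pickle.dumps(result)
    return result

# Example: the "summation operation" mentioned above.
print(cached_compute("add", lambda a, b: a + b, 2, 3))  # computed
print(cached_compute("add", lambda a, b: a + b, 2, 3))  # fetched from the cache
```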
N.B. this does not necessarily mean we precompute anything, but we do store everything we have computed so far. The hit rate on the cache might be very low for a while, but one would think it'd eventually go up. The way we're going about this (throwing more GPUs at it) just seems awfully wasteful to me.
Has anyone thought about/done any research on this?
This idea glosses over the engineering complexity of "searching" the cache, which sounds like it would grow to include every computation ever performed.
The reason it's not feasible is the same reason computers can't just have a huge L1 cache instead of a hard drive. There are physical limits on how quickly such a store can be searched and its contents retrieved, so just performing the computation is often quicker.
However… your suggestion would be suitable for functional programming. Pure functions should always return the same result for the same inputs, so caching the results of CPU-intensive pure functions makes a lot of sense… which is what Bazel's remote cache [1] does. But most software does not use pure functions…
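As a small in-process analogue of that idea (not Bazel's actual mechanism), memoizing a pure function is safe precisely because its output depends only on its input:

```python
from functools import lru_cache

@lru_cache(maxsize=None)          # safe only because the function is pure
def fib(n: int) -> int:
    """Deterministic and side-effect free: same input, same output."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(200))  # each subproblem is computed once, then served from the cache
```

The moment a function reads the clock, the network, or mutable global state, the cached result can be wrong, which is why this only generalizes to code written in a pure style.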
Also, another interesting question comes to mind: what if "quantum computing" could let us do "branch prediction" of computation at an incredible scale?
[1] https://bazel.build/remote/caching