Thinking about the high cost of training large AI models, and how much duplicated computation there must be when company A and company B each train such a model (say, a large language model trained on the whole internet), I've been wondering if it makes sense for humanity to create a central, global database that stores every computation performed by any connected computer (whether on CPU or GPU). This would record the inputs of these computations at a low level (in machine code, for instance) and their outputs (like the result of a summation operation). It would be continually updated with new computations over time.
The primary function of this database would be to act as a sort of "global cache." When a member computer is about to perform a computation, it would first check this database. If the computation has already been done, the computer would simply fetch the pre-computed result instead of redoing the computation. The underlying goal is to save on compute resources globally.
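To make the idea concrete, here's a minimal sketch of the lookup-before-compute step. It's only an illustration: the names (`GLOBAL_CACHE`, `cached_compute`) are made up, and an in-process dict stands in for what would really have to be a distributed service.

```python
import hashlib
import pickle

# Hypothetical shared key-value store standing in for the "global database".
# A plain dict keeps the sketch runnable; the real thing would be distributed.
GLOBAL_CACHE: dict[str, bytes] = {}

def cache_key(op_name: str, *inputs) -> str:
    """Content-address a computation by hashing the operation name and its inputs."""
    payload = pickle.dumps((op_name, inputs))
    return hashlib.sha256(payload).hexdigest()

def cached_compute(op_name: str, fn, *inputs):
    """Check the global cache before computing; store the result on a miss."""
    key = cache_key(op_name, *inputs)
    if key in GLOBAL_CACHE:                      # cache hit: skip the computation
        return pickle.loads(GLOBAL_CACHE[key])
    result = fn(*inputs)                         # cache miss: do the work once
    GLOBAL_CACHE[key] = pickle.dumps(result)
    return result

# Example: the "summation operation" mentioned above.
print(cached_compute("add", lambda a, b: a + b, 2, 3))  # computed
print(cached_compute("add", lambda a, b: a + b, 2, 3))  # fetched from the cache
```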
N.B. this does not necessarily mean we precompute anything, but we do store everything we have computed so far. The hit rate on the cache might be very low for a while, but one would think it'd eventually go up. The way we're going about this (throwing more GPUs at it) just seems awfully wasteful to me.
Has anyone thought about/done any research on this?
This idea glosses over the engineering complexity of "searching" the cache, which sounds like it would grow to include every computation ever performed.
The reason it's not feasible is the same reason computers can't just have a huge L1 cache instead of a hard drive. There are physical limits on how quickly such a store can be searched and its contents retrieved, so just performing the computation is often quicker.
However… your suggestion would be suitable for functional programming. Pure functions should always return the same result for the same inputs, so caching the results of CPU-intensive pure functions makes a lot of sense… which is what Bazel's remote cache [1] does. But most software does not use pure functions…
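As a small in-process analogue of that idea (not Bazel's actual mechanism), memoizing a pure function is safe precisely because its output depends only on its input:

```python
from functools import lru_cache

@lru_cache(maxsize=None)          # safe only because the function is pure
def fib(n: int) -> int:
    """Deterministic and side-effect free: same input, same output."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(200))  # each subproblem is computed once, then served from the cache
```

The moment a function reads the clock, the network, or mutable global state, the cached result can be wrong, which is why this only generalizes to code written in a pure style.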
Also, another interesting question comes to mind: what if "quantum computing" could let us do "branch prediction" of computation at an incredible scale?
[1] https://bazel.build/remote/caching