Well, it could be... not SRAM? It's not the only kind of RAM, and the choice to use SRAM is certainly not an obvious one. It could make sense as part of a specific paradigm, but that paradigm is not explained, which is why I am asking. It may be perfectly obvious to you, but it's not to me.
You basically have the option between SRAM, HBM (DRAM), and something new. You can imagine the risks with using new memory tech on a chip like this.
The issue with HBM is that it's much slower, much more power hungry (per access, not per byte), and not local (so there are routing problems). You can't scale that to this much compute.
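To make the power argument concrete, here's a back-of-envelope sketch. The per-access energies are illustrative order-of-magnitude assumptions (on-die SRAM is commonly estimated at around 100x cheaper per access than off-chip DRAM), not Cerebras figures, and the access rate is a hypothetical compute-bound workload touching one operand per op:

```python
# Back-of-envelope: wall power needed just to feed compute from
# SRAM vs off-chip DRAM. All numbers are assumed/illustrative.
PJ = 1e-12  # joules per picojoule

sram_pj_per_access = 5    # assumed: on-die SRAM read, ~pJ scale
dram_pj_per_access = 640  # assumed: off-chip DRAM access, ~100x worse

accesses_per_s = 1e15     # hypothetical: ~petaops chip, one 32-bit
                          # operand fetched per op

for name, pj in [("SRAM", sram_pj_per_access),
                 ("DRAM", dram_pj_per_access)]:
    watts = accesses_per_s * pj * PJ
    print(f"{name}: {watts / 1e3:.0f} kW just for memory accesses")
```

With these assumed numbers SRAM lands around 5 kW while DRAM lands in the hundreds of kilowatts, which is the sense in which "you can't scale that to this much compute": the memory system alone would dwarf any feasible power budget.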
But HBM and other DRAM variants are, presumably, vastly cheaper otherwise. (You can keep explaining that SRAM is faster, but unless you work for Cerebras and have inside knowledge you haven't mentioned, that's not actually an answer to my question about what paradigm Cerebras intends.)
They say they support efficient execution of smaller batches. They cover this somewhat in their Hot Chips talk, e.g. "One instance of NN, don't have to increase batch size to get cluster scale perf" from the AnandTech coverage.
If this doesn't answer your question, I'm stuck as to what you're asking about. They use SRAM because it's the only tried and true option that works. Lots of SRAM means efficient execution of small batch sizes. If your problem fits, good, this chip works for you, and probably easily outperforms a cluster of 50 GPUs. If your problem doesn't, presumably you should just use something else.
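The batch-size point can be sketched with simple arithmetic. Under plain data parallelism, each device needs some minimum per-device batch to stay busy, so the global batch is forced to grow with the cluster; a single chip running one model instance doesn't have that constraint. The per-GPU minimum here is an assumed illustrative number, not a measured one:

```python
# Sketch: why data parallelism across a cluster forces batch size up.
# min_batch_per_gpu is an assumed number for illustration only.
gpus = 50
min_batch_per_gpu = 32  # assumed: batch needed to keep one GPU busy

# Pure data parallelism: global batch scales with device count.
cluster_batch = gpus * min_batch_per_gpu
print(f"GPU cluster needs global batch >= {cluster_batch}")  # 1600

# One wafer-scale chip running a single model instance can, in
# principle, serve the same compute at the original small batch.
single_chip_batch = min_batch_per_gpu
print(f"Single chip can run batch {single_chip_batch}")
```

That's the "one instance of NN" claim in the Hot Chips quote: you get cluster-scale performance without inflating the batch 50x to fill 50 devices.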