When I saw how large that chip was, I immediately thought of cooling such a beast.
Can any materials scientists or engineers comment on if other elements will withstand higher heat better than silicon? Seems like such a large chip would be somewhat better to run at higher temperature rather than budget for huge and elaborate cooling. (This is very much a layman's question. The people who designed the chip and its cooling are far, far smarter than me!)
GaN-on-SiC and native SiC both support far higher temperatures, with native SiC lasting thousands of hours even at 500 °C Tj [0] and commercial GaN-on-SiC being rated for e.g. 225 °C Tjmax [1].
At such large die scales and high temperatures heat engines become practical as a means of both cooling and also extracting work. I wonder if there is serious research in this area?
IIRC they become less efficient at higher temperatures, as the thermally induced noise increases (it might be part of sub-threshold leakage; I don't remember the details).
There are considerable efficiency gains from running silicon CMOS at LN2 temperatures (77 K) instead of room temperature, but the benefits fall apart once you realize you'll have to heat-pump the electrical consumption from 77 K back up to room temperature. The main benefit would be being able to run the chips faster. A good part of the optimization would also need lower dopant concentrations in the transistor channels to properly take advantage of the low temperature, which unfortunately rules out common shared-wafer prototyping runs (so testing this IRL isn't really accessible).
Based on a high-school level understanding, the cooling requirements would just be proportional to the surface area, nothing special. Maybe there's an added risk of physical fissures developing, but that's hard to know a priori.
The article gives a figure of 15 kW for the chip. That’s the kind of heat usually generated by a small room full of servers. Radiating that away on the outside is not the main issue, solutions exist for that. But getting that kind of heat away from the chip and into the water in the first place has to be a nontrivial challenge.
The chip is approximately 21 cm x 21 cm (roughly 8.3 inches per side).
My kettle has 2 kW and doesn't take long to boil water from room temperature. I reckon you could fit roughly four such kettles on the chip area. That means that per kettle-footprint of area, the chip would heat water roughly twice as fast as my kettle (15 kW spread over four footprints is ~3.75 kW each, vs. the kettle's 2 kW), were it used for that purpose.
While that does pose reasonably interesting engineering challenges regarding coolant throughput etc., I don't think there's anything particularly difficult there. You probably would want a better heat transfer medium between chip and water than my kettle has (well, I have not disassembled it), but I agree with GP that a bunch of water pipes will work well enough as a cooling solution.
Edit: Actually, screw it, we can calculate how much water we need to put through there. Warming water from 20 to 100 °C takes 334 kJ/kg. (That comes out to 167 seconds to heat 1l of water in a 2 kW kettle, for reference.) To remove 15 kW of heat with water cooling, assuming the water goes in at 20 and comes out at 100 °C, we need a throughput of 0.045 kg/s = 45 g/s = 45 ml/s.
Sure, the temperature range may be a bit optimistic, but 45 ml/s (one liter every 22 seconds) is literally "just hold it under a running tap". The main engineering challenge would be making sure that heat is removed evenly enough, I guess.
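The back-of-the-envelope numbers above can be checked in a few lines of Python; this sketch assumes plain sensible heating of liquid water (specific heat ~4.186 kJ/(kg·K)) with no boiling, matching the 20 to 100 °C range used in the comment:

```python
# Back-of-the-envelope coolant throughput for a 15 kW chip,
# assuming water enters at 20 C and leaves at 100 C (no phase change).
C_P = 4.186          # kJ/(kg*K), specific heat of liquid water
DELTA_T = 100 - 20   # K, inlet-to-outlet temperature rise
CHIP_POWER = 15.0    # kW, figure quoted in the article

energy_per_kg = C_P * DELTA_T             # kJ absorbed per kg of water
flow_kg_per_s = CHIP_POWER / energy_per_kg

print(f"{energy_per_kg:.0f} kJ/kg")       # ~335 kJ/kg
print(f"{flow_kg_per_s * 1000:.0f} g/s")  # ~45 g/s, i.e. ~45 ml/s
```

Which reproduces the 334 kJ/kg and 45 ml/s figures from the comment (1 ml of water is ~1 g).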
As the previous comment pointed out, it's not the net amount of energy that needs to be removed that is the problem; it's getting that heat out of the chip and into the coolant in the first place.
I.e. it's like a kettle that isn't just radiating the stove coil's energy away; it's actually trying to keep the stove coil cool while it's turned up to 10!
The same amount of energy movement, but not the same problem at all.
Now the question is what percentage of that energy actually goes into useful computation. I understand it is a nice kettle, but it will be nice to see what it can do in terms of computation.
The traditional computer included in the box is probably quite high end and power hungry too so that it can provide enough data to maintain those bandwidths. They don't appear to sell the chips by themselves.
I think the comparison is with an equivalent GPU cluster like the NVIDIA DGX systems, or with HPC CPU nodes. The DGX A100 is 6.5 kW, for example.
The Cerebras system occupies 15 rack units, which is more than 2x larger than the DGX (6.5U). A comparable 15-node CPU-based HPC cluster is probably not that far from 15 kW either (2 sockets per node at 250 W per CPU is already 7.5 kW, then add RAM etc.), so by HPC standards it's less "full room of servers" than "single cabinet".
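A quick sketch of that comparison; the node count and per-socket wattage are the assumptions stated in the comment, not vendor specs:

```python
# Rough per-cabinet power comparison (illustrative numbers only).
cerebras_kw = 15.0                  # article's figure for the Cerebras system (15U)

# Hypothetical 15-node, 2-socket CPU cluster:
nodes = 15
cpus_per_node = 2
cpu_watts = 250                     # assumed per-socket TDP

cpu_only_kw = nodes * cpus_per_node * cpu_watts / 1000
print(f"{cpu_only_kw} kW")          # 7.5 kW from CPUs alone, before RAM, NICs, fans
print(f"{cerebras_kw / cpu_only_kw:.1f}x")  # 2.0x, so the gap closes once you add the rest
```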
If the thermal interfaces are done correctly then you can put an absurd flow rate of coolant across the fins and bring it out to a huge heat exchange system.
The trick isn't total power, it's power density on the die. In that regard, I don't think this is pushing the boundaries. It just needs custom built interfaces.
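That claim is easy to sanity-check with the ~21 cm die dimension quoted earlier in the thread, assuming for the sake of the estimate that the heat is spread evenly across the die:

```python
# Average power density across the die, assuming uniform heat spreading.
side_cm = 21.0                   # approximate die edge from the thread
area_cm2 = side_cm ** 2          # 441 cm^2
power_w = 15_000                 # article's 15 kW figure

density = power_w / area_cm2
print(f"{density:.0f} W/cm^2")   # ~34 W/cm^2 average
```

For comparison, a ~150 W desktop CPU on a die of a few cm² averages well above that, so the wafer's average density is indeed unremarkable; the hard parts are the sheer total power and keeping the removal uniform.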
The article talks about the need for special alloys that have minimal thermal expansion. There are 15 kilowatts of energy to dissipate in a very small space, and the chip really can't expand or contract differently from the cooling block. This seems like a hard problem to solve.