The World’s Largest Computer Chip (newyorker.com)
105 points by doe88 on Aug 21, 2021 | 40 comments


This is quite an interesting article, from an unexpected source for a tech piece, but I am really disappointed that it never mentioned what is probably the most famous previous effort at wafer-scale integration:

https://en.wikipedia.org/wiki/Wafer-scale_integration

... as backed by Sir Clive Sinclair: Ivor Catt's Anamartic Ltd.

http://www.ivorcatt.co.uk/x5as.htm

https://en.wikipedia.org/wiki/Ivor_Catt

http://www.computinghistory.org.uk/det/8199/Anamartic-Limite...


You can see a photo of one of the chips and get some more technical details here:

https://youtu.be/FNd94_XaVlY


I find it odd that an article written about scale, and not just scale but the biggest scale, didn't include a photo that demonstrates... relative sizes.

The official page demonstrates relative size at a quick glance (I guess they do use fingernails and dinner plates, but eh): https://cerebras.net/chip/


But their "product" link in the footer goes to https://cerebras.net/?page_id=632, which is a 404. So I'll wait before judging each site's technical abilities.


There's an inaccuracy there. Cerebras is far from the first trillion-transistor chip.

The first one-trillion-transistor chip was Samsung's 3D NAND, and it arrived with rather little fanfare.

P.S. 2: Google is also far from the first company to do "automatic floorplanning." That is what literally every EDA tool does.


I am not really buying the excuses for not participating in MLPerf. Just give us the numbers; don't skirt around it with "not being made for these benchmarks."


You missed the point. They won't do it because the second Nvidia sees how much better Cerebras is, it will enter that market and pull the rug out from under them, just like 40 years ago when IBM decided to enter the PC market and a lot of PC makers went belly up by 1983 unless they made IBM PC compatible clones (Apple was the only one without an IBM-compatible machine and barely survived).


NVIDIA is going to do that either way. Huang is not asleep at the wheel.


So currently chips are printed into regions that are limited in size by physics, to a 3 cm square. And processing power is traditionally increased by stacking them in up to a dozen layers that are interconnected at the edges.

Here, instead of that, the circuits are overlap-printed so that a single wafer can support a set of 80 connected circuits, which are now physically coolable because of the flat design? While they must be sacrificing some interconnection richness because of geometrical placement, for AI applications this probably doesn't matter so much. Very interesting.


> So currently chips are printed into regions that are limited in size by physics, to a 3cm square.

It's not quite that direct a limit. While performance almost always wants bigger chips, there are counterbalancing forces (yield and heat dissipation) that want smaller chips.

Chip manufacturing is (roughly) limited by defects until you reach the physical size of a wafer. The larger your chip, the more likely it is that some critical circuit has a defect and fails. For example, if a wafer has 10 defects but you are producing 1500 chips, then at worst you will get a yield of 99%. If you are producing 100 bigger chips, you may get a yield of 90%. If you are only producing 10 (big!) chips, you may get a yield of 0%. Yield falls off with chip area, which grows as the square of the chip dimension.
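
To get a feel for how fast this bites, here's a minimal sketch of the classic Poisson defect-yield model (the defect density is a made-up illustrative number, not any real process data):

    import math

    def poisson_yield(die_area_cm2, defects_per_cm2):
        # Probability that a die of the given area contains zero defects.
        return math.exp(-defects_per_cm2 * die_area_cm2)

    d0 = 0.1  # hypothetical defect density, defects per cm^2
    for area in (0.5, 2.0, 8.0, 46.0):  # tiny die ... big GPU ... near wafer-scale
        print(f"{area:5.1f} cm^2 die -> {poisson_yield(area, d0):.1%} yield")

Wafer-scale designs get around the exponential falloff by building in redundancy and routing around defective regions, rather than hoping for an entire defect-free wafer.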

After you manufacture the chip, chip size is limited by power distribution and heat dissipation. The bigger your chip, the more it needs of both, until you can't supply or cool it anymore. This is why "3D" chips don't really win: getting power in and heat out scales with surface area, not volume.
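
Toy numbers for the 3D point (footprint and per-layer power are made up): stacking multiplies the power, but the heat still has to leave through roughly the same top face.

    die_footprint_cm2 = 8.0    # hypothetical footprint; doesn't grow when you stack
    power_per_layer_w = 100.0  # hypothetical per-layer power

    for layers in (1, 2, 4, 8):
        total_w = layers * power_per_layer_w
        flux = total_w / die_footprint_cm2  # W/cm^2 through the top face
        print(f"{layers} layer(s): {total_w:.0f} W total, {flux:.0f} W/cm^2 to remove")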


> In a big cluster, as many as forty-eight pizza-box-size servers slide into a rack as tall as a person; these racks stand in rows, filling buildings the size of warehouses. The neural networks in such systems can tackle daunting problems, but they also face clear challenges. A network spread across a cluster is like a brain that’s been scattered around a room and wired together. Electrons move fast, but, even so, cross-chip communication is slow, and uses extravagant amounts of energy.

Why wouldn't these giant chips be wired together into a cluster too?


I am sure it will happen, but the cost effectiveness and efficiency will drop dramatically just going from 1 wafer to 2 wafers.

1 wafer, doing X work, in Y time

= 1 wafer, doing 2X work, in 2Y time

= Two wafers, doing 2X work, in something still close to 2Y time

I.e. the slowness of between-wafer communication, vs. in-wafer communication, will dwarf the computing time. Obviously there is some N, where N wafers would be worth clustering, but it might be quite high.
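
As a toy strong-scaling model (the 30% cross-wafer penalty is a made-up number, just to show the shape of the curve): once every multi-wafer step pays a fixed communication cost, the speedup saturates quickly.

    def speedup(n_wafers, comm_cost=0.30):
        # Per-step time = compute share + fixed cross-wafer communication cost,
        # both expressed as fractions of the single-wafer step time.
        compute = 1.0 / n_wafers
        comm = comm_cost if n_wafers > 1 else 0.0
        return 1.0 / (compute + comm)

    for n in (1, 2, 4, 8, 16):
        print(f"{n:2d} wafer(s) -> {speedup(n):.2f}x")

With that (arbitrary) 30% penalty, two wafers only get you about 1.25x, and no number of wafers ever beats about 3.3x, which is the sense in which the break-even N can be quite high.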

Maybe the company is working on ways to cut down cross-wafer communication too. Vertical optical connections for instance would be awesome.


When I saw how large that chip was, I immediately thought of cooling such a beast.

Can any materials scientists or engineers comment on whether other materials withstand heat better than silicon? It seems like such a large chip would do somewhat better running at a higher temperature rather than budgeting for huge and elaborate cooling. (This is very much a layman's question. The people who designed the chip and its cooling are far, far smarter than me!)


GaN-on-SiC and native SiC both support far higher temperatures, with native SiC lasting thousands of hours even at 500C Tj [0] and commercial GaN-on-SiC being rated for e.g. 225C Tjmax [1].

[0]: https://de.wikipedia.org/wiki/Siliciumcarbid#cite_ref-22

[1]: CREE Wolfspeed's CGHV1J070D; datasheet: https://cms.wolfspeed.com/app/uploads/2020/12/CGHV1J070D.pdf


At such large die scales and high temperatures, heat engines become practical as a means of both cooling and extracting work. I wonder if there is serious research in this area?


IIRC they become less efficient at higher temperatures, as thermally induced noise increases (it might be part of sub-threshold leakage; I don't remember the details).

There are considerable efficiency gains from running silicon CMOS at LN2 temperatures instead of room temperature, but the benefits fall apart once you realize you'll have to heat-pump the dissipated power from 77 K back up to room temperature. The main benefit would be being able to run the chips faster, and a good part of the optimization would require lower dopant concentrations in the transistor channels to properly take advantage of the low temperature, which unfortunately rules out common shared-wafer prototyping runs (so testing this in real life isn't really accessible).
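
The heat-pump penalty is easy to bound with the ideal (Carnot) coefficient of performance; real cryocoolers are considerably worse, so this is a best case:

    def carnot_cop(t_cold_k, t_hot_k):
        # Ideal COP for pumping heat from t_cold_k up to t_hot_k.
        return t_cold_k / (t_hot_k - t_cold_k)

    t_cold, t_hot = 77.0, 300.0   # liquid nitrogen vs. room temperature, in kelvin
    cop = carnot_cop(t_cold, t_hot)
    print(f"Carnot COP: {cop:.2f}")                        # ~0.35
    print(f"Work per watt of chip heat: {1.0/cop:.1f} W")  # ~2.9 W even in the ideal case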


Silicon Carbide wafers can withstand higher temperatures, which makes dumping heat easier.


Based on a high-school-level understanding, the cooling requirements would just be proportional to the surface area, nothing special. Maybe there's an added risk of physical fissures developing, but that's hard to know a priori.


Surface area and also heat rejection temperature.


This chip needs about 21 kW, enough to heat a house in Central Europe.


Or enough to cool one in the U.S.


That should cool a half dozen decently big homes.


Water cooling would handle this without issue. No need for fancy tricks. A big heat spreader and some 2" piping would be more than enough.


The article gives a figure of 15 kW for the chip. That’s the kind of heat usually generated by a small room full of servers. Radiating that away on the outside is not the main issue; solutions exist for that. But getting that kind of heat away from the chip and into the water in the first place has to be a nontrivial challenge.


The chip is approximately 21 cm x 21 cm (that's about 8.5 inches on a side).

My kettle has 2 kW and doesn't take long to boil water from room temperature. I reckon you could fit four such kettles on the chip area (roughly). That means the chip would roughly boil water two times as fast as my kettle, were it used for that purpose.

While that does pose reasonably interesting engineering challenges regarding coolant throughput etc., I don't think there's anything particularly difficult there. You probably would want a better heat transfer medium between chip and water than my kettle has (well, I have not disassembled it), but I agree with GP that a bunch of water pipes will work well enough as a cooling solution.

Edit: Actually, screw it, we can calculate how much water we need to put through there. Warming water from 20 to 100 °C takes 334 kJ/kg. (That comes out to 167 seconds to heat 1l of water in a 2 kW kettle, for reference.) To remove 15 kW of heat with water cooling, assuming the water goes in at 20 and comes out at 100 °C, we need a throughput of 0.045 kg/s = 45 g/s = 45 ml/s.

Sure, the temperature range may be a bit optimistic, but 45 ml/s (one liter every 22 seconds) is literally "just hold it under a running tap". The main engineering challenge would be making sure that heat is removed evenly enough, I guess.
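
Same arithmetic in code, for anyone who wants to tweak the numbers (water's specific heat is about 4.18 kJ/kg·K):

    specific_heat = 4.18            # kJ/(kg*K), water
    t_in, t_out = 20.0, 100.0       # inlet/outlet temperature, deg C
    heat_per_kg = specific_heat * (t_out - t_in)   # ~334 kJ/kg

    chip_power_kw = 15.0
    flow_kg_s = chip_power_kw / heat_per_kg        # kW / (kJ/kg) = kg/s
    print(f"{heat_per_kg:.0f} kJ per kg of water; {flow_kg_s*1000:.0f} ml/s of flow needed")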


As the previous comment pointed out, it's not the net amount of energy that needs to be removed that is the problem.

I.e. it's like a kettle that isn't just radiating the stove coil's energy away; it is actually trying to keep the stove coil cool while it's turned up to 10!

The same amount of energy movement, but not the same problem at all.


Now the question is what percentage of that energy actually goes into computation. I understand it makes a nice kettle, but it would be nice to see what it can do in terms of computation.


Yes, the whitepaper talks a lot about the cooling.

https://f.hubspotusercontent30.net/hubfs/8968533/Cerebras-CS...

The traditional computer included in the box is probably quite high-end and power-hungry too, so that it can feed enough data to sustain those bandwidths. They don't appear to sell the chips by themselves.

I think the right comparison is with an equivalent GPU cluster like the Nvidia DGX systems, or with HPC CPU nodes. The DGX A100 is 6.5 kW, for example:

https://images.nvidia.com/aem-dam/Solutions/Data-Center/nvid...

The Cerebras system takes 15 rack units, which is more than 2x larger than the DGX (6.5U). A similar 15-node CPU-based HPC system is probably not that far from 15 kW either (2 sockets per node at 250 W per CPU is already 7.5 kW, then add RAM etc.), so by HPC standards it's less "a full room of servers" than "a single cabinet".


If the thermal interfaces are done correctly then you can put an absurd flow rate of coolant across the fins and bring it out to a huge heat exchange system.

The trick isn't total power, it's power density on the die. In that regard, I don't think this is pushing the boundaries. It just needs custom built interfaces.


The article talks about the need for special alloys with minimal thermal expansion. There are 15 kilowatts of heat to dissipate in a very small space, and the chip really can't expand or contract differently from the cooling block. This seems like a hard problem to solve.


Looks a bit like Tesla's new Dojo AI chip


I don’t think that’s an accident. Tesla undoubtedly took inspiration from Cerebras. Also, I think some TSMC processes enable this kind of wafer-scale chip, and both companies use them.


I may be mistaken, but Tesla's Dojo chip still seems to be relatively small. They can connect many of them into a 2D fabric, though.

Cerebras still seems to have an advantage here, because they can use on-chip interconnects, which potentially allows higher bandwidth between the tiles.


I think the Tesla "Tiles" of 25 D1 chips are on a single wafer with integrated interconnects. But there is certainly a huge difference in the memory bandwidth claims. Cerebras claims 20 PB/s, and Tesla claims 10 Tbps.


From what I understand, both Tesla and Cerebras use TSMC’s on-wafer fan out technology.


I know this is way OT, but the title immediately made me think of this: http://web.cecs.pdx.edu/~harry/Relay/index.html


Sounds like https://www.parallella.org/ to me. A grid SoC, just bigger.


I didn't know Gene Amdahl killed someone. According to the NY Times, he was actually convicted of manslaughter.

Considering that he ruined someone's life, why is he revered in computing circles?


Huh. I did not know that. He hit and killed a motorcyclist while driving his Rolls Royce.

https://www.mercurynews.com/2015/11/14/gene-amdahl-father-of...

As a motorcyclist who was hit and very nearly killed by a bad driver, that has quite some, er, impact upon me. :-/


Killing someone won't get you canceled in Silicon Valley. (He was found guilty of manslaughter via a no-contest plea.)

Just don't write a popular well-reviewed book that mentions in two sentences how dating in Silicon Valley is different from back home -- you'll never work again!



