Hacker News | FaradayRotation's comments

In many ways I agree with you, but the problem statement (constrained/exhausted gas supply from the vendor) makes it seem like this was not just a line down, but the whole factory stopped for a few hours. Line down is a miserable migraine but still manageable... while a whole factory stoppage makes a lobotomy seem like a good idea. It also sounds like there was not enough forewarning to park critical customer wafers in a "safe" stage of the process.

Even so, I would still call this another Monday at a semiconductor factory. Welcome! Here we play a nearly endless game of whack-a-mole. Here's your mallet and your towel. Now whack enough of the moles hard enough until they stop coming back (at least through the same holes). Beware the alpha moles.

By any road, I am surprised to see even this high-level perspective on a quality event disclosed to the mainstream public; I thought this was not standard practice. I enjoyed the read.


Just curious, would a full factory stoppage require recalibration or revalidation of certain equipment? Or is it more of an atmospheric issue that only affects the product?

I'm also curious. It's not like the power went out and the machines shut down unsafely, though.

Sorry for the delay friend, I missed your message.

The number of issues that a semiconductor factory stoppage would cause stretches the imagination, and it's worse if you cannot bring the material to a "safe" spot on the line. I will try to capture a few of them, off the top of my head.

As you alluded to, contamination is the big one. You really need power to keep things clean. But also, the process that runs in the factory is assumed by default to run all the time, and you optimize the process around that assumption. In a system with thousands of operations (and many suboperations within each operation), the process window is just too small to tolerate much deviation, and the process window is certainly not explored around a hard restart like this. We want to prevent it from running under these conditions at all!

Now for some more details:

- If your fab air handling/pumping system stops, particle counts will explode. This in turn causes killer defects on the process material.

- You also can't keep your tools evacuated at high vacuum / ultra-high vacuum levels (effectively, atomically pure). Pumping down to this level is not trivial and can take weeks of work to restore if the vacuum chamber is badly contaminated. Fab air is much better than the labs I used pumps in, but it is still a big job to keep these chambers pristine.

- Many tools are implicitly dependent on continuous operation and consumption of feedstock and workpieces (often called tool conditioning). For example, letting a dry etch chamber idle means it will inevitably develop some kind of contamination layer over the previous chamber-wall conditioning layer. This can happen very fast (think ~30 min) even when the tool is idling under ideal conditions, and it often forces process module friends to run "dummy" conditioning wafers to manage the issue. Now imagine what might happen under non-ideal conditions.

- Feedstock / consumables can go bad very fast. There are wet and gaseous feedstocks trapped in the lines of every single tool, and most modules don't characterize what happens to the feedstock quality when the tool is shut down at all. Relatedly, I remember a story where a lab was having a terrible time replicating what was happening in a foundry due to particle contamination from wet cleans/etch. It turned out that the particulate was coming from the plastic jugs holding the wet chemistry. The root cause was that the fab used that chemistry so much and so fast that the particulate contamination was never a problem, while the lab might have held the half-full jugs for months, letting plastic bits build up in the chemistry.

- The engineers must prove that their tools/segments work as spec'd post restart. This is exhausting and painstaking work. Bringing tools back up to production in the course of normal operation is already tiresome enough. But you cannot just run critical material and hope for the best! So now you must spend days validating the entire process line again.

- You can try to shelve / store key material to avert true disaster, but there are critical segments where this is impossible due to reactivity or sensitivity or whatever. You have a finite amount of time to get your material out of those high risk segments, and if the gas supplier only gives you an hour of forewarning, all that material might be totally screwed and there is virtually nothing you can do except cross your fingers. The material would likely be scrapped anyways since the risk is known to be too high to bother processing it further.

- There is also a finite amount of time wafers can spend in stores, even if they are pulled off the line in "safe" segments of the process. They will still collect particles, they will oxidize, and surface quality will degrade as long as they are not in optimal conditions. Cleans are an option, but you must be sure those cleans address the specific types of contamination the wafers collected while in the stocks.
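To put the vacuum point in rough numbers, here is a hedged sketch using the textbook constant-pumping-speed formula t = (V/S)·ln(p0/p1). The chamber size and pump speed below are illustrative guesses, not from any particular tool, and the formula ignores outgassing entirely - which is exactly why a badly contaminated chamber takes weeks, not seconds, to recover:

```python
import math

# Textbook pump-down time at constant pumping speed S (no outgassing):
#   t = (V / S) * ln(p0 / p1)
def pumpdown_seconds(volume_l, speed_l_per_s, p0_mbar, p1_mbar):
    return (volume_l / speed_l_per_s) * math.log(p0_mbar / p1_mbar)

# Illustrative numbers: 100 L chamber, 50 L/s pump, atmosphere to 1e-3 mbar.
t = pumpdown_seconds(100, 50, 1013, 1e-3)
print(f"ideal pump-down: ~{t:.0f} s")  # ~28 s
```

The ideal math says seconds to rough vacuum; in practice, water and contaminants desorbing from the chamber walls dominate long before UHV, so the real schedule is set by bakeout and conditioning, not pumping speed.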

OK, that's what I could immediately think of off the top of my head in the time I have available. Hope that satiates your curiosity for the moment.


I nearly spit my drink out. This is my kind of humor, thanks for sharing.

I've had a decent experience (though not perfect) with identifying and understanding building codes using both Claude and GPT. But I had to be reasonably skeptical and very specific to get to where I needed to go. I would say it helped me figure out the right questions and which parts of the code applied to my scenario, more than it gave the "right" answer the first go round.


Almost the same happy story here on my end. I had an Ubuntu home server, but with Windows as my main. Then the Win 10 --> Win 11 transition hit. I was already annoyed at MS for many reasons, and then I realized just how much money I would need to spend on getting equivalent functionality, for an OS (Win 11) that I hated and had gotten a dismal preview of at work.

Now I have Mint. With an AI terminal to help guide/teach me, I find myself really enjoying the power and capability the terminal gives me. My computer has been genuinely USEFUL for debugging serious problems with WiFi calling on my phone and other network connectivity issues, and I still get to play games on Steam! Sure, there were one or two hiccups on Outward when I first started, but Bannerlord and everything else I've tried so far plays just fine. It just works! Really!


That's awesome! I did look at Mint as well, it looks like a great distro.

I was surprised by some of the extras I got from KDE Plasma for no extra effort. KDE Connect is amazing, and something that is super painful on Windows and Mac.


This. A quick scan of the Wikipedia page for diamond material properties suggests you are very correct. It appears very chemically inert, with some notable exceptions: "Resistant to acids, but dissolves irreversibly in hot steel"

https://en.wikipedia.org/wiki/Material_properties_of_diamond

Also, removed/liberated particles of diamond from the workpiece which failed to fully dissolve chemically into the slurry would then contribute to the abrasive in the slurry. If the slurry abrasive were not also diamond, that could lead to some serious scratching/gouging of the work surface.

Perhaps not insurmountable, but wow, that sounds like a stiff challenge, especially when accounting for cost.

I wonder if diamond would be machinable with a dry (plasma) etch instead? I am purely speculating here; this is far out of my wheelhouse. SiO2 is also very chemically inert (though considerably softer than diamond), yet manufacturers regularly dry etch it.


Putting on my frowny-faced principal engineer hat: we need someone to do the calculation of cost of manufacturing vs the amount of money saved by increasing energy efficiency.
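For a sense of what that calculation looks like, a toy break-even sketch - every number below is a made-up placeholder, not from the article:

```python
# Toy break-even model: years for energy savings to repay an added
# per-chip manufacturing cost. All inputs are illustrative guesses.
HOURS_PER_YEAR = 8760

def payback_years(extra_cost_usd, watts_saved, usd_per_kwh=0.12,
                  duty_cycle=1.0):
    kwh_per_year = watts_saved * duty_cycle * HOURS_PER_YEAR / 1000
    return extra_cost_usd / (kwh_per_year * usd_per_kwh)

# e.g. $50 extra per chip, 20 W saved, always-on datacenter part:
years = payback_years(50, 20)
print(f"payback: ~{years:.1f} years")  # ~2.4 years
```

The point of the exercise is the sensitivity: halve the watts saved or the duty cycle and the payback doubles, which is why a consumer part and a datacenter part can reach opposite conclusions from the same process cost.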


Before you put on your frowny-faced principal engineer hat, you should put on your reading glasses. Try reading the first statement I made again...

"Assuming this becomes easier and cheaper to do as the technique matures"

In other words, what I'm suggesting is a potential future use if the cost comes down.


Heh, my glasses were actually quite dirty when I wrote that.

More seriously: I did see that, and your idea is interesting! My intent was to communicate the minimum threshold we would need to hit to make that future a reality.


Oh man, the integration problems this will cause for the manufacturing engineers will be of nightmare level. You won't really get to properly test how well you made the heat pipe network until end of line! Hopefully they will be able to drum up some inline metrology to test the heat pipes before then...

This on top of all the through-silicon-vias and backside power delivery would make even the crustiest of engineers weep...


These are gigantic and interesting questions packed into some pretty tiny boxes :) I will try to capture some of the issues involved.

Caveat: For older processes, built on a larger scale (>1 micron), these kinds of details may not matter, in which case you are right to question this point. But if you want to implement this on cutting edge manufacturing processes, these details absolutely do matter.

To put this in perspective, in cutting edge process nodes, I've seen senior engineers argue bitterly over ~1 nm in a certain critical dimension. That's (roughly) about 5 atoms across, depending on how much you trust the accuracy of the metrology.
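As a rough sanity check on that "5 atoms" figure, using the standard Si-Si bond length rather than any fab data:

```python
# The Si-Si bond length in the diamond-cubic silicon lattice is ~0.235 nm,
# so a 1 nm dimension spans only a handful of atomic spacings.
si_si_bond_nm = 0.235
atoms_per_nm = 1.0 / si_si_bond_nm
print(f"~{atoms_per_nm:.1f} atomic spacings per nm")  # ~4.3
```

About 4-5 atoms across, depending on whether you count atoms or the gaps between them - either way, a 1 nm disagreement between engineers really is an argument over a countable number of atoms.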

So, if ANY layer isn't "flat" (or otherwise to spec within tolerance), the next layer in the semiconductor patterning stack will tend to translate that bumpiness upward, or cause a deformity in adjacent structure. This is (almost) always bad. These defects cause voids, bad electrical/thermal contacts and characteristics, misshapen/displaced structures, etc, etc

Crystallization in thin-film (especially conformal/gap-filling films) is a tough job which many poor PhD students have slaved over. Polycrystalline material is arguably harder to control in some key ways vs monocrystalline, since you don't have direct control over the specific crystal grain orientation and growth direction. That is, some grain orientations will grow quickly, and others slowly. You can imagine the challenge, then, of getting the layer to terminate growth without ending up too jagged on the ~nm scale. After that you also get into the fun world of crystal defects, grain size, and deciding whether you need to do some more post-processing (do I risk planarizing?)

Hopefully I have captured some of the pieces involved in an understandable way.

Edit: clarity


It is genuinely impressive to grow thin-film polycrystalline diamond at 400C, but my understanding is that this temperature is basically at the ceiling of what the circuits will tolerate in the course of manufacturing and still yield a good quality device at end of line. Stress tests, anneals, and wafer bakes are usually limited to about 400C - unless the point is to deliberately degrade the chip.

Not to say that it can't be done, only that the process window is not very large and the propensity for deleterious carbon soot is very high. Likely this will generate some very fun, highly integrated problem statements before we see this available for sale.

Getting heat out of the chip is such a painful and important struggle. I hope this works on a real process line. Too many benefits on the table to ignore.

Edit: Grammar, clarity


I wonder whether, in situations like the Raptor Lake fiasco or other "overclocked a little too far" scenarios where the circuit degrades to the point that the frequency must be reduced to maintain expected stability, some very small spots on the chip approached that temperature while the temp sensor read 100C or below (so thermal throttling never kicked in when it should have)?


Caveats: My understanding of the Raptor Lake mess is pretty limited, mostly because Intel has been fairly tight-lipped about which specific issue caused it. My personal suspicion is that it was a Pareto plot's worth of issues. Also, while I do know a few things about this particular topic, I am far from the final authority on it.

My understanding is that point/local resistive heating problems out in the wild tend to drive different failure modes vs the global heating techniques used on the manufacturing line, mostly because the CPU is in active operation, which changes the defect physics. Put another way, any particular structure in the CPU would likely not need to reach 400C to fail - even the small voltages used in these chips, coupled with elevated temperature, can drive a lot of difficult-to-catch, slow-to-manifest failure modes. Copper electromigration is the classic example of this type of problem, where copper ions slowly migrate under voltage+temperature, causing/propagating voids until finally an open circuit is made. Surprise! Your chip no longer works after seeming perfectly fine! Manufacturers try to catch such problems with simulated aging through aggressive temperature and voltage experiments. Intel must have discovered a big gap in their visibility, and then found their CPU specs were incompatible with the stated product lifetime without a major re-spec of already sold product. Ouch.
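To make the simulated-aging idea concrete, here is a hedged sketch of the acceleration factor from Black's equation, the classic electromigration lifetime model. The exponent n and activation energy Ea below are typical textbook values, not anything Intel has disclosed:

```python
import math

K_B_EV = 8.617e-5  # Boltzmann constant, eV/K

# Black's equation: MTTF = A * J**(-n) * exp(Ea / (k*T)).
# The prefactor A cancels when comparing stress vs use conditions,
# leaving an acceleration factor for the stress test.
def em_acceleration(j_ratio, t_stress_k, t_use_k, n=2.0, ea_ev=0.9):
    current_term = j_ratio ** n
    thermal_term = math.exp((ea_ev / K_B_EV) * (1/t_use_k - 1/t_stress_k))
    return current_term * thermal_term

# 2x current density, 125 C stress vs 55 C use:
af = em_acceleration(2.0, 398.15, 328.15)
print(f"acceleration factor: ~{af:.0f}x")
```

The exponential temperature term is the punchline: a 2x current bump plus 70C of temperature compresses years of field life into days of stress testing - and, running it the other direction, a small undetected hot spot at use conditions quietly eats lifetime the same way.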

The chip manufacturer also has some capability to make repairs and adjustments ahead of end of line, which should encompass managing some of the issues you refer to. Some big customers might have their own repair capabilities.

Edit: Clarity, trying to better address the question


If growing diamonds is the thermal bottleneck of manufacturing processes, one could imagine a sci-fi future where rather than silicon wafers serving as base matrix material to grow ancillary structures upon, it would instead be diamond wafers that are used to subtractively etch structural scaffoldings, around which silicon-based structures are grown, the diamond scaffolding serving simultaneously as bone and blood vessels for thermal and power conduction as well as mechanical support.


Location: Western Oregon (West Salem)

Remote: Yes

Willing to relocate: No

Technologies: Data Analysis | Statistics | High-Tech Manufacturing | Photonics | Programming (Python, SQL, C#, JMP)

Résumé/CV: https://www.linkedin.com/in/marsh-aaron/

Email: [email protected]

Looking for: Associate/Mid-Level engineering/research role

Scientist and engineer with experience in manufacturing quality, data analytics, programming, and spectroscopy. Adept with data analysis and skilled programmer with practical knowledge of SQL, Python, and other languages and software tools. Highly proficient in detecting and validating critical signals from background in statistical process control and optics. Expert in statistics, designing experiments, and collecting data, drawing from background in semiconductor manufacturing development and spectroscopy of rare-earth-doped optical materials for quantum memories. Team player with strong verbal and written communication skills, able to influence peers through storytelling with data and communicate complex ideas to a variety of audiences. Experienced in leading projects, roadmapping solutions, and driving towards short and long-term goals. Thrives on continuous learning and motivated for self-improvement.

