
What I mean is that AI training can be bottlenecked by internode communication speed: GPUs sit idle while weights and gradients are shuffled between nodes. Other applications (e.g. graphics rendering) are “embarrassingly parallel” and barely use the internode links at all. Most applications lie somewhere in between.
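To put rough numbers on that, here's a minimal sketch (assuming PyTorch with the NCCL backend, launched via torchrun; the tensor sizes are made up for illustration) that times a fake compute loop against the gradient all-reduce that data-parallel training performs every step:

    import time
    import torch
    import torch.distributed as dist

    def main():
        # Assumes launch via: torchrun --nproc_per_node=<gpus> this_script.py
        dist.init_process_group("nccl")
        rank = dist.get_rank()
        torch.cuda.set_device(rank % torch.cuda.device_count())
        device = torch.device("cuda")

        # Stand-in tensors; sizes are illustrative, not from any real model.
        grads = torch.randn(64 * 1024 * 1024, device=device)  # ~256 MB of fp32 "gradients"
        a = torch.randn(4096, 4096, device=device)            # fake compute workload

        torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(10):
            a = a @ a
            a = a / a.norm()             # keep values bounded
        torch.cuda.synchronize()
        compute_s = time.perf_counter() - t0

        t0 = time.perf_counter()
        for _ in range(10):
            dist.all_reduce(grads)       # the gradient sync done every training step
        torch.cuda.synchronize()
        comm_s = time.perf_counter() - t0

        if rank == 0:
            print(f"compute: {compute_s:.3f}s  all_reduce: {comm_s:.3f}s")
        dist.destroy_process_group()

    if __name__ == "__main__":
        main()

If the all_reduce time dominates, the GPUs really are idling on the interconnect; if the matmul loop dominates, a faster fabric won't buy you much.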


I think as a rule of thumb, latest-generation hardware makes the most sense for cloud compute providers supporting AI. But while yes, training can easily be bottlenecked by internode communication speed, it really depends on the model, the model size, and how you’re doing the training (DDP? FSDP? Custom sharding?). I’ve seen bottlenecks there, but usually they’re on the latency side of things, and you won’t really see an improvement there moving from InfiniBand HDR to NDR. If you’re preparing hardware for a generic cluster with changing or unknown workloads, then yeah, max it out and build the most flexible fabric you can. But if you know your model, you can optimize your hardware for it.
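For what it's worth, here's a hypothetical sketch (placeholder model and sizes, PyTorch with NCCL assumed) of the two strategies mentioned above: DDP keeps a full weight replica per GPU and syncs gradients with a few large bucketed all-reduces, which is mostly bandwidth-bound, while FSDP shards parameters across ranks and issues many smaller all-gathers and reduce-scatters, which is where interconnect latency starts to matter:

    import torch
    import torch.nn as nn
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_model(strategy: str) -> nn.Module:
        # Assumes the process group is launched with torchrun and NCCL.
        dist.init_process_group("nccl")
        torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

        # Placeholder model; the right strategy depends on model size vs GPU memory.
        model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(),
                              nn.Linear(4096, 4096)).cuda()

        if strategy == "ddp":
            # Full weight replica on every GPU; gradients synced with a few
            # large, bucketed all-reduces -- mostly bandwidth-sensitive.
            return DDP(model)

        # FSDP shards params/grads/optimizer state and all-gathers them on
        # demand -- many more, smaller collectives, so latency matters more.
        return FSDP(model)

Once you know which of those regimes your model lives in, you know whether paying for a faster fabric actually buys you anything.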


This is one of the many reasons why systems used for mining crypto did not transition into an AI role. The hardware requirements were totally different.

I had a hard time convincing my rather non-technical bosses of this in my previous company.


Yeah, those crypto cards were nerfed. Truly e-waste. I’d love for someone to figure out some innovative ways to use them, maybe with some unhinged hardware mods.


It isn't just the cards; it's the whole chassis. We actually had the cards specifically manufactured from older chips that were just sitting in a warehouse somewhere. They didn't have fans or display ports.

In reality it was everything about the system, though... CPU, PSU, mobo, RAM, disk, switches, cables. It was all focused on ROI, not quality or performance.

I spent years looking for alternative uses for them and came up empty-handed. At one point, we had 20,000 PS5 APU chip blades in production (and another ~30k sitting in boxes). I found a professor who could use them for needle-in-a-haystack searches for quasars. We did some small-scale testing, and if we had been able to find funding to power them for a couple of months, it could have been Nobel-worthy research.

Sadly, the company shut down, I was laid off, and I have no idea what happened to it all.


Wow, thanks for sharing. Truly tragic. I wonder where all the parts are sitting now, hoping it’s not just e-waste, but knowing that’s likely the case. A shame about the quasar research, that would have been a wonderful project to read about.


100% e-waste, I'm sure. I also talked to a number of firms that will buy whatever you have for pennies on the dollar and then deal with it for you. Apparently things like memory chips can be desoldered and recycled pretty well.

This is another reason why I'm going with Dell these days: they have a recycling program. Even if it isn't perfect, at least it's something...

https://www.dell.com/en-us/lp/dt/recovery-recycling-services



