Good info! I use an HPC with SLURM. 40k GPUs shared by hundreds of users. It wor...

macksd · 2024-07-12T15:18:05

If you do, in fact, need H100s, they can be very hard to get. Even with the smaller A100 variants, you sometimes submit a request, wait days, and then a single node shows up over a weekend. Given the reasons described in the article, and the fact that large training jobs can be network-bound, better networking can be a big deal.
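To put rough numbers on the "network-bound" point, here is a back-of-envelope sketch. It is not from the comment above. It assumes plain data-parallel training where gradients are synchronized with a ring all-reduce, and it counts bandwidth only. In that setup each worker sends and receives about 2(N-1)/N times the gradient payload per step. The model size, worker count, and link speeds are hypothetical illustration values.

```python
def ring_allreduce_seconds(payload_bytes: float, n_workers: int, link_gbits: float) -> float:
    """Bandwidth-only estimate of one ring all-reduce.

    Ignores per-message latency, topology, and overlap with compute.
    """
    if n_workers < 2:
        return 0.0  # nothing to synchronize with a single worker
    link_bytes_per_s = link_gbits * 1e9 / 8  # Gb/s -> bytes/s
    # In a ring all-reduce, each worker sends (and receives) 2 * (N - 1) / N of the payload.
    traffic_bytes = 2 * (n_workers - 1) / n_workers * payload_bytes
    return traffic_bytes / link_bytes_per_s


# Hypothetical example: 7B parameters with 2-byte (fp16/bf16) gradients, 16 data-parallel workers.
grad_bytes = 7e9 * 2
for gbits in (100, 400):  # hypothetical per-worker link speeds
    t = ring_allreduce_seconds(grad_bytes, n_workers=16, link_gbits=gbits)
    print(f"{gbits} Gb/s link: ~{t:.2f} s per full gradient sync")
```

Under these assumptions the sync time scales inversely with link bandwidth, so a 4x faster network cuts it by 4x. That saving can matter when a step's compute time is of the same order. Real frameworks overlap communication with the backward pass and use faster intra-node links, so treat this only as a rough estimate.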