Our cluster is 128 GPUs into a single Dell switch... should help with the queuing. We also have a separate east-west (e-w) 100G network.
This is why we went with the Dell XE9680 chassis... people forget that PCIe switches are quite important at this level of compute. Dell has done a good job here.