
InfiniBand and Ethernet are very different at the lowest levels. Ethernet interconnects use RoCE (RDMA over Converged Ethernet), which encapsulates InfiniBand transport in Ethernet, but you still pay the higher routing latency, and you need separate compute and storage networks to avoid queueing (lossless Ethernet).

https://community.fs.com/article/infiniband-vs-ethernet-whic...

https://en.wikipedia.org/wiki/RDMA_over_Converged_Ethernet
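
To make the layering concrete, here is a rough sketch with Scapy of what a RoCEv2 frame looks like on the wire: Ethernet / IP / UDP / InfiniBand transport headers (BTH) + payload. Only UDP destination port 4791 is the real registered RoCEv2 port; the address, source port, and placeholder BTH bytes are made up for illustration.

    # Rough sketch of RoCEv2 on-wire layering (illustrative, not a valid packet).
    from scapy.all import Ether, IP, UDP, Raw

    ib_transport = Raw(load=b"\x00" * 12 + b"payload")  # stand-in for BTH + data
    rocev2_pkt = (
        Ether()
        / IP(dst="10.0.0.2")                 # made-up destination
        / UDP(sport=49152, dport=4791)       # 4791 = registered RoCEv2 port
        / ib_transport
    )
    print(rocev2_pkt.summary())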

Also... don't underestimate the PCIe bottlenecks when you put 8x 400Gb/s NICs next to 8x GPUs. There are now ways to build a tree of PCIe switches and avoid overloading the root one: each GPU gets its own NIC and its own PCIe switch.
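
For a rough sense of scale, a back-of-the-envelope sketch (assumes PCIe Gen5 x16 per device and 128b/130b encoding, and ignores protocol header overhead, so the numbers are approximate):

    # Why each 400G NIC wants its own PCIe Gen5 x16 link rather than a shared root link.
    nic_count      = 8
    nic_rate_gbps  = 400                       # one 400G NIC at line rate
    pcie5_x16_gbps = 32 * 16 * (128 / 130)     # 32 GT/s * 16 lanes, 128b/130b encoding

    total_nic_gbps = nic_count * nic_rate_gbps
    print(f"one PCIe Gen5 x16 link : ~{pcie5_x16_gbps:.0f} Gb/s")
    print(f"one 400G NIC           : {nic_rate_gbps} Gb/s "
          f"(~{100 * nic_rate_gbps / pcie5_x16_gbps:.0f}% of a x16 link)")
    print(f"8 NICs together        : {total_nic_gbps} Gb/s "
          f"-> ~{total_nic_gbps / pcie5_x16_gbps:.1f} x16 links' worth of lanes")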




This is a great comment.

Our cluster is 128 GPUs into a single Dell switch... that should help with the queueing. We also have a separate east-west 100G network.

This is why we went with the Dell XE9680 chassis... people forget that PCIe switches are quite important at this level of compute. Dell has done a good job here.


Interesting, thanks. From the Wikipedia link, this seems like the probable culprit for why things break:

"Although in general the delivery order of UDP packets is not guaranteed, the RoCEv2 specification requires that packets with the same UDP source port and the same destination address must not be reordered."


In practice it's easy to design an Ethernet network that doesn't reorder packets.
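
One reason this tends to hold (a toy sketch, not how any particular switch implements it): ECMP-style load balancing usually hashes the 5-tuple, so every packet of a given flow takes the same path and stays in order, while different flows get spread across links. The hash function and flows below are made up for illustration.

    # Toy 5-tuple ECMP hashing: same flow -> same uplink -> no path-induced reordering.
    import hashlib

    uplinks = ["link0", "link1", "link2", "link3"]

    def pick_uplink(src_ip, dst_ip, sport, dport, proto="udp"):
        key = f"{src_ip}|{dst_ip}|{sport}|{dport}|{proto}".encode()
        return uplinks[hashlib.sha256(key).digest()[0] % len(uplinks)]

    # Same flow always lands on the same link:
    print(pick_uplink("10.0.0.1", "10.0.0.2", 49152, 4791))
    print(pick_uplink("10.0.0.1", "10.0.0.2", 49152, 4791))
    # A different source port (a different flow) may take a different link:
    print(pick_uplink("10.0.0.1", "10.0.0.2", 49153, 4791))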


But if packets have to go over the internet at some point, aren't all bets off if you use UDP?


RoCE doesn't go over the Internet, most ISPs don't reorder packets, and UDP isn't treated specially.


RoCE does not encapsulate InfiniBand.


You are right. Sorry, I quoted the linked article. I haven't worked on the networking side to that level of detail.



