No, the problem is that with training, you *do* care about latency, and you need... | Hacker News


ronsor 3 days ago | on: SimpleFold: Folding proteins is simpler than you t...

No, the problem is that with training, you do care about latency, and you need a crap-ton of bandwidth too! Think of the all_gather; think of the gradients! Inference is actually easier to distribute.
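To put rough numbers on the bandwidth and latency point, here is a back-of-envelope sketch using the standard ring all-reduce cost model (2(N-1) steps, each moving 1/N of the payload). The model size, link speeds, and latencies are made-up assumptions for illustration, not measurements, and `ring_allreduce_seconds` is a hypothetical helper name.

```python
def ring_allreduce_seconds(num_nodes, payload_bytes, bandwidth_bytes_per_s, latency_s):
    """Estimate one ring all-reduce (reduce-scatter followed by all-gather).

    Each of the 2*(N-1) steps sends payload/N bytes to a neighbour and pays
    one link latency. Ignores compute time and overlap with the backward pass.
    """
    steps = 2 * (num_nodes - 1)
    bytes_per_step = payload_bytes / num_nodes
    return steps * (latency_s + bytes_per_step / bandwidth_bytes_per_s)

# Assumption: 1B parameters with 2-byte gradients, synced every training step.
GRAD_BYTES = 1_000_000_000 * 2

# Assumption: 8 nodes on a 50 GB/s interconnect with 5 microseconds latency.
datacenter = ring_allreduce_seconds(8, GRAD_BYTES, 50e9, 5e-6)

# Assumption: 8 nodes on 100 Mbit/s (12.5 MB/s) links with 50 ms latency.
internet = ring_allreduce_seconds(8, GRAD_BYTES, 12.5e6, 50e-3)

print(f"datacenter: {datacenter:.3f} s per gradient sync")
print(f"internet:   {internet:.1f} s per gradient sync")
```

Under these assumed numbers the gap is several orders of magnitude per step, which is the cost that inference (no gradient sync) avoids.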

meehai 3 days ago

Yeah, but if you can build topologies based on latency, you may get some decent tradeoffs. For example, N=1M nodes could each do batch updates in a tree fashion, i.e., with the all-reduce layered by the latency between nodes.
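A minimal sketch of that layered idea, as a single-process simulation rather than a real distributed implementation. It assumes nodes are already sorted so that list neighbours are close in latency, and that each tier has a fixed group size; `tree_allreduce` and `fanouts` are hypothetical names introduced here.

```python
def tree_allreduce(values, fanouts):
    """Simulate a sum all-reduce layered by latency tier.

    fanouts[0] is the group size at the lowest-latency tier, fanouts[1] the
    next tier, and so on. Whatever leaders remain after the last tier do a
    final reduce over the slowest links. Returns the per-node result and the
    number of reduce messages sent at each tier (the broadcast back down
    mirrors the same counts).
    """
    partials = list(values)
    messages_per_tier = []
    for fanout in fanouts:
        groups = [partials[i:i + fanout] for i in range(0, len(partials), fanout)]
        # Every non-leader member sends one partial sum to its group leader.
        messages_per_tier.append(sum(len(g) - 1 for g in groups))
        partials = [sum(g) for g in groups]
    # Final reduce among the remaining top-tier leaders.
    messages_per_tier.append(len(partials) - 1)
    total = sum(partials)
    return [total] * len(values), messages_per_tier

# Assumption: 1000 nodes, groups of 10 per rack, 10 racks per region.
node_values = list(range(1000))
reduced, messages = tree_allreduce(node_values, [10, 10])
print(messages)
```

In this toy layout most of the traffic stays on the fast local links and only a handful of messages cross the slowest tier. The tradeoff is that each tier adds a sequential hop, so total latency grows with tree depth.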
