Their A3B Omni paper mentions that the Omni at that size outperformed the (unreleased, I guess) VL. Edit: I see now that there is no Omni-235B-A22B; disregard the following. ~~Which is interesting: I'd have expected the larger model to have more weights to "waste" on additional modalities and thus for the opposite to be true (or for the VL to outperform in both cases, or for both to benefit from knowledge transfer).~~

Relevant comparison is on page 15: https://arxiv.org/abs/2509.17765