Native FP4 is one of the most interesting architectural aspects here IMO, since going below FP8 is known to come with accuracy tradeoffs. I'm curious how they navigated this, and how FP8 weights (if they exist) would perform.
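One thing to note is that MXFP4 is a block-scaled format: each block of 32 FP4 values shares one 8-bit power-of-two scale, which works out to 4 + 8/32 = 4.25 bits per weight. That shared scale lets it cover a much wider range of values than raw FP4 would on its own with, say, 2 exponent bits and 1 mantissa bit.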
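
To make the 4.25 figure concrete, here's a rough sketch of how a block decodes, assuming the layout from the OCP Microscaling spec as I understand it (E2M1 elements, one E8M0 scale with bias 127 per 32 elements). The function names are mine, not from any library, and I'm ignoring the NaN scale encoding (255).

```python
# Magnitudes representable by FP4 E2M1 (1 sign, 2 exponent, 1 mantissa bit).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 32   # elements sharing one scale (assumed from the MX spec)
SCALE_BITS = 8    # E8M0: an unsigned exponent, bias 127, no mantissa


def bits_per_weight(element_bits=4, scale_bits=SCALE_BITS, block_size=BLOCK_SIZE):
    """Storage cost per weight once the shared scale is amortized."""
    return element_bits + scale_bits / block_size


def decode_e2m1(code):
    """Decode a 4-bit code: bit 3 is the sign, bits 0-2 index the magnitude."""
    sign = -1.0 if code & 0b1000 else 1.0
    return sign * E2M1_MAGNITUDES[code & 0b0111]


def decode_block(scale_code, codes):
    """Apply the shared power-of-two scale to every element in the block."""
    scale = 2.0 ** (scale_code - 127)
    return [scale * decode_e2m1(c) for c in codes]


if __name__ == "__main__":
    print(bits_per_weight())
    print(decode_block(128, [0b0111, 0b1011, 0b0001]))
```

So raw FP4 only ever gives you 15 distinct values topping out at ±6, but each block gets to slide that tiny grid up or down by a power of two, which is where the extra range comes from.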