I thought the implication was that the shader compiler produces a second shader from the same source, run through a dead-code elimination pass that keeps only the code needed to calculate the position and ignores the other attributes.
Sure, but that only goes so far, especially when users write their shaders without knowing this transform will be applied, and without any tools to verify that it actually eliminates anything.
Well, it's what several tiler architectures do, and it generally works just fine. The position computation normally isn't intertwined with the computation of the other outputs, so dead-code elimination does a good job.
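To make the idea concrete, here is a toy sketch of deriving a position-only variant by dead-code elimination. It is not how any particular driver compiler does it: the `Instr` IR, the op names, the input names, and `eliminate_dead_code` are all hypothetical, and it assumes straight-line SSA code with no side effects. It walks the instructions backwards from the requested outputs and keeps only what they depend on. The second case shows the limit: when the position depends on values also used for other outputs, those computations have to stay.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str    # SSA value defined by this instruction (defined exactly once)
    op: str      # operation name; only for readability in this toy IR
    args: tuple  # SSA values or shader inputs/uniforms this instruction reads


def eliminate_dead_code(instrs, outputs, keep):
    """Return the instructions and outputs needed to compute only the outputs in `keep`.

    `outputs` maps an output name (e.g. "position") to the SSA value written to it.
    Assumes straight-line SSA code with no side effects (no stores, no discard).
    """
    live = {value for name, value in outputs.items() if name in keep}
    kept = []
    # Walk backwards: an instruction is live if a later live use needs its result.
    for ins in reversed(instrs):
        if ins.dest in live:
            kept.append(ins)
            live.update(ins.args)
    kept.reverse()
    return kept, {name: value for name, value in outputs.items() if name in keep}


# Hypothetical vertex shader: the position does not depend on the lighting math.
vertex_shader = [
    Instr("t0", "mul", ("mvp", "in_pos")),            # clip-space position
    Instr("t1", "mul", ("normal_mat", "in_normal")),
    Instr("t2", "normalize", ("t1",)),
    Instr("t3", "dot", ("t2", "light_dir")),          # diffuse lighting term
    Instr("t4", "mul", ("t3", "in_color")),
]
outputs = {"position": "t0", "color": "t4", "uv": "in_uv"}

full, _ = eliminate_dead_code(vertex_shader, outputs, keep=set(outputs))
position_only, pos_outputs = eliminate_dead_code(vertex_shader, outputs, keep={"position"})
print([i.dest for i in full])           # ['t0', 't1', 't2', 't3', 't4']
print([i.dest for i in position_only])  # ['t0']

# Intertwined case: the position is offset along the normal, so the normal math stays.
displaced = vertex_shader + [Instr("t5", "add", ("t0", "t2"))]
displaced_outputs = {"position": "t5", "color": "t4"}
displaced_pos, _ = eliminate_dead_code(displaced, displaced_outputs, keep={"position"})
print([i.dest for i in displaced_pos])  # ['t0', 't1', 't2', 't5']
```

A single backward pass is enough here only because each value is defined once (SSA) and there is no control flow; a real compiler would run liveness over a control-flow graph and treat side-effecting instructions as always live.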