Since you identify an instrument or voice by its formants (which sit at more or less fixed frequencies), it's unlikely to yield good results over such a large range.
I disagree. Sure, a naive approach (just shifting everything) wouldn't work, but everyone's voice covers multiple octaves, so I'm sure plenty of people already know what needs to change if you sing at C2 but want to transpose it to C4, etc.
Of course there's some knowledge about that, but the approach in the link identifies the pitch with a neural network (this step is not relevant to the current discussion) and then applies an FFT-based pitch-shifting method that doesn't take any of that knowledge into account. So it'll shift the formants as well, which changes the character of voices and instruments substantially.
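To make that concrete, here's a rough sketch in Python/NumPy. It is not the linked project's code: `voiced_tone`, `naive_fft_shift`, and `strongest_frequency` are names I made up, the "voice" is just a stack of harmonics under a single Gaussian formant at 1000 Hz, and the shift is a whole-signal FFT bin remap standing in for any FFT-based method that ignores the spectral envelope.

```python
import numpy as np

SR = 16000  # sample rate in Hz (illustrative choice)

def voiced_tone(f0, formant_hz=1000.0, bandwidth_hz=150.0, seconds=1.0):
    """Harmonics of f0 weighted by one fixed Gaussian 'formant' (toy model)."""
    t = np.arange(int(SR * seconds)) / SR
    harmonics = np.arange(f0, SR / 2, f0)
    gains = np.exp(-0.5 * ((harmonics - formant_hz) / bandwidth_hz) ** 2)
    return sum(g * np.sin(2 * np.pi * h * t) for g, h in zip(gains, harmonics))

def naive_fft_shift(x, ratio):
    """Move every FFT bin k to round(k * ratio); the envelope moves with it."""
    spectrum = np.fft.rfft(x)
    shifted = np.zeros_like(spectrum)
    src = np.arange(len(spectrum))
    dst = np.round(src * ratio).astype(int)
    keep = dst < len(spectrum)  # drop bins that would land above Nyquist
    shifted[dst[keep]] = spectrum[src[keep]]
    return np.fft.irfft(shifted, n=len(x))

def strongest_frequency(x):
    """Frequency (Hz) of the largest-magnitude FFT bin, a crude formant proxy."""
    magnitude = np.abs(np.fft.rfft(x))
    return np.argmax(magnitude) * SR / len(x)

if __name__ == "__main__":
    original = voiced_tone(100.0)                  # sung at 100 Hz
    shifted = naive_fft_shift(original, 2.0)       # naive octave-up shift
    resung = voiced_tone(200.0)                    # same formant, sung at 200 Hz
    print("original:", strongest_frequency(original))
    print("naive shift:", strongest_frequency(shifted))
    print("actually sung higher:", strongest_frequency(resung))
```

In this toy setup the naive shift moves the spectral peak from about 1000 Hz to about 2000 Hz, while the tone actually produced an octave higher still peaks near 1000 Hz. That gap is the change in character I mean. A formant-preserving method would have to estimate the envelope and reapply it after shifting.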