> The original version only supports *.wav audio files with 16kHz sample rate
This particular point is not true (at least not fully). The version publicly announced in 2022 had ffmpeg dependency for supporting any audio-containing format. For Windows I had just to drop the binary in Python script folder and enjoy converting from anything.
GP asked about the difference between whisper.cpp and my version, not OpenAI’s implementation and my version. By “the original version” in that paragraph I meant whisper.cpp.
On a general note, I believe using ffmpeg or gstreamer on Windows is sloppy software engineering. Media Foundation is a part of the OS and is supported by Microsoft.
For software which deals with video (as opposed to just audio) it’s even more important because GPU vendors directly supporting MF. While installing their GPU drivers, they also installing DLLs which expose their hardware codecs as media foundation transforms. Examples of such transforms are NVIDIA H.264 Encoder MFT, NVIDIA HEVC Encoder MFT, AMD D3D11 Hardware MFT Playback Decoder, and AMDh265Encoder.
This particular point is not true (at least not fully). The version publicly announced in 2022 had ffmpeg dependency for supporting any audio-containing format. For Windows I had just to drop the binary in Python script folder and enjoy converting from anything.