> The original version only supports \*.wav audio files with 16kHz sample rate T...

Const-me · on Jan 17, 2023

GP asked about the difference between whisper.cpp and my version, not between OpenAI’s implementation and my version. By “the original version” in that paragraph I meant whisper.cpp.

On a general note, I believe using ffmpeg or gstreamer on Windows is sloppy software engineering. Media Foundation is a part of the OS and is supported by Microsoft.

For software which deals with video (as opposed to just audio) it’s even more important, because GPU vendors directly support MF. When their GPU drivers are installed, they also install DLLs which expose their hardware codecs as Media Foundation transforms. Examples of such transforms are NVIDIA H.264 Encoder MFT, NVIDIA HEVC Encoder MFT, AMD D3D11 Hardware MFT Playback Decoder, and AMDh265Encoder.