
It's said that whisper.cpp can already run on Windows. What's the difference?



whisper.cpp runs on CPU. My version runs on GPU, because Windows includes a good vendor-agnostic GPU API, Direct3D. On my desktop computer, the performance difference between them is about an order of magnitude. My version is even twice as fast as OpenAI’s original GPGPU implementation, which is based on PyTorch and CUDA.

The original version only supports *.wav audio files with 16kHz sample rate. My version supports most audio and video codecs with any sample rate, because Windows comes with built-in APIs to decode audio and convert it between sample rates.

My version can capture audio directly from microphones, again because Windows comes with a Microsoft-supported API to deal with audio capture devices.


> The original version only supports *.wav audio files with 16kHz sample rate

This particular point is not true (at least not fully). The version publicly announced in 2022 had an ffmpeg dependency to support any format containing audio. On Windows I just had to drop the binary into the Python script folder and could convert from anything.
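
For anyone who needs to produce a 16kHz *.wav by hand, the conversion can be sketched roughly as below. This is only a minimal sketch, not how either implementation invokes ffmpeg: the function names and file names are hypothetical, and it assumes an ffmpeg binary is on PATH.

```python
import shutil
import subprocess


def build_ffmpeg_command(src, dst, sample_rate=16000):
    """Return an ffmpeg argument list converting src to mono 16-bit PCM WAV.

    Hypothetical helper for illustration; src and dst are caller-supplied paths.
    """
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", src,                # input file, any container ffmpeg can read
        "-vn",                    # ignore video streams
        "-ac", "1",               # downmix to one channel
        "-ar", str(sample_rate),  # resample to the target rate
        "-c:a", "pcm_s16le",      # 16-bit little-endian PCM
        dst,
    ]


def convert_to_wav(src, dst, sample_rate=16000):
    """Run the conversion; raises if ffmpeg is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise FileNotFoundError("ffmpeg not found on PATH")
    subprocess.run(build_ffmpeg_command(src, dst, sample_rate), check=True)
```

Calling convert_to_wav("talk.mp4", "talk.wav") with those hypothetical file names would write a mono 16kHz WAV next to the input.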


GP asked about the difference between whisper.cpp and my version, not OpenAI’s implementation and my version. By “the original version” in that paragraph I meant whisper.cpp.

On a general note, I believe using ffmpeg or gstreamer on Windows is sloppy software engineering. Media Foundation is a part of the OS and is supported by Microsoft.

For software which deals with video (as opposed to just audio) it’s even more important, because GPU vendors directly support MF. When installing their GPU drivers, they also install DLLs which expose their hardware codecs as Media Foundation transforms. Examples of such transforms are NVIDIA H.264 Encoder MFT, NVIDIA HEVC Encoder MFT, AMD D3D11 Hardware MFT Playback Decoder, and AMDh265Encoder.



