It probably means you flipped the combobox on the first screen. In the build published on GitHub, the only included model implementation is the GPU one. The other two implementations are disabled with macros, here: https://github.com/Const-me/Whisper/blob/1.1.0/Whisper/stdaf... Those implementations lack some UX features like callbacks and cancellation, and I haven't tested them in a while, but they might still work.
> does this use both GPU and CPU simultaneously?
No, it's sequential: there's a data dependency between the two stages. The encode function computes some buffers (probably called "cross attention", but I'm not sure, I'm not an ML expert), and the decode function then needs that data to generate the output text.