
That latency is in my code, not in some Windows component. I’m accumulating several seconds of audio before running the model to transcribe the buffered samples.

The logic is in that method: https://github.com/Const-me/Whisper/blob/15aea5bc/Whisper/Wh...

How long “several seconds” actually is, is controlled by these user-adjustable parameters: https://github.com/Const-me/Whisper/blob/8648d1d5/Whisper/AP...
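Very roughly, the accumulation amounts to appending captured PCM samples to a buffer, and only calling the model once enough audio has piled up. Here’s a minimal sketch of that idea; the names and the threshold below are made up, it’s not the actual API of the library:

    #include <cstddef>
    #include <vector>

    constexpr size_t sampleRate = 16000;         // Whisper expects 16 kHz mono PCM
    static std::vector<float> pcmBuffer;
    static size_t minSamples = 5 * sampleRate;   // “several seconds”, user-adjustable

    // Placeholder: this is where the model would run on the accumulated audio
    static void transcribeBuffer( const std::vector<float>& pcm ) { (void)pcm; }

    void onCapturedSamples( const float* samples, size_t count )
    {
        pcmBuffer.insert( pcmBuffer.end(), samples, samples + count );
        if( pcmBuffer.size() >= minSamples )
        {
            transcribeBuffer( pcmBuffer );       // transcribe the buffered audio
            pcmBuffer.clear();
        }
    }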



Thanks, I understand now. You needed to send buffered audio because the model wasn't handling short snippets well.

Do you have a sample audio clip you would like to add to the repo for benchmarking purposes? I'm going to try it on my 3060ti tonight and could compare times...


I have uploaded two sample clips: https://github.com/Const-me/Whisper/tree/master/SampleClips

The text files in that folder contain performance data from two computers: a desktop with an nVidia 1080Ti, and a laptop with an integrated AMD GPU.

If you want just a single number, look at the “RunComplete” value in these text files.


Here are my benchmarks on a 3060Ti, i7 12700k:

Columbia Medium EN: 21.56 seconds

Columbia Large: 37.5 seconds

JFK Medium EN: 1.89 seconds

JFK Large: 3.25 seconds

Seems like your optimizations for your native hardware are really good!


Wikipedia says there are two versions of the 3060Ti: one with GDDR6 memory and 448 GB/second of bandwidth, the other with GDDR6X memory and 608 GB/second: https://en.wikipedia.org/wiki/GeForce_30_series#Desktop

The GDDR5X VRAM in 1080Ti delivers up to 484 GB/second.

I wonder whether you are using the GDDR6 or the 6X version of the 3060Ti?


Founder's Edition, so according to this site,

https://www.techpowerup.com/gpu-specs/nvidia-geforce-rtx-306...

it's the 6X version.


Here's another founders edition on that web site, with GDDR6 memory:

https://www.techpowerup.com/gpu-specs/nvidia-geforce-rtx-306...

They have a tool to find out for sure: https://www.techpowerup.com/download/techpowerup-gpu-z/


Yes, the page I quoted is wrong. The Founder's Edition is the lower-range GDDR6 version, 8 GB.


That’s what I thought, and I think we have our answer. Apparently, these compute shaders are memory bound on our two GPUs, and 1080Ti has faster VRAM.
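Back-of-envelope, assuming the shaders are limited purely by VRAM bandwidth and nothing else differs: 484 GB/second versus 448 GB/second is a ratio of about 1.08, so the raw bandwidth numbers alone would predict the 1080Ti to be roughly 8% faster on the same shaders.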


Just curious (I have no idea how GPU stats influence neural network benchmarks): would slapping a 1080ti alongside my 3060ti gain me anything? Can I 'cluster' VRAM for better performance? Can we top ~5x transcription speeds with more VRAM?

I'm open to the idea of buying an additional old gen GPU that nails a good price/VRAM ratio


> I have no idea how GPU stats influence neural network benchmarks

I don’t have any idea either; I don’t do ML stuff professionally. In my day job I’m using the same tech (C++, SSE and AVX SIMD, DirectCompute) for a CAM/CAE application.

> would slapping a 1080ti alongside my 3060ti gain me anything

In the current version of my library, you’ll gain very little. You’ll probably get the same performance as on my computer.

I think it should be technically possible to split the work to multiple GPUs. The most expensive compute shaders in that library, by far, are computing matrix*matrix products. When each GPU has enough VRAM to fit both input matrices, the problem is parallelizable.
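Not something the library does today, but here’s a toy CPU sketch to illustrate why the row split is embarrassingly parallel once each device holds a full copy of the right-hand matrix; two std::thread workers stand in for the two GPUs:

    #include <cstddef>
    #include <thread>

    // Computes rows [ rowBegin, rowEnd ) of C = A * B, where A is M×K and B is K×N
    static void matmulRows( const float* A, const float* B, float* C,
        size_t rowBegin, size_t rowEnd, size_t K, size_t N )
    {
        for( size_t i = rowBegin; i < rowEnd; i++ )
            for( size_t j = 0; j < N; j++ )
            {
                float acc = 0;
                for( size_t k = 0; k < K; k++ )
                    acc += A[ i * K + k ] * B[ k * N + j ];
                C[ i * N + j ] = acc;
            }
    }

    // Each “device” only needs the complete B plus its own slice of A’s rows,
    // so the two halves can run concurrently with no communication
    void matmulTwoDevices( const float* A, const float* B, float* C,
        size_t M, size_t K, size_t N )
    {
        const size_t split = M / 2;
        std::thread firstHalf( matmulRows, A, B, C, size_t( 0 ), split, K, N );
        matmulRows( A, B, C, split, M, K, N );   // second half on the “other device”
        firstHalf.join();
    }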

However, that’s a lot of work, not something I’m willing to do within the scope of that project. Also, if you have multiple input streams to transcribe, you’ll get better overall throughput processing these streams in parallel on different GPUs.

> I'm open to the idea of buying an additional old gen GPU that nails a good price/VRAM ratio

Based on my observations from the tests https://github.com/Const-me/Whisper/blob/master/SampleClips/... and also this thread about the 3060Ti, it looks like the library is indeed bound by VRAM bandwidth, not compute.

I have another data point in that commit: https://github.com/Const-me/Whisper/commit/062d01a9701a11468... Same AMD iGPU; the only difference is the BIOS setup: I switched the memory from the default DDR4-2400T mode to the faster XMP-3332 mode.
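For reference, assuming dual-channel memory: DDR4-2400 offers about 2 × 2400 MT/s × 8 bytes ≈ 38.4 GB/second of theoretical bandwidth, while the 3332 MT/s XMP mode offers about 2 × 3332 × 8 ≈ 53.3 GB/second, i.e. roughly 39% more bandwidth for the iGPU to work with.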

If you can, try a Radeon RX 6700 XT, or better ones from that table: https://en.wikipedia.org/wiki/Radeon_RX_6000_series#Desktop The figure for VRAM bandwidth is “only” 384 GB/sec, but the GPU has 96 MB of L3 cache, which might make a difference for these compute shaders. That’s pure theory though; I haven’t tested on such GPUs. If you do try it, make sure to play with the comboboxes on the “Advanced GPU Settings” dialog in the desktop example.




