Just curious (I have no idea how GPU stats influence neural network benchmarks): would slapping a 1080 Ti alongside my 3060 Ti gain me anything? Can I 'cluster' VRAM for better performance? Can we top ~5x transcription speeds with more VRAM?

I'm open to the idea of buying an additional old-gen GPU that nails a good price/VRAM ratio.




> I have no idea how GPU stats influence neural network benchmarks

I don’t have any idea either; I don’t do ML stuff professionally. At my day job I’m using the same tech (C++, SSE and AVX SIMD, DirectCompute) for a CAM/CAE application.

> would slapping a 1080 Ti alongside my 3060 Ti gain me anything

In the current version of my library, you’ll gain very little. You’ll probably get the same performance as on my computer.

I think it should be technically possible to split the work across multiple GPUs. The most expensive compute shaders in that library, by far, are the ones computing matrix*matrix products. When each GPU has enough VRAM to fit both input matrices, the problem is parallelizable: each GPU can compute a disjoint slice of the output matrix, see the sketch below.
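
Here’s a minimal CPU sketch of that split, with std::thread standing in for GPUs and plain loops standing in for the compute shaders. It only illustrates the partitioning idea, not how the library actually works:

    #include <cstdio>
    #include <thread>
    #include <vector>

    // Computes rows [rowBegin, rowEnd) of C = A*B.
    // A is M x K, B is K x N, C is M x N, all row-major floats.
    static void gemmSlice( const float* a, const float* b, float* c,
        size_t rowBegin, size_t rowEnd, size_t K, size_t N )
    {
        for( size_t i = rowBegin; i < rowEnd; i++ )
            for( size_t j = 0; j < N; j++ )
            {
                float acc = 0.0f;
                for( size_t k = 0; k < K; k++ )
                    acc += a[ i * K + k ] * b[ k * N + j ];
                c[ i * N + j ] = acc;
            }
    }

    int main()
    {
        const size_t M = 256, K = 384, N = 512, gpuCount = 2;
        // Each "GPU" needs its row slice of A plus the complete B in VRAM.
        std::vector<float> a( M * K, 1.0f ), b( K * N, 1.0f ), c( M * N );

        // One worker per GPU; the output slices are disjoint,
        // so the workers don't need to synchronize until the join.
        std::vector<std::thread> workers;
        for( size_t g = 0; g < gpuCount; g++ )
            workers.emplace_back( gemmSlice, a.data(), b.data(), c.data(),
                M * g / gpuCount, M * ( g + 1 ) / gpuCount, K, N );
        for( std::thread& t : workers )
            t.join();

        printf( "c[ 0 ] = %g, expected %zu\n", c[ 0 ], K );
        return 0;
    }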

However, that’s a lot of work, and not something I’m willing to do within the scope of that project. Also, if you have multiple input streams to transcribe, you’ll get better overall throughput by processing these streams in parallel on different GPUs, one complete stream per GPU.
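
For what it’s worth, a sketch of that setup; transcribeFile() here is a hypothetical stand-in for a call into the library with a device index, not its real API:

    #include <cstdio>
    #include <string>
    #include <thread>
    #include <vector>

    // Hypothetical per-stream worker; stands in for a call into the library
    // with a device index. The real library exposes a different API.
    static void transcribeFile( int gpuIndex, const std::string& path )
    {
        printf( "GPU %i: transcribing %s\n", gpuIndex, path.c_str() );
    }

    int main()
    {
        const std::vector<std::string> streams { "a.wav", "b.wav", "c.wav", "d.wav" };
        // One worker thread per stream, alternating over 2 GPUs;
        // no cross-GPU traffic at all, each stream stays on one device.
        std::vector<std::thread> workers;
        for( size_t i = 0; i < streams.size(); i++ )
            workers.emplace_back( transcribeFile, int( i % 2 ), streams[ i ] );
        for( std::thread& t : workers )
            t.join();
        return 0;
    }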

> I'm open to the idea of buying an additional old-gen GPU that nails a good price/VRAM ratio

Based on my observations from the tests https://github.com/Const-me/Whisper/blob/master/SampleClips/... and also from this thread about the 3060 Ti, it looks like the library is indeed bound by VRAM bandwidth, not compute.
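
Some back-of-the-envelope arithmetic is consistent with that. The specs below are my approximate numbers for the 3060 Ti, not measurements from this thread:

    #include <cstdio>

    int main()
    {
        // Approximate RTX 3060 Ti specs: ~16.2 TFLOPS FP32, 448 GB/s GDDR6.
        const double flops = 16.2e12, bytesPerSec = 448e9;
        // FLOPs the GPU must do per byte loaded to keep the ALUs busy:
        printf( "Machine balance: ~%.0f FLOP/byte\n", flops / bytesPerSec );  // ~36

        // When one side of the product is very skinny (a single token's
        // activations against a large weight matrix, as in autoregressive
        // decoding), each 4-byte weight is loaded once for ~2 FLOPs:
        printf( "Skinny matmul:   %.1f FLOP/byte\n", 2.0 / 4.0 );  // 0.5
        // 0.5 is far below ~36, so such shaders wait on VRAM, not on ALUs.
        return 0;
    }

Under that model, extra TFLOPS from a second GPU buy little; bytes per second are the bottleneck.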

I have another data point in that commit: https://github.com/Const-me/Whisper/commit/062d01a9701a11468... Same AMD iGPU; the only difference is the BIOS setup, where I switched the memory from the default DDR4-2400T to the faster XMP-3332 mode.
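
Assuming a dual-channel DDR4 setup (8 bytes per channel per transfer), the theoretical bandwidth gain is easy to compute:

    #include <cstdio>

    int main()
    {
        // Dual-channel DDR4: 2 channels * 8 bytes/transfer * transfers/sec.
        const double slow = 2 * 8 * 2400e6 / 1e9;  // DDR4-2400: 38.4 GB/s
        const double fast = 2 * 8 * 3332e6 / 1e9;  // XMP-3332:  53.3 GB/s
        printf( "%.1f -> %.1f GB/s, +%.0f%%\n",
            slow, fast, ( fast / slow - 1 ) * 100 );  // +39%
        return 0;
    }

If the speedup observed in that commit tracks the +39%, that’s consistent with the memory-bound theory, since an iGPU has no dedicated VRAM and uses system RAM instead.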

If you can, try a Radeon RX 6700 XT, or a better one from that table: https://en.wikipedia.org/wiki/Radeon_RX_6000_series#Desktop

The figure for VRAM bandwidth is “only” 384 GB/sec, but the GPU has 96 MB of L3 cache, which might make a difference for these compute shaders. That’s pure theory though; I haven’t tested on such GPUs. If you do, make sure to play with the comboboxes on the “Advanced GPU Settings” dialog in the desktop example.



