The only important numbers are processing power (TFLOPS) and memory bandwidth (GB/s).
If your compute shader doesn’t approach the theoretical limit of either compute or memory bandwidth, it doesn’t mean there’s anything wrong with the GPU. Here’s an incomplete list of possible reasons.
● Insufficient parallelism of the problem. Some problems are inherently sequential.
● Poor HLSL programming skills. For example, a compute shader with 32 threads/group wastes 50% of the compute units on most AMD GPUs, because the hardware wavefronts on these GPUs are 64 threads wide; the correct number for AMD is 64 threads/group, or a multiple of 64. BTW, nVidia and Intel are fine with 64 threads/group, they simply run 1 thread group as 2 warps, which does not waste any resources. See the first sketch after this list.
● The problem being too small to compensate for the overhead. For example, a CPU multiplies two 4x4 matrices in a small fraction of the time it takes to dispatch a compute shader for that. You’re going to need much larger matrices for GPGPU to win; see the second sketch after this list.
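A minimal HLSL sketch of the thread group size point. The buffer name and the shader body are placeholders made up for illustration; only the [numthreads] attribute matters here.

```hlsl
// Hypothetical trivial shader; only the [numthreads] attribute is the point.
RWStructuredBuffer<float> result : register( u0 );

// [numthreads( 32, 1, 1 )] would fill only half of a 64-wide AMD wavefront,
// leaving 50% of the SIMD lanes idle for the whole dispatch.
[numthreads( 64, 1, 1 )]	// one full AMD wavefront, or two 32-wide nVidia warps
void main( uint3 id : SV_DispatchThreadID )
{
	result[ id.x ] = (float)id.x;
}
```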
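And a sketch of the kind of workload where the GPU does win: a naive multiply of two large square matrices, one thread per output element. The buffer names, the MATRIX_SIZE constant, the row-major layout and the 8x8 group size are assumptions for illustration, not anything prescribed.

```hlsl
// Naive sketch: C = A * B for large square row-major matrices.
// MATRIX_SIZE, the buffer layout and the 8x8 = 64 threads/group are assumptions.
#define MATRIX_SIZE 1024

StructuredBuffer<float> matA : register( t0 );
StructuredBuffer<float> matB : register( t1 );
RWStructuredBuffer<float> matC : register( u0 );

[numthreads( 8, 8, 1 )]	// 64 threads/group, a full AMD wavefront
void main( uint3 id : SV_DispatchThreadID )
{
	float acc = 0.0;
	for( uint k = 0; k < MATRIX_SIZE; k++ )
		acc += matA[ id.y * MATRIX_SIZE + k ] * matB[ k * MATRIX_SIZE + id.x ];
	matC[ id.y * MATRIX_SIZE + id.x ] = acc;
}
```

On the CPU side this would be launched with something like Dispatch( MATRIX_SIZE / 8, MATRIX_SIZE / 8, 1 ); for a 4x4 matrix that same per-dispatch overhead dwarfs the arithmetic, which is the point of the last bullet above.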