
Neither of those benchmarks is relevant to ML, which is what the GP is talking about.


So what is relevant? As the other poster said, nVidia tends to outpace AMD cards in pretty much every real-life use case.


The only important numbers are processing power (TFLOPS) and memory bandwidth (GB/s).

If your compute shader doesn’t approach the theoretical limit of either compute or memory bandwidth, it doesn’t necessarily mean there’s anything wrong with the GPU. Here’s an incomplete list of possible reasons.

● Insufficient parallelism of the problem. Some problems are inherently sequential.

● Poor HLSL programming skills. For example, a compute shader with 32 threads/group wastes 50% of the compute units on most AMD GPUs; the correct number for AMD is 64 threads/group, or a multiple of 64. BTW, nVidia and Intel are fine with 64 threads/group too: they run one thread group as two wavefronts, which doesn’t waste any resources. (See the sketch after this list.)

● The problem being too small to compensate for the overhead. For example, a CPU multiplies two 4x4 matrices in a small fraction of the time it takes to dispatch a compute shader for that. You’re going to need much larger matrices for GPGPU to win.
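As a minimal sketch of the thread-group-size point above (the buffer name and the doubling operation are just illustrative, not something from this thread), an HLSL compute shader declared with 64 threads per group fills one full 64-wide wavefront on GCN-era AMD hardware, while nVidia simply runs the same group as two 32-wide warps:

  // Illustrative compute shader; the buffer and the work it does are made up.
  RWStructuredBuffer<float> buf : register(u0);

  // 64 threads per group: one full 64-wide wavefront on GCN-era AMD GPUs,
  // and two 32-wide warps on nVidia, so no lanes are left idle on either.
  [numthreads(64, 1, 1)]
  void main(uint3 id : SV_DispatchThreadID)
  {
      buf[id.x] = buf[id.x] * 2.0f;
  }

With [numthreads(32, 1, 1)] instead, the same shader would occupy only half of each AMD wavefront, which is the 50% waste described above.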



