AVX-512 would be ~4x, but this Intel CPU doesn't have it.
AVX2 is ~2x; Ryzen/AMD fakes AVX2 instructions with multiple SSE instructions.
Some AVX2 instructions make the CPU downclock, but not by much; I see very close to a 2x speedup over SSE2 on some workloads. Some of the downclock loss is made up for by the extra instructions available (gather, etc.).
AVX-512 might hit more than a 4x improvement over SSE on some workloads despite the downclocks, thanks to all of the masking features. I have seen results consistent with this, secondhand. (I don't own an AVX-512 CPU.)
Anyway, all of this depends on workload, CPU, compiler, etc. But it does happen!
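For anyone wondering where the ~2x and ~4x come from, here is a rough sketch of the lane arithmetic. It assumes 32-bit floats and counts register width only, so it ignores downclocking, memory bandwidth, and the extra instructions mentioned above. The names are made up for illustration.

```python
# Register widths in bits for each instruction set family.
REGISTER_BITS = {"SSE": 128, "AVX2": 256, "AVX-512": 512}


def lanes(isa, element_bits=32):
    """Number of elements of the given size that fit in one register."""
    return REGISTER_BITS[isa] // element_bits


def ideal_speedup(isa, baseline="SSE", element_bits=32):
    """Upper bound on speedup from register width alone (assumption:
    no downclock, no stalls, perfectly vectorizable loop)."""
    return lanes(isa, element_bits) / lanes(baseline, element_bits)


for isa in REGISTER_BITS:
    print(isa, lanes(isa), "floats per register,",
          ideal_speedup(isa), "x vs SSE")
```

Real numbers land below or above these depending on the workload, which is the point of the caveats above.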
> AVX512 would be 4x, AVX2 is 2x, Ryzen fakes AVX2 instructions with multiple SSE instructions.
Not quite. Everything is fake, because everything is decoded into micro-ops first.
Ryzen's internal micro-op engine is 128 bits wide. But it has 4 pipes, each handling 128 bits at a time. So any 256-bit instruction will simply use two pipes at a time.
-------
So the 256-bit instruction does, in fact, execute all at once.
The difference is that Intel has 3 pipelines, each of which can do a 256-bit instruction by itself.
-------
In effect: Ryzen is 4 x 128-bit pipelines, with the ability to merge pipelines together to do a 256-bit instruction.
Intel is 3 x 256-bit pipelines, with the ability (on Skylake-X) to merge pipelines together to do a 512-bit instruction.
In any case, Intel has wider pipes than Ryzen. Intel Skylake is effectively a 256-bit CPU, while Ryzen is only a 128-bit CPU.
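Taking the pipe counts above at face value (they are a simplified model, not measured figures), the per-cycle arithmetic works out roughly like this. The function and dictionary names are hypothetical, just for illustration.

```python
# Simplified model from this comment: (number of vector pipes, bits per pipe).
CORES = {"Ryzen": (4, 128), "Intel Skylake": (3, 256)}


def ops_per_cycle(core, op_bits):
    """How many op_bits-wide vector ops can start per cycle in this model.

    An op wider than one pipe is assumed to occupy several pipes at once.
    """
    pipes, pipe_bits = CORES[core]
    pipes_needed = max(1, -(-op_bits // pipe_bits))  # ceiling division
    return pipes // pipes_needed


def peak_bits_per_cycle(core):
    """Total vector width available per cycle across all pipes."""
    pipes, pipe_bits = CORES[core]
    return pipes * pipe_bits


for core in CORES:
    print(core,
          "| 128-bit ops/cycle:", ops_per_cycle(core, 128),
          "| 256-bit ops/cycle:", ops_per_cycle(core, 256),
          "| peak bits/cycle:", peak_bits_per_cycle(core))
```

Under these assumptions Ryzen only comes out ahead on 128-bit ops (4 vs 3 per cycle); on 256-bit ops it is 2 vs 3, which is the "wider pipes" point.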
Cloudflare ran into some trouble with this: https://blog.cloudflare.com/on-the-dangers-of-intels-frequen...