CPU-specific optimization has waxed and waned in importance over the years. It used to be that a lowest-common-denominator compilation would not even include MMX or SSE, which could make a huge difference for some kinds of CPU-bound algorithms. However, it was always possible to do runtime CPU feature detection and dispatch to the optimal version, so compile-time feature selection was never the only option.
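For anyone who hasn't seen it, a minimal sketch of that runtime dispatch using the GCC/Clang builtins (the two function names are just placeholders):

    #include <stdio.h>

    static void kernel_scalar(void) { puts("scalar fallback"); }
    static void kernel_avx2(void)   { puts("AVX2 path"); }

    int main(void) {
        __builtin_cpu_init();                /* populate the CPU feature table */
        if (__builtin_cpu_supports("avx2"))  /* decided at run time, not compile time */
            kernel_avx2();
        else
            kernel_scalar();
        return 0;
    }

Function multi-versioning (the target_clones attribute) is the fancier version of the same idea, where the compiler generates the dispatch for you.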
Then AMD Opteron came out and got everyone on a new baseline: if you compiled for amd64, that meant a specific CPU again; there were no newer instructions yet. Now AMD64 has several microarchitecture levels[1], which can start to matter for certain kinds of operations, such as SIMD. A more mundane one I often run into is popcnt, which v2 added: great for things like bitsets in isolation, but in overall program performance I measure almost no difference between v1 and v3 builds on any of my projects.
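Concretely, this is the kind of code where the v2 baseline shows up in codegen (flags as spelled in recent GCC/Clang), even if the end-to-end win is small:

    #include <stdint.h>
    #include <stddef.h>

    /* Count set bits across a bitset; same source, different codegen per level. */
    size_t bitset_count(const uint64_t *words, size_t n) {
        size_t total = 0;
        for (size_t i = 0; i < n; i++)
            total += (size_t)__builtin_popcountll(words[i]);
        /* -march=x86-64 (v1): bit-twiddling or library fallback
           -march=x86-64-v2 and up: a popcnt instruction per word */
        return total;
    }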
When it comes to games, it's more than likely your games were binary-only and already compiled for the lowest common denominator, maybe with runtime feature selection, and even then they were probably GPU-bound anyway.
[1] https://en.wikipedia.org/wiki/X86-64#Microarchitecture_level...