Acid 2 bakes in the assumption that you will be viewing it on a desktop/laptop monitor at 100% scaling: it depends on pixel accuracy.
This was a reasonably universal assumption in 2005, but it has become less and less valid over time. We now have high-DPI screens, and the whole idea of pixel accuracy has fallen out of favour (it was never a good idea, but it was 2005), since phone browsers are expected to rescale websites for better readability and usability.
The result is that Acid 2 fails on my phone, and on my laptop it will pass/fail depending on which screen the window is on.
Acid 3 was too forward-looking and too rigid. While Acid 2 was (mostly) testing accepted standards (which IE6 implemented very poorly), Acid 3 tested a bunch of draft standards. It was very strict about many things that weren't well defined, and later versions of those standards sometimes took the opposite approach.
Basically, Acid 2 was very good at shaming Microsoft into fixing Internet Explorer, but in the long run the whole concept of popular, cherry-picked torture tests proved of limited use for (and even counterproductive to) promoting standards-compliant browsers.
The tests also no longer reflect what the average user expects their browser to support. A browser can pass them and still be missing several important features that are widespread nowadays.