The first SoC to include a Neural Engine was the A11 Bionic, used in the iPhone 8, 8 Plus, and iPhone X, introduced in 2017. Since then, every Apple A-series SoC has included a Neural Engine.
The Neural Engine is its own block on the SoC, and it is not what runs local LLMs on Macs. It is optimized for power efficiency while running small models, not for large language models.
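For context, the Neural Engine isn't programmed directly; apps reach it through Core ML and can only express a preference for where a model runs. A minimal sketch, assuming a hypothetical compiled model file named `SmallClassifier.mlmodelc`, of how an app would ask Core ML to prefer the Neural Engine over the GPU:

```swift
import CoreML
import Foundation

// Hypothetical path to a small compiled Core ML model.
let modelURL = URL(fileURLWithPath: "SmallClassifier.mlmodelc")

let config = MLModelConfiguration()
// Prefer the Neural Engine (with CPU fallback) and skip the GPU.
// Core ML still decides layer by layer where things actually run;
// large transformer workloads tend to land on the GPU instead,
// which is why local LLM runtimes don't target the ANE.
config.computeUnits = .cpuAndNeuralEngine

do {
    let model = try MLModel(contentsOf: modelURL, configuration: config)
    print("Loaded model, preferred compute units:", config.computeUnits.rawValue)
    _ = model
} catch {
    print("Failed to load model:", error)
}
```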
This change strictly adds matmul acceleration to each GPU core, which is the hardware actually used for LLMs.
The NPU is still there; this adds matmul acceleration directly into each GPU core. It takes roughly 10% more transistors to add these accelerators to the GPU, so it's a significant investment for Apple.
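To make the GPU side concrete, here is a rough sketch of the kind of work those per-core matmul units speed up: a plain float32 GEMM dispatched to the GPU through MetalPerformanceShaders. The matrix size and the use of MPSMatrixMultiplication are illustrative assumptions; LLM runtimes ship their own Metal kernels, but the underlying operation is the same large matrix multiply.

```swift
import Metal
import MetalPerformanceShaders

// Multiply two n x n float32 matrices on the GPU: C = A * B.
let n = 1024

guard let device = MTLCreateSystemDefaultDevice(),
      let queue = device.makeCommandQueue() else {
    fatalError("No Metal device available")
}

let rowBytes = n * MemoryLayout<Float>.stride
let descriptor = MPSMatrixDescriptor.matrixDescriptor(rows: n,
                                                      columns: n,
                                                      rowBytes: rowBytes,
                                                      dataType: .float32)

// Allocate GPU-visible storage for A, B, and the result C.
func makeMatrix(on device: MTLDevice) -> MPSMatrix {
    let buffer = device.makeBuffer(length: n * rowBytes,
                                   options: .storageModeShared)!
    return MPSMatrix(buffer: buffer, descriptor: descriptor)
}

let a = makeMatrix(on: device)
let b = makeMatrix(on: device)
let c = makeMatrix(on: device)

// C = 1.0 * A * B + 0.0 * C
let matmul = MPSMatrixMultiplication(device: device,
                                     transposeLeft: false,
                                     transposeRight: false,
                                     resultRows: n,
                                     resultColumns: n,
                                     interiorColumns: n,
                                     alpha: 1.0,
                                     beta: 0.0)

let commandBuffer = queue.makeCommandBuffer()!
matmul.encode(commandBuffer: commandBuffer,
              leftMatrix: a, rightMatrix: b, resultMatrix: c)
commandBuffer.commit()
commandBuffer.waitUntilCompleted()

print("GEMM took \(commandBuffer.gpuEndTime - commandBuffer.gpuStartTime) s")
```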