The GPU memory allocation refers to how capacity is allotted, not bandwidth. Sounds like the same 256-bit/quad-channel LPDDR5X at 8000 MT/s you can get today with Strix Halo.
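For a rough sense of what that configuration implies, here is a back-of-the-envelope sketch of theoretical peak bandwidth. It assumes the 8000 figure is a transfer rate in MT/s and that the full 256-bit bus is used on every transfer; the function name is just for illustration, and real-world throughput will be lower.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfer_rate_mt_s: int) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes).

    Assumes every transfer moves the full bus width; sustained
    bandwidth in practice is lower than this figure.
    """
    bytes_per_transfer = bus_width_bits / 8
    # bytes per transfer * mega-transfers per second = MB/s; divide by 1000 for GB/s
    return bytes_per_transfer * transfer_rate_mt_s / 1000


# 256-bit bus at 8000 MT/s, the configuration mentioned above
print(f"{peak_bandwidth_gb_s(256, 8000):.0f} GB/s")
```

Under those assumptions the result is about 256 GB/s, which is the number to compare against other platforms' quoted bandwidth.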
384GB is 75% of 512GB. The M3 Ultra bandwidth is over 800GB/s, though potentially less in practice.
Having used an M3 Ultra, I think its inference performance is pretty remarkable, and concerns about slow prompt processing in particular are greatly exaggerated.
Maybe the advantage of the DGX Spark will be for training or fine-tuning.
I very consistently see people say prompt processing is slow at larger context sizes ("notoriously slow"), something that is much less of an issue with, e.g., CUDA setups.