Edit: gigabits indeed. Confusing; my old M2 Max has 400 GB/s (3,200 gigabits per second) of bandwidth. I guess it's some sort of baseline figure for the lowest-end configuration?
Edit 2: 1,224 Gbps equals 153 GB/s. Perhaps the M5 Max will have 153 GB/s * 4 = 612 GB/s of memory bandwidth, and the Ultra double that. If anyone knows better, please share.
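For anyone double-checking the units, a quick sketch of the conversion and the tier scaling. The 4x and 2x multipliers are purely my speculation, mirroring how recent M-series tiers have scaled; nothing here is announced:

```python
# Convert the quoted gigabits-per-second figure to GB/s and project
# higher tiers. The 4x/2x multipliers are speculation, not Apple specs.
base_gbps = 1224                 # quoted figure, gigabits per second
base_gb_per_s = base_gbps / 8    # 8 bits per byte -> 153 GB/s

max_gb_per_s = base_gb_per_s * 4    # guess: Max = 4x base
ultra_gb_per_s = max_gb_per_s * 2   # guess: Ultra = 2x Max (two Max dies)

print(f"base:  {base_gb_per_s:.0f} GB/s")   # 153 GB/s
print(f"Max:   {max_gb_per_s:.0f} GB/s")    # 612 GB/s
print(f"Ultra: {ultra_gb_per_s:.0f} GB/s")  # 1224 GB/s
```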
But what did the base M3 have? Why compare across different product tiers?
Edit: Apparently 100 GB/s, so a 1.5x improvement over the M3 and a 1.25x improvement over the M4 (which has 120 GB/s). That seems impressive if it scales to the Pro, Max, and Ultra.
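Quick check of those ratios; the 120 GB/s figure for the base M4 is my recollection of Apple's spec, not from the article:

```python
# Sanity-check the claimed generational improvements.
# 120 GB/s for the base M4 is my recollection, not a quoted figure.
m3, m4, m5 = 100, 120, 153  # GB/s, base chips

print(f"M5 vs M3: {m5 / m3:.2f}x")  # 1.53x -> roughly the quoted 1.5x
print(f"M5 vs M4: {m5 / m4:.2f}x")  # 1.27x -> roughly the quoted 1.25x
```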
High-end gaming computers have far more memory bandwidth in the GPU, though. The CPU doesn't need more memory bandwidth for most non-LLM tasks, especially as gaming machines commonly use AMD chips with giant caches on the CPU.
The advantage of the unified architecture is that the GPU can use all of the system memory. Unified memory wins when your dataset exceeds what you can fit in a discrete GPU's VRAM, but a high-end gaming GPU is far faster if the data fits.
Right, but high-end gaming GPUs exceed 1,000 GB/s, and that's what you should be comparing against if you're interested in any kind of non-CPU compute (tensor ops, GPU workloads).
Perhaps they're worried that if they make the memory bandwidth too good, people will start buying consumer Apple devices and shoving them into server racks at scale.
The Nvidia DGX Spark has 273 GB/s (2,184 gigabits, in your units), and people are saying it's a disappointment because that's not enough for good AI performance with large models. All the neural accelerators in the world won't make it competitive in speed with discrete GPUs that all have way more bandwidth.
> All the neural accelerators in the world won't make it competitive in speed with discrete GPUs that all have way more bandwidth.
That's true for on-GPU memory, but I think there is some subtlety here. MoE models have narrowed the gap considerably, in my opinion: not all experts may fit into GPU memory, but with a fast enough bus you can stream them into place when necessary.
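A rough sketch of that tradeoff, with entirely made-up round numbers for expert size and bandwidths (none of these are measurements):

```python
# Illustrative only: why bus bandwidth matters for MoE offloading.
# Every number below is an assumed round figure, not a measurement.
expert_gb = 2.0      # weights of one expert, in GB (assumed)
active = 2           # experts activated per token (assumed)
vram_bw = 1000.0     # GB/s, discrete-GPU memory bandwidth (assumed)
bus_bw = 64.0        # GB/s, host-to-GPU link, PCIe 5.0 x16-ish (assumed)

# All active experts already resident in VRAM: weight reads are
# bounded by VRAM bandwidth.
t_resident = active * expert_gb / vram_bw

# One active expert missing and streamed over the bus: the bus
# transfer dominates the per-token time.
t_streamed = expert_gb / bus_bw + (active - 1) * expert_gb / vram_bw

print(f"all experts resident: {t_resident * 1e3:.1f} ms/token")  # ~4 ms
print(f"one expert streamed:  {t_streamed * 1e3:.1f} ms/token")  # ~33 ms
```

The point is that the bus term dominates whenever an expert misses, so doubling the bus bandwidth roughly halves the penalty; that's where a single unified pool helps.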
But the key difference is the type of memory. While NVIDIA's datacenter GPUs have shipped with HBM for a while now (gaming GPUs use GDDR6X/GDDR7), the DGX Spark and the M4 use LPDDR5X, which is the main source of their memory bottleneck. Unified memory chips with HBM are definitely possible (GH200, GB200); they are just less power-efficient at low/idle load.
Side note on NVIDIA Grace Hopper (GH200): it actually uses both HBM3e (GPU) and LPDDR5X (CPU) for that reason (load characteristics).
The moat of the memory makers is just so underrated…