Hacker News

A unified memory bandwidth of 1,224 gigabits per second is quite impressive.


Probably gigabytes (GB) and not gigabits (Gb)?

Edit: gigabits indeed. Confusing: my old M2 Max has 400 GB/s (3,200 gigabits per second) of bandwidth. I guess it's some sort of baseline figure for the lowest-end configuration?

Edit 2: 1,224 Gbps equals 153 GB/s. Perhaps the M5 Max will have 153 GB/s × 4 = 612 GB/s of memory bandwidth, with the Ultra double that. If anyone knows better, please share.
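The arithmetic above is just a unit conversion plus a speculative 4x scaling factor; a minimal sketch (the Max/Ultra multipliers are guesses, not announced figures):

```python
def gbps_to_gb_per_s(gbps: float) -> float:
    """Convert gigabits/s to gigabytes/s (8 bits per byte, decimal giga)."""
    return gbps / 8

base = gbps_to_gb_per_s(1224)
print(base)        # 153.0 GB/s for the base chip
print(base * 4)    # 612.0 GB/s -- speculative 4x "Max" scaling
print(base * 8)    # 1224.0 GB/s -- speculative "Ultra" at double the Max
```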


Why? The M3 Ultra already had 800 GB/s (6,400 Gbps) of memory bandwidth.


But what did the base M3 have? Why compare across different categories?

Edit: Apparently 100 GB/s, so a 1.5x improvement over the M3 and a 1.25x improvement over the M4. That seems impressive if it scales to the Pro, Max and Ultra.


And that was already impressive. High-end gaming computers with dual-channel DDR5 only reach ~100 GB/s of CPU memory bandwidth.


High-end gaming computers have far more memory bandwidth in the GPU, though. The CPU doesn't need more memory bandwidth for most non-LLM tasks, especially as gaming computers commonly use AMD chips with a giant cache on the CPU.

The advantage of the unified architecture is that you can use all of the memory on the GPU. The unified memory architecture wins where your dataset exceeds the size of what you can fit in a GPU, but a high end gaming GPU is far faster if the data fits in VRAM.


The other advantage is you don't have to transfer assets across slow buses to get them into that high-speed VRAM.


Right, but high-end gaming GPUs exceed 1,000 GB/s, and that's what you should be comparing against if you're interested in any kind of non-CPU compute (tensor ops, GPU).


And you can find high-end (PC) laptops using LPDDR5x running at 8533 MT/s or higher which gives you more bandwidth than DDR5.


I was looking at that number and thinking the opposite: that's oddly slow, at least in the context of a new Apple chip.

Guessing that's their base tier and it'll increase on the higher-spec/more-memory models.


Perhaps they're worried that if they make the memory bandwidth too good, people will start buying consumer apple devices and shoving them into server racks at scale.


Nvidia DGX Spark has 273 GB/s (2,184 gigabits in your units) and people are saying it's a disappointment because that's not enough for good AI performance with large models. All the neural accelerators in the world won't make it competitive in speed with discrete GPUs, which all have way more bandwidth.
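The reason bandwidth dominates here: autoregressive decode is roughly memory-bandwidth-bound, since every generated token has to read (approximately) all the model weights. A back-of-envelope sketch, with a hypothetical 70 GB model and illustrative bandwidth figures:

```python
def decode_tokens_per_s(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper-bound estimate: tokens/s ≈ bandwidth / bytes read per token.

    Assumes decode is purely bandwidth-bound and every weight is read
    once per token -- a rough rule of thumb, not a benchmark.
    """
    return bandwidth_gb_s / model_size_gb

MODEL_GB = 70  # hypothetical: a 70B-parameter model at 8-bit quantization

print(decode_tokens_per_s(273, MODEL_GB))   # DGX Spark: ~3.9 tok/s ceiling
print(decode_tokens_per_s(1000, MODEL_GB))  # ~1,000 GB/s discrete GPU: ~14.3 tok/s
```

Real throughput is lower still (KV cache, attention, kernel overheads), but the ratio between the two machines tracks the bandwidth ratio.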


> All the neural accelerators in the world won't make it competitive in speed with discrete GPUs that all have way more bandwidth.

That’s true for the on-GPU memory, but I think there is some subtlety here. MoE models have narrowed the gap considerably in my opinion: not all experts need to fit into GPU memory at once, because with a fast enough bus you can stream them into place when necessary.
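To see why the bus speed matters for expert streaming, here is a sketch of the transfer cost per expert swap, using hypothetical sizes and a theoretical PCIe figure (not measured numbers):

```python
def transfer_ms(size_gb: float, bus_gb_s: float) -> float:
    """Time in milliseconds to move `size_gb` over a bus at `bus_gb_s`."""
    return size_gb / bus_gb_s * 1000

# Hypothetical 2 GB expert over PCIe 5.0 x16 (~63 GB/s theoretical peak):
print(transfer_ms(2, 63))   # ~31.7 ms per expert swap
```

If only a few experts miss the cache per token, that latency can hide behind compute; if every token faults in new experts, the bus becomes the bottleneck instead of the VRAM bandwidth.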

But the key difference is the type of memory. While NVIDIA's datacenter GPUs have shipped with HBM for a while now (gaming GPUs use GDDR), the DGX Spark and the M4 use LPDDR5X, which is the main source of their memory bottleneck. Unified-memory chips with HBM are definitely possible (GH200, GB200); they are just less power efficient at low/idle load.

Side note on NVIDIA Grace Hopper: it actually uses both HBM3e (GPU side) and LPDDR5X (CPU side) for exactly that reason (load characteristics).

The moat of the memory makers is just so underrated.



