Hacker News

I’ve got an M3 Max with 64GB, and it can run larger models better than a single 5090 can. Yes, the GPU isn’t as fast, but I have a lot more memory, and my GPU still doesn’t suck that badly.




You illustrated my point exactly: yes, a single 32GB 5090 has half the memory of your Mac. But two of them (or three 3090/4090s) have the same total memory as your Mac, are in the same ballpark in price, and would be several times faster at running the same model as your Mac.

And before you bring up the “efficiency” of the Mac: I’ve done the math, and between the Mac being much slower (thus needing more time to run) and the fact that you can throttle the discrete GPUs to use 200-250W each and only lose a few percent in LLM performance, it’s the same price or cheaper to operate the discrete GPUs for the same workload.
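To make that "same price or cheaper to operate" claim concrete, here is a rough sketch of the comparison. Every number in it (power draws, token rates, electricity price) is an illustrative assumption, not a benchmark:

```python
# Back-of-the-envelope electricity cost to generate 1M tokens.
# All figures below are illustrative assumptions, not measurements.

def cost_per_million_tokens(watts, tokens_per_sec, usd_per_kwh=0.15):
    """Energy cost of generating 1,000,000 tokens at a given draw and speed."""
    seconds = 1_000_000 / tokens_per_sec
    kwh = watts * seconds / 3600 / 1000
    return kwh * usd_per_kwh

# Hypothetical: a Mac drawing ~100 W at ~10 tok/s, versus two discrete GPUs
# power-limited to ~250 W each (~500 W total) running at ~60 tok/s.
mac  = cost_per_million_tokens(watts=100, tokens_per_sec=10)
gpus = cost_per_million_tokens(watts=500, tokens_per_sec=60)

print(f"Mac:  ${mac:.2f} per 1M tokens")
print(f"GPUs: ${gpus:.2f} per 1M tokens")
# Despite the 5x higher draw, the GPUs finish ~6x sooner, so the
# total energy (and cost) per token comes out slightly lower.
```

The point of the sketch is that what matters is watt-seconds per token, not instantaneous wattage; if throttled GPUs are fast enough, the "efficiency" gap closes or reverses.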


I don't know. Can you bring your GPUs on an intercontinental plane trip and play with LLMs on the plane? It isn't really that slow for 70B 4-bit quantized models. These are very good CPUs/GPUs, and they are only getting better.

Sure, the GPUs sit in my basement and I can connect to them from anywhere in the world.

My point was not that “it isn’t really that slow,” my point is that Macs are slower than dedicated GPUs, while being just as expensive (or more expensive, given the specific scenario) to purchase and operate.

And I did my analysis using the Mac Studio, which is faster than the equivalent MBP at load (and is also not portable). So if you’re using a MacBook, my guess is that your performance/watt numbers are worse than what I was looking at.


The whole point of having it local is not needing the network, or not having to jump the GFW when you're in China.

The Ultra is about 2× the power of a Max, but the Max itself is pretty beefy, and it has more than enough GPU power for the models you can fit into ~48GB of RAM (what you have available if you're running with 64GB of memory).
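A quick sanity check on why ~48GB is enough for a 70B 4-bit model. The bytes-per-weight figure and the KV-cache/overhead allowances below are rule-of-thumb assumptions (actual footprint varies by quant format and context length):

```python
# Rough memory estimate for a 70B-parameter model at 4-bit quantization.
# Assumed rule-of-thumb numbers, not measurements for any specific quant.

params = 70e9
bytes_per_weight = 4.5 / 8   # ~4.5 bits/weight for a typical q4-style quant
weights_gb = params * bytes_per_weight / 1e9

kv_cache_gb = 2.0            # assumed: a modest context window
overhead_gb = 1.0            # assumed: runtime buffers and activations

total_gb = weights_gb + kv_cache_gb + overhead_gb
print(f"~{total_gb:.0f} GB needed; fits in ~48 GB usable: {total_gb < 48}")
```

Under these assumptions the weights alone land around 39GB, leaving headroom within the ~48GB a 64GB Mac can dedicate to the GPU.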


If you travel to China, sure, what I’m talking about probably won’t work for you.

In pretty much any other situation, using dedicated GPUs is 1) definitely faster, like 2x the speed or more depending on your use case, and 2) the same cost or possibly cheaper. That’s all I’m saying.



