
Ugh, why is Apple the only one shipping consumer GPUs with tons of RAM?

I would totally buy a device like this for $10k if it were designed to run Linux.



Intel already has a great-value GPU. Everyone wants them to disrupt the game and destroy the product niches. Its general-purpose compute performance is pretty poor, alas, but maybe that doesn't matter for AI?

I'm not sure higher-capacity GDDR6 and GDDR7 chips are even available to buy. I semi-doubt you can add more without more channels, but then again, AMD just shipped the R9700, which is based on the RX 9070 but with double the RAM. Something like Strix Halo, an APU with more LPDDR channels, could also work. Word is that Strix Halo's 2027 successor, Medusa Halo, will go to 6 channels, and it's hard to see a significant advantage without that win; the processing is already throughput-constrained, and a leap in memory bandwidth will definitely be required. Dual-channel 128-bit isn't enough!
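To make the bandwidth math concrete, here's a rough sketch (peak bandwidth = bus width × transfer rate; the 384-bit Medusa Halo figure is my own assumption based on 6 × 64-bit channels, not a confirmed spec):

```python
# Theoretical peak memory bandwidth: bus width (bits) / 8 * transfer rate (MT/s).
def peak_gbps(bus_bits: int, mtps: int) -> float:
    """Peak bandwidth in GB/s."""
    return bus_bits / 8 * mtps / 1000

configs = {
    "dual-channel DDR5-6000 (128-bit)":        (128, 6000),
    "Strix Halo LPDDR5X-8000 (256-bit)":       (256, 8000),
    "assumed 6-channel Medusa Halo (384-bit)": (384, 8000),
}
for name, (bits, mtps) in configs.items():
    print(f"{name}: {peak_gbps(bits, mtps):.0f} GB/s")
# dual-channel DDR5-6000 (128-bit):        96 GB/s
# Strix Halo LPDDR5X-8000 (256-bit):       256 GB/s
# assumed 6-channel Medusa Halo (384-bit): 384 GB/s
```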

There's also the MRDIMM standard, which multiplexes multiple chips. That promises a doubling of both capacity and throughput.

Apple has definitely done two brilliant, costly things: putting very wide (but not especially fast) memory on package (Intel dabbled in something similar with regular-width RAM in the consumer space a while back with Lakefield), and tiling multiple chips together, so that if they had four good chips next to each other they could ship them as one. An incredibly brilliant maneuver for getting fantastic yields and scaling very big.


You can buy an RTX 6000 Pro Blackwell for around $8,000, which has 96GB of VRAM and is much faster than Apple's integrated GPU.


In-depth comparison of an RTX vs. an M3 Pro with 96 GB VRAM: https://www.youtube.com/watch?v=wzPMdp9Qz6Q


It's not faster at running Qwen3-Coder, because Qwen3-Coder doesn't fit in 96GB, so it can't run at all. My goal here is to run Qwen3-Coder (or similarly large models).

Sure, you can build a cluster of RTX 6000s, but then you start having to buy high-end motherboards and network cards to achieve the bandwidth necessary for it to go fast. It's also obscenely expensive.


You can get 128GB @ ~500GB/s now for ~$2k: https://a.co/d/bjoreRm

It has 8 channels of DDR5-8000.


AMD says "256-bit LPDDR5x"

It might be technically correct to call it 8 channels of LPDDR5, but 256 bits would be only 4 channels of DDR5.


DDR5 uses 32-bit channels as well. A DDR5 DIMM holds two channels that are accessed separately.


Thank you for the correction: I didn't realize each DDR5 transaction was only 32 bits.
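To spell out the channel arithmetic, here's the 256-bit bus from AMD's spec counted both ways:

```python
# One 256-bit bus counts differently depending on the channel width assumed:
# DDR5 moved to 32-bit subchannels (two per DIMM), while the classic
# pre-DDR5 definition of a channel was 64 bits wide.
BUS_BITS = 256
print(BUS_BITS // 32)  # -> 8 channels, counting 32-bit DDR5/LPDDR5 subchannels
print(BUS_BITS // 64)  # -> 4 channels, in the older 64-bit sense
```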


Per above, you need 272GB to run Qwen3-Coder (at 4-bit quantization).
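As a sanity check on that number, here's a rough sketch of how 480B parameters at 4 bits lands in that neighborhood (the ~13% overhead factor for KV cache and runtime buffers is my assumption; actual usage depends on context length and runtime):

```python
# Rough memory estimate for quantized model weights plus runtime overhead.
def est_memory_gb(params_billions: float, bits: int, overhead: float = 0.13) -> float:
    weights_gb = params_billions * bits / 8  # billions of params -> GB of weights
    return weights_gb * (1 + overhead)       # overhead factor is an assumption

# Qwen3-Coder's large variant is 480B total parameters.
print(f"{est_memory_gb(480, 4):.0f} GB")  # -> ~271 GB, close to the 272GB cited
```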


Wrong, it's approximately half that bandwidth.


I mean, sure, it's not quite 512GB levels, but you can get 128GB on a Ryzen AI Max chipset, which has unified memory like Apple's. They're also pretty reasonably priced; I saw an AI Max 370 with 96GB on Amazon earlier for a shade over £1000. I guess you could boost that with an eGPU to gain a bit extra, but 64GB would likely be the max you could add, so still not quite enough to run full Qwen3-Coder at a decent quant, but not far off. Hopefully the next gen will offer more RAM, or another model comes out that can beat Q3 with fewer params.
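For a rough idea of what that memory bandwidth buys you at decode time, here's a crude ceiling estimate (treating generation as purely memory-bound, with every active weight read once per token; the ~256 and ~800 GB/s figures are illustrative assumptions for Strix Halo-class and Apple Ultra-class parts):

```python
# Crude decode-speed ceiling for a memory-bound MoE model:
# tokens/s ~= memory bandwidth / bytes of weights touched per token.
# Qwen3-Coder activates ~35B params per token; ignoring KV-cache reads,
# caching effects, and compute limits, this is an upper bound only.
def tok_per_s_ceiling(bw_gbps: float, active_params_b: float, bits: int) -> float:
    gb_per_token = active_params_b * bits / 8  # weights read per token, in GB
    return bw_gbps / gb_per_token

for name, bw in [("~256 GB/s (Strix Halo-class)", 256),
                 ("~800 GB/s (Apple Ultra-class)", 800)]:
    print(f"{name}: ~{tok_per_s_ceiling(bw, 35, 4):.0f} tok/s ceiling")
# -> ~15 tok/s and ~46 tok/s respectively at 4-bit
```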



