
Ugh, why is Apple the only one shipping consumer GPUs with tons of RAM?

I would totally buy a device like this for $10k if it were designed to run Linux.



Intel already has a great-value GPU. Everyone wants them to disrupt the game and destroy the product niches. Its general-purpose compute performance is pretty poor, alas, but maybe that doesn't matter for AI?

I'm not sure higher-capacity GDDR6 and GDDR7 chips are even available to buy. I semi-doubt you can add more without more channels, but then again, AMD just shipped the R9700, which is based on the RX 9070 but with double the RAM. Something like Strix Halo, an APU with more LPDDR channels, could also work. Word is that Strix Halo's 2027 successor, Medusa Halo, will go to 6 channels, and it's hard to see a significant advantage without that win; the processing is already throughput-constrained, and a leap in memory bandwidth will definitely be required. Dual-channel 128-bit isn't enough!
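To make the bandwidth math concrete, here's a rough sketch (peak bandwidth = bus width × transfer rate; the 384-bit Medusa Halo figure is my own assumption based on 6 × 64-bit channels, not a confirmed spec):

```python
# Theoretical peak memory bandwidth: bus width (bits) / 8 * transfer rate (MT/s).
def peak_gbps(bus_bits: int, mtps: int) -> float:
    """Peak bandwidth in GB/s."""
    return bus_bits / 8 * mtps / 1000

configs = {
    "dual-channel DDR5-6000 (128-bit)":        (128, 6000),
    "Strix Halo LPDDR5X-8000 (256-bit)":       (256, 8000),
    "assumed 6-channel Medusa Halo (384-bit)": (384, 8000),
}
for name, (bits, mtps) in configs.items():
    print(f"{name}: {peak_gbps(bits, mtps):.0f} GB/s")
# dual-channel DDR5-6000 (128-bit):        96 GB/s
# Strix Halo LPDDR5X-8000 (256-bit):       256 GB/s
# assumed 6-channel Medusa Halo (384-bit): 384 GB/s
```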

There's also the MRDIMM standard, which multiplexes multiple chips. That promises a doubling of both capacity and throughput.

Apple has definitely done two brilliant, costly things: putting very wide (but not especially fast) memory on package (Intel dabbled in something similar with regular-width RAM in the consumer space a while back with Lakefield), and tiling multiple chips together, so that if they had four good chips next to each other they could ship them as one. An incredibly brilliant maneuver for getting fantastic yields and scaling very big.


You can buy an RTX 6000 Pro Blackwell for around $8,000, which has 96GB of VRAM and is much faster than Apple's integrated GPU.


In-depth comparison of an RTX vs. an M3 Pro with 96 GB VRAM: https://www.youtube.com/watch?v=wzPMdp9Qz6Q


It's not faster at running Qwen3-Coder, because Qwen3-Coder doesn't fit in 96GB, so it can't run at all. My goal here is to run Qwen3-Coder (or similarly large models).

Sure, you can build a cluster of RTX 6000s, but then you start having to buy high-end motherboards and network cards to achieve the bandwidth necessary for it to go fast. It's also obscenely expensive.


You can get 128GB @ ~500GB/s now for ~$2k: https://a.co/d/bjoreRm

It has 8 channels of DDR5-8000.


AMD says "256-bit LPDDR5x"

It might be technically correct to call it 8 channels of LPDDR5, but 256 bits would be only 4 channels of DDR5.


DDR5 uses 32-bit channels as well. A DDR5 DIMM holds two channels that are accessed separately.


Thank you for the correction: I didn't realize each DDR5 transaction was only 32 bits.
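To spell out the channel arithmetic, here's the 256-bit bus from AMD's spec counted both ways:

```python
# One 256-bit bus counts differently depending on the channel width assumed:
# DDR5 moved to 32-bit subchannels (two per DIMM), while the classic
# pre-DDR5 definition of a channel was 64 bits wide.
BUS_BITS = 256
print(BUS_BITS // 32)  # -> 8 channels, counting 32-bit DDR5/LPDDR5 subchannels
print(BUS_BITS // 64)  # -> 4 channels, in the older 64-bit sense
```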


Per above, you need 272GB to run Qwen3-Coder (at 4-bit quantization).
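As a sanity check on that number, here's a rough sketch of how 480B parameters at 4 bits lands in that neighborhood (the ~13% overhead factor for KV cache and runtime buffers is my assumption; actual usage depends on context length and runtime):

```python
# Rough memory estimate for quantized model weights plus runtime overhead.
def est_memory_gb(params_billions: float, bits: int, overhead: float = 0.13) -> float:
    weights_gb = params_billions * bits / 8  # billions of params -> GB of weights
    return weights_gb * (1 + overhead)       # overhead factor is an assumption

# Qwen3-Coder's large variant is 480B total parameters.
print(f"{est_memory_gb(480, 4):.0f} GB")  # -> ~271 GB, close to the 272GB cited
```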


Wrong, it's approximately half that bandwidth.


I mean, sure, it's not quite 512GB levels, but you can get 128GB on a Ryzen AI Max chipset, which has unified memory like Apple's. They're also pretty reasonably priced; I saw an AI Max 370 with 96GB on Amazon earlier for a shade over £1000. I guess you could boost that with an eGPU to gain a bit extra, but 64GB would likely be the max you could add, so still not quite enough to run full Qwen3-Coder at a decent quant, but not far off. Hopefully the next gen will offer more RAM, or another model comes out that can beat Q3 with fewer params.
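For a rough idea of what that memory bandwidth buys you at decode time, here's a crude ceiling estimate (treating generation as purely memory-bound, with every active weight read once per token; the ~256 and ~800 GB/s figures are illustrative assumptions for Strix Halo-class and Apple Ultra-class parts):

```python
# Crude decode-speed ceiling for a memory-bound MoE model:
# tokens/s ~= memory bandwidth / bytes of weights touched per token.
# Qwen3-Coder activates ~35B params per token; ignoring KV-cache reads,
# caching effects, and compute limits, this is an upper bound only.
def tok_per_s_ceiling(bw_gbps: float, active_params_b: float, bits: int) -> float:
    gb_per_token = active_params_b * bits / 8  # weights read per token, in GB
    return bw_gbps / gb_per_token

for name, bw in [("~256 GB/s (Strix Halo-class)", 256),
                 ("~800 GB/s (Apple Ultra-class)", 800)]:
    print(f"{name}: ~{tok_per_s_ceiling(bw, 35, 4):.0f} tok/s ceiling")
# -> ~15 tok/s and ~46 tok/s respectively at 4-bit
```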



