M3/M4 Max MacBooks with 128GB RAM are already way better than an A6000 for very ...

jsheard · 2024-12-03T19:38:30 1733254710

The M4 Max needs an enormous 512bit memory bus to extract enough bandwidth out of those LPDDR5x chips, while the GPUs that Intel just launched are 192/160bit and even flagships rarely exceed 384bit. They can't just slap more memory on the board, they would need to dedicate significantly more silicon area to memory IO and drive up the cost of the part, assuming their architecture would even scale that wide without hitting weird bottlenecks.

modeless · 2024-12-03T22:30:47 1733265047

The memory controller would be bigger, and the cost would be higher, but not radically higher. It would be an attractive product for local inference even at triple the current price and the development expense would be 100% justified if it helped Intel get any kind of foothold in the ML market.

hughesjj · 2024-12-03T21:07:39 1733260059

Man, I'm old enough to remember when 512 was a thing for consumer cards back when we had 4-8gb memory

Sure that was only gddr5 and not gddr6 or lpddr5, but I would have bet we'd be up to 512bit again 10 years down the line..

(I mean supposedly hbm3 has done 1024-2048bit busses but that seems more research or super high end cards, not consumer)

jsheard · 2024-12-03T21:09:23 1733260163

Rumor is the 5090 will be bringing back the 512bit bus, for a whopping 1.5TB/sec bandwidth.

AnthonyMouse · 2024-12-04T02:08:23 1733278103

> They can't just slap more memory on the board

Why not? It doesn't have to be balanced. RAM is cheap. You would get an affordable card that can hold a large model and still do inference e.g. 4x faster than a CPU. The 128GB card doesn't have to do inference on a 128GB model as fast as a 16GB card does on a 16GB model, it can be slower than that and still faster than any cost-competitive alternative at that size.

The extra RAM also lets you do things like load a sparse mixture of experts model entirely into the GPU, which will perform well even on lower end GPUs with less bandwidth because you don't have to stream the whole model for each token, but you do need enough RAM for the whole model because you don't know ahead of time which parts you'll need.

elabajaba · 2024-12-04T05:32:53 1733290373

To get 128GB of RAM on a GPU you'd need at least a 1024 bit bus. GDDR6x is 16Gbit 32 pins, so you'd need 64 GDDR6x chips, which good luck even trying to fit that around the GPU die since traces need to be the same length, and you want to keep them as short as possible. There's also a good chance you can't run a clamshell setup so you'd have to double the bus width to 2048 because 32 GDDR6x chips would kick off way too much heat to be cooled on the back of a GPU. Such a ridiculous setup would obviously be extremely expensive and would use way too much power.

A more sensible alternative would be going with HBM, except good luck getting any capacity for that since it's all being used for the extremely high margin data center GPUs. HBM is also extremely expensive both in terms of the cost of buying the chips and due to it's advanced packaging requirements.

ryao · 2024-12-04T07:14:07 1733296447

You do not need a 1024-bit bus to put 128GB of some DDR variant on a GPU. You could do a 512-bit bus with dual rank memory. The 3090 had a 384-bit bus with dual rank memory and going to 512-bit from that is not much of a leap.

This assumes you use 32Gbit chips, which will likely be available in the near future. Interestingly, the GDDR7 specification allows for 64Gbit chips:

> the GDDR7 standard officially adds support for 64Gbit DRAM devices, twice the 32Gbit max capacity of GDDR6/GDDR6X

https://www.anandtech.com/show/21287/jedec-publishes-gddr7-s...

AnthonyMouse · 2024-12-04T16:29:21 1733329761

Yeah, the idea that you're limited by bus width is kind of silly. If you're using ordinary DDR5 then consider that desktops can handle 192GB of memory with a 128-bit memory bus, implying that you get 576GB with a 384-bit bus and 768GB at 512-bit. That's before you even consider using registered memory, which is "more expensive" but not that much more expensive.

And if you want to have some real fun, cause "registered GDDR" to be a thing.

Zandikar · 2024-12-04T02:00:40 1733277640

> They can't just slap more memory on the board, they would need to dedicate significantly more silicon area to memory IO and drive up the cost of the part,

In the pedantic sense of just literally slapping more on existing boards? No, they might have one empty spot for an extra BGA VRAM chip, but not enough for the gain's we're talking about. But this is absolutely possible, trivially so for someone like Intel/AMD/NVidia, that has full control over the architectural and design process. Is it a switch they flip at the factory 3 days before shipping? No, obviously not. But if they intended this to be the case ~2 years ago when this was just a product on the drawing board? Absolutely. There is 0 technical/hardware/manufacturing reason they couldn't do this. And considering the "entry level" competitor product is the M4 Max which starts at at least $3,000 (for a 128GB equipped one), the margin on pricing more than exists to cover a few hundred extra in ram and extra overhead in higher-layer more populated PCB's.

The real impediment is what you landed on at the end there combined with the greater ecosystem not having support for it. Intel could drop a card that is, by all rights, far better performing hardware than a competing Nvidia GPU, but Nvidia's dominance in API's, CUDA, Networking, Fabric-switches (NVLink, mellanox, bluefield), etc etc for that past 10+ years and all of the skilled labor that is familiar with it would largely render a 128GB Arc GPU a dud on delivery, even if it was priced as a steal. Same thing happened with the Radeon VII. Killer compute card that no one used because while the card itself was phenomenal, the rest of the ecosystem just wasn't there.

Now, if intel committed to that card, and poured their considerable resources into that ecosystem, and continued to iterate on that card/family, then now we're talking, but yeah, you can't just 10X VRAM on a card that's currently a non-player in the GPGPU market and expect anyone in the industry to really give a damn. Raise an eyebrow or make a note to check back in a year? Sure. But raise the issue to get a greenlight on the corpo credit line? Fat chance.

ryao · 2024-12-04T07:42:47 1733298167

A 128GB VRAM Intel Arc card at a low price would be an OSS developer’s dream come true. It would be the WRT54G of inference.

Zandikar · 2024-12-04T18:55:11 1733338511

Of course. A cheap card with oodles of VRAM would benefit some people, I'm not denying that. I'm tackling the question of would it benefit intel (as the original question was "why doesn't intel do this"), and the answer is: Profit/Loss.

There's a huge number of people in that community that would love to have such a card. How many are actually willing and able to pony up >=$3k per unit? How many units would they buy? Given all of the other considerations that go into making such cards useful and easy to use (as described), the answer is - in intel's mind - nowhere near enough, especially when the financial side of the company's jimmies are so rustled that they sacked Pat G without a proper replacement and nominated some finance bros in as interim CEO's. Intel is ALREADY taking a big risk and financial burden trying to get into this space in the first place, and they're already struggling, so the prospect of betting the house like that just isn't going to fly for the finance bros that can't see passed the next 2 quarters.

To be clear, I personally think there is huge potential value in trying to support the OSS community to, in essence, "crowd source" and speedrun some of that ecosystem by supplying (Compared to the competition) "cheap" cards that aschew the artificial segmentation everyone else is doing and investing in that community. But I'm not running Intel, so while that'd be nice, it's not really relevant.

ryao · 2024-12-05T01:33:23 1733362403

I suspect that Intel could hit a $2000 price point for a 128GB ARC card, but even at $3000, it would certainly be cheaper than buying 8 A770 LE ARC cards and connecting them to a machine. There are likely tens of thousands of people buying multiple GPUs to run local inferencing on Reddit’s local llama, and that is a subset of the market.

In Q1 2023, Intel sold 250,000 ARC cards. Sales then collapsed the next quarter. I would expect sales to easily exceed that and be maintained. The demand for high memory GPUs is far higher than many realize. You have professional inferencing operations such as the ones listed at openrouter.ai that would gobble up 128GB VRAM ARC cards for running smaller high context models, much like how you have businesses gobbling up the Raspberry Pi for low end tasks, without even considering the local inference community.

CoastalCoder · 2024-12-03T22:01:24 1733263284

> The M4 Max needs an enormous 512bit memory bus to extract enough bandwidth out of those LPDDR5x chips

Does M4 Max have 64-byte cache lines?

If they can fetch or flush an entire cache line in a single memory-bus transaction, I wonder if that opens up any additional hardware / performance optimizations.

my123 · 2024-12-04T06:13:22 1733292802

> Does M4 Max have 64-byte cache lines?

on the CPU side: 64 bytes at L1, 128 byte cachelines at L2

rowanG077 · 2024-12-04T00:14:00 1733271240

A single memory transaction is almost always a 16n burst for LPDDR5x.

p1esk · 2024-12-03T20:39:47 1733258387

Apple could do it. Why can’t Intel?

jsheard · 2024-12-03T20:50:44 1733259044

Because Apple isn't playing the same game as everyone else. They have the money and clout to buy out TSMCs bleeding-edge processes and leave everyone else with the scraps, and their silicon is only sold in machines with extremely fat margins that can easily absorb the BOM cost of making huge chips on the most expensive processes money can buy.

p1esk · 2024-12-03T20:59:00 1733259540

Bleeding edge processes is what Intel specializes in. Unlike Apple, they don’t need TSMC. This should have been a huge advantage for Intel. Maybe that’s why Gelsinger got the boot.

duskwuff · 2024-12-03T21:08:08 1733260088

> Bleeding edge processes is what Intel specializes in. Unlike Apple, they don’t need TSMC.

Intel literally outsourced their Arrow Lake manufacturing to TSMC because they couldn't fabricate the parts themselves - their 20A (2nm) process node never reached a production-ready state, and was eventually cancelled about a month ago.

p1esk · 2024-12-03T22:01:59 1733263319

OK, so the question becomes: TSMC could do it. Why can’t Intel?

ac29 · 2024-12-04T00:07:37 1733270857

Intel is maybe a year or two behind TSMC right now. They might or might not catch up since it is a moving target, but I dont think there is anything TSMC is doing today that Intel wont be doing in the near future.

BonoboIO · 2024-12-03T22:05:33 1733263533

They are trying … for like 10 years

ubercore · 2024-12-04T00:09:19 1733270959

Wasn't it cancelled in favor of 18A?

duskwuff · 2024-12-04T03:52:50 1733284370

It was, but that only puts them further away from shipping product.

wtallis · 2024-12-03T21:08:52 1733260132

These days, Intel merely specializes in bleeding processes. They spent far too many years believing the unrealistic promises from their fab division, and in the past few years they've been suffering the consequences as the problems are too big to be covered up by the cost savings of vertical integration.

jsheard · 2024-12-03T21:06:40 1733260000

Intel's foundry side has been floundering so hard that they've resorted to using TSMC themselves in an attempt to keep up with AMD. Their recently launched CPUs are a mix of Intel-made and TSMC-made chiplets, but the latter accounts for most of the die area.

anon373839 · 2024-12-04T02:02:27 1733277747

I'm not certain this is quite as damning as it sounds. My understanding is that the foundry business was intentionally walled off from the product business, and that the latter wasn't going to be treated as a privileged customer.

barkingcat · 2024-12-04T02:42:29 1733280149

no, in fact, it sounds even more damning because client side was able to pick whatever was best on the market, and it wasn't intel. Client side could to learn and customize their designs to use another company's processes (this is an extremely hard thing to do by the way) faster than intel foundry could even get their pants up in the morning.

Intel foundry screwed up so badly that Nokia's server division was almost shut down because of Intel Foundry's failure. (imagine being so bad at your job, that your clients go out of business) If Intel client side chose to use Foundry, there just wouldn't be any chips to sell.

AlotOfReading · 2024-12-03T21:05:47 1733259947

Intel Arc hardware is manufactured by TSMC, specifically on N6 and N5 for this latest announcement.

Intel doesn't currently have nodes competitive with TSMC or excess capacity in their better processes.

knowitnone · 2024-12-04T00:18:49 1733271529

Serious question, why don't they have excess capacity? They aren't producing many CPUs...that people want.

vel0city · 2024-12-04T01:29:42 1733275782

Hard to have excess capacity when your latest fabs aren't able to produce anything reliably.

They don't even have competitive capacity for all their CPU needs. They have negative spare capacity overall.

AlotOfReading · 2024-12-04T00:59:24 1733273964

Because their production capacity is much smaller than TSMC, it's declined over the past few years, and their newer nodes are having yield issues.

barkingcat · 2024-12-04T02:40:24 1733280024

if you think intel has bleeding edge processes, that hasn't been the case for over 6 years ...

JBiserkov · 2024-12-03T22:04:22 1733263462

> and their silicon is only sold in machines with extremely fat margins

Like the brand new Mini that cost 600 USD and went to 500 during Black week.

michaelmrose · 2024-12-03T23:56:42 1733270202

The 600 mini is on the m4 with the anemic 10 core GPU. This is for grandama or your 10 year old.

The good one which is still slower than m4 max is 2200.

If you want the max you need at least a macbook pro starting at 3200 and if you want the better one with 128G RAM it starts at about 5k

barkingcat · 2024-12-04T02:50:38 1733280638

No matter how few GPU cores the M4 has, it is still an extremely potent product on the whole.

Better than most of the pc's out there.

michaelmrose · 2024-12-04T03:23:04 1733282584

Do you mean subjectively because you find Mac pleasant to use? The $600 mini certainly isn't more performant than 85% of desktops.

my123 · 2024-12-04T06:16:23 1733292983

> The $600 mini certainly isn't more performant than 85% of desktops.

The average desktop isn't exactly a gaming machine. But a corporate box or a low end home desktop with a Core i5 and using the iGPU.

ryao · 2024-12-04T06:35:28 1733294128

Transistor IO logic scaling died a while ago, which is what prompted AMD to go with a chiplet architecture. Being on a more advanced process does not make implementing an 512-bit memory bus any easier for Apple. If anything, it makes it more expensive for Apple than it would be for Intel.

dragontamer · 2024-12-03T21:10:12 1733260212

Because LPDDR5x is soldered on RAM.

Everyone else wants configurable RAM that scales both down (to 16GB) and up (to 2TB), to cover smaller laptops and bigger servers.

GPUs with soldered on RAM has 500GB/sec bandwidths, far in excess of Apples chips. So the 8GB or 16GB offered by NVidia or AMD is just far superior at vid o game graphics (where textures are the priority)

jsheard · 2024-12-03T21:14:11 1733260451

> GPUs with soldered on RAM has 500GB/sec bandwidths, far in excess of Apples chips.

Apple is doing 800GB/sec on the M2 Ultra and should reach about 1TB/sec with the M4 Ultra, but that's still lagging behind GPUs. The 4090 was already at the 1TB/sec mark two years ago, the 5090 is supposedly aiming for 1.5TB/sec, and the H200 is doing 5TB/sec.

dragontamer · 2024-12-03T21:35:41 1733261741

HBM is kind of not fair lol. But 4096-line bus is gonna have more bandwidth than any competitor.

It's pretty expensive though.

The 500GB/sec number is for a more ordinary GPU like the B580 Battlemage in the $250ish price range. Obviously the $2000ish 4090 will be better, but I don't expect the typical consumer to be using those.

kimixa · 2024-12-03T22:00:47 1733263247

But an on-package memory bus has some of the advantages of HBM, just to a lesser extent, so it's arguably comparable as an "intermediate stage" between RAM chips and HBM. Distances are shorter (so voltage drop and capacitance are lower, so can be driven at lower power), routing is more complex but can be worked around by more layers, which increases cost but on a significantly smaller area than required for dimms, and the dimms connections themselves can hurt performance (reflection from poor contacts, optional termination makes things more complex, and the expectations of mix-and-match for dimm vendors and products likely reduce fine tuning possibilities).

There's pretty much a direct opposite scaling between flexibility and performance - dimms > soldered ram > on-package ram > die-interconnects.

AnthonyMouse · 2024-12-04T01:33:59 1733276039

The question is why Intel GPUs, which already have soldered memory, aren't sold with more of it. The market here isn't something that can beat enterprise GPUs at training, it's something that can beat desktop CPUs at inference with enough VRAM to fit large models at an affordable price.

imtringued · 2024-12-04T07:19:46 1733296786

Intel also has lunar lake CPUs with on package RAM. They could have added more memory channels like Apple did.

Der_Einzige · 2024-12-03T20:43:07 1733258587

It doesn't matter if the "cost is driven up". Nvidia has proven that we're all lil pay pigs for them. 5090 will be 3000$ for 32gb of VRAM. Screenshot this now, it will age well.

We'd be happy to pay 5000 for 128gb from Intel.

pixelpoet · 2024-12-03T21:01:30 1733259690

You are absolutely correct, and even my non-prophetic ass echoed exactly the first sentence of the top comment in this HN thread ("Why don't they just release a basic GPU with 128GB RAM and eat NVidia's local generative AI lunch?").

Yes, yes, it's not trivial to have a GPU with 128gb of memory with cache tags and so on, but is that really in the same universe of complexity of taking on Nvidia and their CUDA / AI moat any other way? Did Intel ever give the impression they don't know how to design a cache? There really has to be a GOOD reason for this, otherwise everyone involved with this launch is just plain stupid or getting paid off to not pursue this.

Saying all this with infinite love and 100% commercial support of OpenCL since version 1.0, a great enjoyer of A770 with 16GB of memory, I live to laugh in the face of people who claimed for over 10 years that OpenCL is deprecated on MacOS (which I cannot stand and will never use, yet the hardware it runs on...) and still routinely crushes powerful desktop GPUs, in reality and practice today.

timschmidt · 2024-12-03T21:19:34 1733260774

Both Intel and AMD produce server chips with 12 channel memory these days (that's 12x64bit for 768bit) which combined with DDR5 can push effective socket bandwidth beyond 800GB/s, which is well into the area occupied by single GPUs these days.

You can even find some attractive deals on motherboard/ram/cpu bundles built around grey market engineering sample CPUs on aliexpress with good reports about usability under Linux.

Building a whole new system like this is not exactly as simple as just plugging a GPU into an existing system, but you also benefit from upgradeability of the memory, and not having to use anything like CUDA. llamafile, as an example, really benefits from AVX-512 available in recent CPUs. LLMs are memory bandwidth bound, so it doesn't take many CPU cores to keep the memory bus full.

Another benefit is that you can get a large amount of usable high bandwidth memory with a relatively low total system power usage. Some of AMD's parts with 12 channel memory can fit in a 200W system power budget. Less than a single high end GPU.

pixelpoet · 2024-12-03T21:23:28 1733261008

My desktop machine has had 128gb since 2018, but for the AI workloads currently commanding almost infinite market value, it really needs the 1TB/s bandwidth and teraflops that only a bona fide GPU can provide. An early AMD GPU with these characteristics is the Radeon VII with 16gb HBM, which I bought for 500 eur back in 2019 (!!!).

I'm a rendering guy, not an AI guy, so I really just want the teraflops, but all GPU users urgently need a 3rd market player.

timschmidt · 2024-12-03T21:25:40 1733261140

That 128gb is hanging off a dual channel memory bus with only 128 total bits of bandwidth. Which is why you need the GPU. The Epyc and Xeon CPUs I'm discussing have 6x the memory bandwidth, and will trade blows with that GPU.

pixelpoet · 2024-12-03T21:29:45 1733261385

At a mere 20x the cost or something, to say nothing about the motherboard etc :( 500 eur for 16GB of 1TB/s with tons of fp32 (and even fp64! The main reason I bought it) back in 2019 is no joke.

Believe me, as a lifelong hobbyist-HPC kind of person, I am absolutely dying for such a HBM/fp64 deal again.

timschmidt · 2024-12-03T21:37:11 1733261831

$1,961.19: H13SSL-N Motherboard And EPYC 9334 QS CPU + DDR5 4*128GB 2666MHZ REG ECC RAM Server motherboard kit

https://www.aliexpress.us/item/3256807766813460.html

Doesn't seem like 20x to me. I'm sure spending more than 30 seconds searching could find even better deals.

pixelpoet · 2024-12-03T21:40:23 1733262023

Isn't 2666 MHz ECC RAM obscenely slow? 32 cores without the fast AVX-512 of Zen5 isn't what anyone is looking for in terms of floating point throughput (ask me about electricity prices in Germany), and for that money I'd rather just take a 4090 with 24GB memory and do my own software fixed point or floating point (which is exactly what I do personally and professionally).

This is exactly what I meant about Intel's recent launch. Imagine if they went full ALU-heavy on latest TSMC process and packaged 128GB with it, for like, 2-3k Eur. Nvidia would be whipping their lawyers to try to do something about that, not just their engineers.

ryao · 2024-12-04T05:15:57 1733289357

Yes and no. I have been developing some local llama 3 inference software on a machine with 3200MT/s ECC RAM and a Ryzen 7 5800X:

https://github.com/ryao/llama3.c

My experience is that input processing (prompt processing) is compute bottlenecked in GEMM. AVX-512 would help there, although my CPU’s Zen 3 cores do not support it and the memory bandwidth does not matter very much. For output generation (token generation), memory bandwidth is a bottleneck and AVX-512 would not help at all.

timschmidt · 2024-12-03T21:43:54 1733262234

I don't think anyone's stopping you, buddy. Great chat. I hope you have a nice evening.

ryao · 2024-12-04T09:13:55 1733303635

12 channel DDR5 is actually 12x32-bit. JEDEC in its wisdom decided to split the 64-bit channels of earlier versions of DDR into 2x 32-bit channels per DIMM. Reaching 768-bit memory buses with DDR5 requires 24 channels.

Whenever I see DDR5 memory channels discussed, I am never sure if the speaker is accounting for the 2x 32-bit channels per DIMM or not.

jsheard · 2024-12-03T21:02:46 1733259766

The question is whether there's enough overall demand for a GPU architecture with 4x the VRAM of a 5090 but only about 1/3rd of the bandwidth. At that point it would only really be good for AI inferencing, so why not make specialized inferencing silicon instead?

mandelken · 2024-12-03T22:38:51 1733265531

I genuinely wonder why no one is doing this? Why can't I buy this specialized AI inference silicon with plenty of VRAM?

ryao · 2024-12-04T05:24:21 1733289861

Intel and Qualcomm are doing this, although Intel uses HBM and their hardware is designed to do both inference and training while Qualcomm uses more conventional memory and their hardware is only designed do inference:

https://www.intel.com/content/www/us/en/products/details/pro...

https://www.qualcomm.com/news/onq/2023/11/introducing-qualco...

They did not put it into the PC parts supply chain for reasons known only to them. That said, it would be awesome if Intel made high memory variants of their Arc graphics cards for sale through the PC parts supply chains.

ahakki · 2024-12-03T23:31:55 1733268715

I guess that would be an NPU combined with LPDDR. Basically any Windows Copilot Plus approved device.

mirekrusin · 2024-12-03T21:02:10 1733259730

Me too, probably 2x. I’d sell like hot cakes.

ryao · 2024-12-04T05:31:39 1733290299

If their memory IO supports multiple ranks like the RTX 3090 (it used dual rank) did, they could do a new PCB layout and then add more memory chips to it. No additional silicon area would be necessary.

wtallis · 2024-12-03T19:42:46 1733254966

That would basically mean Intel doubling the size of their current GPU die, with a different memory PHY. They're clearly not ready to make that an affordable card. Maybe when they get around to making a chiplet-based GPU.

kevingadd · 2024-12-03T19:37:17 1733254637

Are you suggesting that Intel 'just' release a GPU at the same price point as an M4 Max SOC? And that there would be a large market for it if they did so? Seems like an extremely niche product that would be demanding to manufacture. The M4 Max makes sense because it's a complete system they can sell to Apple's price-insensitive audience, Intel doesn't have a captive market like that to sell bespoke LLM accelerator cards to yet.

If this hypothetical 128GB LLM accelerator was also a capable GPU that would be more interesting but Intel hasn't proven an ability to execute on that level yet.

treprinum · 2024-12-03T19:42:20 1733254940

Nothing in my comment says about pricing it at the M4 Max level. Apple charges as much because they can (typing this on an $8000 M3 Max). 128GB LPDDR5 is dirt cheap these days just Apple adds its premium because they like to. Nothing prevents Intel from releasing a basic GPU with that much RAM for under $1k.

wtallis · 2024-12-03T19:52:58 1733255578

You're asking for a GPU die at least as large as NVIDIA's TU102 that was $1k in 2018 when paired with only 11GB of RAM (because $1k couldn't get you a fully-enabled die to use 12GB of RAM). I think you're off by at least a factor of two in your cost estimates.

ryao · 2024-12-04T02:47:24 1733280444

If Intel came out with an ARC GPU with 128GB VRAM at a $2000 price point, I and many others would likely buy it immediately.

FeepingCreature · 2024-12-04T09:04:51 1733303091

Though Intel should also identify say the top-100 finetuners and just send it to them for free, on the down low. That would create some market pressure.

LargoLasskhyfv · 2024-12-05T10:29:48 1733394588

HBM plz.

treprinum · 2024-12-03T20:02:08 1733256128

Intel has Xeon Phi which was a spin-off of their first attempt at GPU so they have a lot of tech in place they can reuse already. They don't need to go with GDDRx/HBMx designs that require large dies.

ksec · 2024-12-03T20:57:56 1733259476

I don't want to further this discussions but may be you dont realise some of the people who replied to you either design hardware for a living or has been in the hardware industry for longer than 20 years.

ryao · 2024-12-04T04:38:17 1733287097

While it is not a GPU, Qualcomm already made an inferencing card with 128GB RAM:

https://www.qualcomm.com/news/onq/2023/11/introducing-qualco...

It would be interesting if those saying that a regular GPU with 128GB of VRAM cannot be made would explain how Qualcomm was able to make this card. It is not a big stretch to imagine a GPU with the same memory configuration. Note that Qualcomm did not use HBM for this.

treprinum · 2024-12-03T21:17:52 1733260672

For some reason Apple did it with M3/M4 Max likely by folks that are also on HN. The question is how many of the years spent designing HW were spent also by educating oneselves on the latest best ways to do it.

ksec · 2024-12-03T21:23:55 1733261035

>For some reason.....

They already replied with an answer.

wtallis · 2024-12-03T21:12:29 1733260349

Even LPDDR requires a large die. It only takes things out of the realm of technologically impossible to merely economically impractical. A 512-bit bus is still very inconveniently large for a single die.

anon373839 · 2024-12-04T02:03:39 1733277819

> release a GPU at the same price point as an M4 Max SOC

Why would it need to be introduced at Apple's high-margin pricing?

m00x · 2024-12-03T19:43:39 1733255019

It's also impossible and it would need to be a CPU.

CPUs and GPUs access memory very differently.