Launching a new SKU for $500-1000 with 48GB of RAM seems like a profitable idea. The GPU wouldn't be top-of-the-line, but the RAM would be unmatched for running a lot of models locally.
It's not technically possible to just slap on more RAM. GDDR6 is point-to-point with an option for clamshell, and the largest chips in mass production are 16 Gbit with a 32-bit interface. So for a 192-bit card, the best you can get is 192/32 × 16 Gbit × 2 (clamshell) = 24 GB.
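A quick back-of-the-envelope version of that arithmetic, assuming point-to-point 32-bit chips topping out at 16 Gbit, with clamshell doubling the chip count:

```python
# Capacity ceiling for a GDDR6 bus: one chip per 32-bit channel group,
# 16 Gbit max density, clamshell mode doubles the chip count.
def max_vram_gb(bus_width_bits: int, chip_density_gbit: int = 16,
                clamshell: bool = True) -> float:
    chips = bus_width_bits // 32           # point-to-point: one chip per 32 bits
    gb_per_chip = chip_density_gbit / 8    # 16 Gbit = 2 GB
    return chips * gb_per_chip * (2 if clamshell else 1)

print(max_vram_gb(192))   # 24.0 -> the 24 GB ceiling above
print(max_vram_gb(384))   # 48.0 -> 48 GB would need a 384-bit, flagship-class die
```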
To get more memory, you have to design a new die with a wider interface. Design, test, and masks on leading-edge silicon run tens of millions of dollars in NRE, and that has to be paid well over a year before product launch. No one is going to do that for a low-priced product with an unknown market.
The savior of home inference is probably going to be AMD's Strix Halo. It's a laptop APU built to be a fairly low-end gaming chip, but it has a 256-bit LPDDR5X interface. Larger LPDDR5X packages are available (thanks to the smartphone market), and Strix Halo should eventually be available with 128GB of unified RAM, with performance probably somewhere around a 4060.
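For scale, a rough bandwidth estimate: the 256-bit width is from the comment above, while the LPDDR5X-8000 data rate is my assumption about what it ships with.

```python
# Rough memory bandwidth for a 256-bit LPDDR5X interface.
bus_bits = 256
data_rate_mts = 8000                                  # assumed LPDDR5X-8000
bandwidth_gbs = bus_bits / 8 * data_rate_mts / 1000   # bytes/transfer * MT/s
print(f"{bandwidth_gbs:.0f} GB/s")                    # ~256 GB/s, in the ballpark
                                                      # of a desktop 4060's GDDR6
```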
You can’t just throw in more RAM without the rest of the GPU being architected for it. So there’s an R&D cost involved in such a design, and there may even be performance trade-offs for the mass-market lower-tier models. I’m doubtful that the LLM enthusiast/tinkerer market is large enough for that to be obviously profitable.
That would depend on how they designed the memory controllers. GDDR6 only supports 1-2 GB modules at present (I believe GDDR6W supports 4 GB modules). If they were using twelve 1 GB modules, then increasing to 24GB shouldn't be a very large change.
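A tiny sketch of that scaling, assuming twelve x16 modules on a 192-bit bus (the layout is my assumption, not a confirmed board design):

```python
# On a fixed-width bus, capacity scales linearly with per-module density,
# with no change to the bus width itself.
bus_bits, bits_per_module = 192, 16
modules = bus_bits // bits_per_module   # 12 modules
for gb_per_module in (1, 2, 4):         # 4 GB being GDDR6W-class density
    print(f"{gb_per_module} GB modules -> {modules * gb_per_module} GB total")
```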
Honestly, Apple seems to be on the right track here. LPDDR5 is slower than GDDR6 per pin, but you can scale the amount of RAM far higher simply by swapping in higher-density packages.
Give me 48GB with reasonable power consumption so I can dev locally and I'll buy it in a heartbeat. Anyone who is fine-tuning would want a setup like that to test things before pushing to real GPUs. And realistically, even if a fine-tune takes two days on a card like that instead of a few hours, it would totally be worth it.
The bigger point here is to ask why they aren't designing that in from the start. Same with AMD. RAM capacity has stalled and it's the critical resource. Start focusing on allowing a lot more of it, even at the cost of performance, and you have a real product. I have a 12GB 3060 as my dev box, and the big limiter is RAM, not CUDA cores. If it had 48GB but the same number of cores, I would be very happy with it, especially if it were power efficient.
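To put numbers on why RAM is the limiter: a rough weight-only footprint calculation (model sizes are illustrative; KV cache, activations, and framework overhead are ignored).

```python
# Weight-only memory footprint: params (in billions) * bytes per parameter.
def weights_gb(params_billions: float, bytes_per_param: float) -> float:
    return params_billions * bytes_per_param   # 1B params * 2 bytes = 2 GB

for size in (7, 13, 70):
    print(f"{size}B: {weights_gb(size, 2):.0f} GB fp16, "
          f"{weights_gb(size, 0.5):.1f} GB 4-bit")
# 12 GB can't even hold a 7B model in fp16; 48 GB fits a 13B fp16
# or a 70B 4-bit quant with room to spare.
```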
Because designing a low-end GPU with a very wide memory interface isn't useful for gaming, and that is where the vast majority of non-datacenter discrete GPU sales are right now.