Honestly, I don't think we'd need "this type of RAM." The confusion in this discussion is the belief that we need obscene per-chip bandwidth.
If I need 300GB/s memory bandwidth for my workload, that can be accomplished with:
* One RAM chip with 300GB/s
* Two RAM chips with 150GB/s each
* Four RAM chips with 75GB/s each
Etc.
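The splitting arithmetic is trivial, but worth making explicit; a quick sketch using the illustrative numbers from the list above:

```python
def aggregate_bandwidth_gbps(num_chips: int, per_chip_gbps: float) -> float:
    """Total bandwidth when accesses are striped evenly across all chips."""
    return num_chips * per_chip_gbps

# All three configurations hit the same 300 GB/s aggregate.
for chips, per_chip in [(1, 300), (2, 150), (4, 75)]:
    print(f"{chips} chip(s) x {per_chip} GB/s = "
          f"{aggregate_bandwidth_gbps(chips, per_chip)} GB/s")
```

The only real-world caveat is that striping accesses evenly across chips is the memory controller's job, which is exactly the point made below about hardware distributing the load automatically.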
Stepping up from 16GB to 196GB at the same total bandwidth, the per-chip bandwidth requirement drops more than tenfold, so you can use much cheaper RAM. All the signalling requirements relax too.
Much of this discussion presumes a 200GB card would need the same per-chip bandwidth as a 12GB card. That's just false. An A770 or 4060-grade card couldn't keep up with that much data anyway. And if I'm running a small model, I can get the same aggregate bandwidth by distributing it properly across RAM chips (which most hardware does automatically).
An A770 or 4060-grade card with the same total memory bandwidth we have today, but 200GB of RAM, would let us run high-quality LLMs locally or do high-resolution renders. That wouldn't match the performance of a $200k card, obviously, but for many inference uses that's just not very important.
If I were buying for my own uses, I'd want 12x 32GB DDR4-3200 DIMMs for a total of 384GB RAM at $600 for the RAM (say $2k total), with an individual throughput of 25GB/sec and a total throughput of 300GB/sec. I'd be okay with 4060-grade performance. My own uses are a bit niche, and I think for most other people's uses, something with a little more throughput and a little less capacity (48-196GB) might make more sense. But you definitely don't need the same throughput as existing GPU RAM.
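Checking the numbers on that build (all figures are from the paragraph above; the 25GB/s per-DIMM rate roughly matches DDR4-3200's ~25.6 GB/s per channel):

```python
dimms = 12
gb_per_dimm = 32        # capacity per DIMM
gbps_per_dimm = 25      # throughput per DIMM, roughly DDR4-3200

total_capacity_gb = dimms * gb_per_dimm        # aggregate capacity
total_throughput_gbps = dimms * gbps_per_dimm  # aggregate bandwidth

print(total_capacity_gb, total_throughput_gbps)  # prints: 384 300
```

Note this assumes a memory controller wide enough to drive all 12 DIMMs concurrently; a typical desktop board with two channels wouldn't get you there, which is part of why this stays a niche ask.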