Would it though? How many people are running inference at home? Outside of enthusiasts I don't know anyone. Even companies don't self-host models and prefer to use APIs. Not that I wouldn't like a consumer GPU with tons of VRAM, but I think the market for it is too small for companies to invest in building one. If you bother to look at Steam's hardware stats, you'll notice that only a small percentage of users have high-end cards.
This is the weird part, I saw the same comments in other threads. People keep saying how everyone yearns for local LLMs… but outside of hardcore enthusiasts it just sounds like a bad investment? Like it's a smaller market than gaming GPUs. And by the time anyone runs them locally, bigger/better models and GPUs will be coming out, so you won't even be able to make full use of what you bought. Maybe the whole "indoctrinate users into the Intel ecosystem, so when they go work for big companies they'll vouch for it" play would have merit… if others weren't innovating and making their products better (like NVIDIA).
Intel sold their GPUs at negative margin, which is part of why the stock fell off a cliff. If they could double the VRAM they could raise the price back into the green; even selling thousands of units, more likely closer to 100k, would be far better than what they're doing now. The problem is Intel is run by incompetent people who guard their market segments like tribal fiefs instead of solving for the customer.
That's a dumb management "cart before the horse" problem. I understand a few bugs in the driver, but they really should have gotten the driver working decently well before production. That would even have given them more time for tweaking the GPU. This is exactly why Intel is failing and will continue to fail with that type of management.
Intel management is just brain-dead. They could have sold the cards for mining when there was a massive GPU shortage and called it the developer edition, but no. It's hard to develop a driver for games when you have no silicon.
I think you're massively underestimating the development cost, and overestimating the number of people who would actually purchase a higher-VRAM card at a higher price.
You'd need hundreds of thousands of units to really make much of a difference.
Well, IIUC it's a bit more "having more than 12GB of VRAM and raising the price will let it run bigger LLMs on consumer hardware, and that'll drive premium-ness / market share / revenue, without subsidizing the price".
I don't know where this idea is coming from, although it's all over these threads.
For context, I write a local LLM inference engine and have zero idea why this would shift anyone's purchase intent. The models big enough to need more than 12GB of VRAM are also slow enough on consumer GPUs that they'd be absurd to run: less than 2 tokens/s. And I have 64 GB of unified memory on an M2 Max and a 24 GB 3090 Ti.
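Back-of-envelope for where numbers like that come from, assuming decode is memory-bandwidth bound and using illustrative sizes/bandwidths rather than measurements:

    # Rough upper bound on decode speed: every weight is read about once per
    # generated token, so tokens/s <= memory bandwidth / model size in bytes.
    # All numbers below are illustrative assumptions, not benchmarks.

    def approx_tokens_per_s(params_billions, bytes_per_param, bandwidth_gb_s):
        model_gb = params_billions * bytes_per_param
        return bandwidth_gb_s / model_gb

    # ~70B model at 4-bit (~0.5 bytes/param) fully resident on a ~1 TB/s card:
    print(approx_tokens_per_s(70, 0.5, 1000))  # ~28 tokens/s, if it actually fit
    # Same model spilling to system RAM at ~30 GB/s effective bandwidth:
    print(approx_tokens_per_s(70, 0.5, 30))    # <1 token/s

The single-digit numbers show up as soon as the weights stop fitting in VRAM and spill to system RAM.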
This makes sense in some ways technologically, but just having a "centralized compute box" seems like a lot more complexity than many/most would want in their homes.
I mean, everything could already have been working that way for years, right? One big shared compute box in your house and everything else is a dumb screen? But few people roll that way, even nerds, so I don't see that becoming a thing for offloaded AI compute.
I also think that the future of consumer AI is going to be models trained/refined on your own data and habits, not just a box in your basement running stock ollama models. So I have some latency/bandwidth/storage/privacy questions when it comes to wirelessly and transparently offloading it to a magic AI box that sits next to my wireless router or w/e, versus running those same tasks on-device. To say nothing of consumer appetite for AI stuff that only works (or only works best) when you're on your home network.
It most likely won't be a separate device. It'll get integrated into something like Apple TV or a HomePod that has an actual function and will be plugged in and networked all the time anyway. The LLM stuff would be just a bonus.
Both are currently used as the hub for HomeKit devices. Making the ATV into a "magic AI Box" won't need anything else except "just" upgrading the CPU from A-series to M-series. Actually, the A18 Pro would be enough; it's already used for local inference on the iPhone 16 Pro.
The enthusiast/prosumer/etc. market is generally still highly influential in most markets even if the revenue is limited. E.g. if hobbyists/students/developers start using Intel GPUs, in a few years the enterprise market might become much less averse to buying Intel's datacenter chips.
Would it though? How many people are running inference at home?
I don't know how to quantify it, but it certainly seems like a lot of people are buying consumer nVidia GPUs for compute, and the relatively paltry amount of RAM on those cards seems to be the number one complaint.
So I would say that Intel's potential market is "everybody who is currently buying nVidia GPUs for compute."
nVidia's stingy consumer RAM choices also seem to be a fairly transparent ploy to create a protective moat around their insanely-high-profit-margin datacenter GPUs. So that just seems like kind of an obvious thing for Intel or AMD to consider tackling.
(Although, it has to be said, a lot of commenters have pointed out that it's not as easy as just slapping more RAM chips onto the GPU boards; you need wider data busses as well etc.)
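A rough sketch of that constraint, assuming standard GDDR6 parts (one 32-bit channel per chip, ~2 GB per chip, an illustrative 18 Gbps data rate):

    # VRAM capacity and bandwidth are both tied to bus width: each GDDR6 chip
    # hangs off a 32-bit channel, so capacity ~= (bus width / 32) * chip size,
    # and bandwidth ~= bus width * data rate. Illustrative numbers only.

    def vram_gb(bus_width_bits, chip_capacity_gb=2):
        return (bus_width_bits // 32) * chip_capacity_gb

    def bandwidth_gb_s(bus_width_bits, data_rate_gbps=18):
        return bus_width_bits / 8 * data_rate_gbps

    for bus in (128, 192, 256, 384):
        print(f"{bus}-bit: {vram_gb(bus)} GB, ~{bandwidth_gb_s(bus):.0f} GB/s")
    # 128-bit: 8 GB, ~288 GB/s ... 384-bit: 24 GB, ~864 GB/s

Getting more capacity without widening the bus means clamshell boards (two chips per channel) or denser chips, which raises board cost but leaves bandwidth where it was.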
It's a chicken and egg scenario. The main problem with running inference at home is the lack of hardware. If the hardware was there more people would do it. And it's not a problem if "enthusiasts" are the only ones using it because that's to be expected at this stage of the tech cycle. If the market is small just charge more, the enthusiasts will pay it. Once more enthusiasts are running inference at home, then the late adopters will eventually come along.
100% - this could be Intel's ticket to capture the hearts of developers and then everything else that flows downstream. They have nothing to lose here -- just do it Intel!
You can get that on a Mac mini and it will probably cost you less than an equivalent PC setup. Should also perform better than a low-end Intel GPU and be better supported. Will use less power as well.
My 7800X says not really. Compared to my 3070 it feels so incredibly slow that it gets in the way of productivity.
Specifically, waiting ~2 seconds vs ~20 for a code snippet is much more detrimental to my productivity than the time difference would suggest. In ~2 seconds I don't get distracted, in ~20 seconds my mind starts wandering and then I have to spend time refocusing.
Make a GPU that is 50% slower than a two-generations-older mid-range GPU (in tokens/s) but on bigger models and I would gladly shell out $1000+.
So much so that I am considering getting a 5090 if nVidia actually fixes the connector mess they made with the 4090s, or even a used V100.
Maybe that's not too bad for someone who wants to use pre-existing models. Their AI Playground examples require at minimum an Intel Core Ultra H CPU, which is quite low-powered compared to even these dedicated GPUs: https://github.com/intel/AI-Playground
I don't know a single person in real life that has any desire to run local LLMs. Even amongst my colleagues and tech friends, not very many use LLMs period. It's still very niche outside AI enthusiasts. GPT is better than anything I can run locally anyway. It's not as popular as you think it is.
I run a 12GB model on my 3060 and use it to help answer healthcare questions. I'm currently doing a medical residency. (No, I don't use it to diagnose.) Running it locally helps comply with HIPAA-style regulations. I sometimes use it to fix up my emails. Not sure why people are longing for a 128GB card, just download a quantized model and run it with LM Studio (https://lmstudio.ai/). At least two of my colleagues are using ChatGPT on a regular basis. LLMs are being used in the ER department. LLMs and speech models are being used in psychiatry visits.
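If anyone wants to script against that kind of setup rather than just chat in the UI, here's a minimal sketch, assuming LM Studio's local OpenAI-compatible server is running on its default port with a quantized model already loaded (the model name below is a placeholder):

    # Query a locally served quantized model through LM Studio's
    # OpenAI-compatible endpoint (default http://localhost:1234/v1).
    # Nothing leaves the machine, which is the point for HIPAA-style concerns.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

    resp = client.chat.completions.create(
        model="local-model",  # placeholder; LM Studio serves whatever is loaded
        messages=[
            {"role": "system", "content": "You are a helpful writing assistant."},
            {"role": "user", "content": "Rewrite this draft email so it is shorter and more polite: 'Hey, the results module is still broken, please fix it.'"},
        ],
        temperature=0.2,
    )
    print(resp.choices[0].message.content)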
This is what I'm fiddling with. My 2080 Ti is not quite enough to make it viable. I find the small models fail too often, so I need larger Whisper and LLM models.
Like the 4060 Ti would have been a nice fit if it hadn't been for the narrow memory bus, which makes it slower than my 2080 Ti for LLM inference (rough bandwidth numbers below).
A more expensive card has the downside of not being cheap enough to justify idling in my server, and my gaming card is at times busy gaming.
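For reference, the bus-width gap behind that, using approximate published GDDR6 specs (ballpark figures, not benchmarks):

    # Why a newer 4060 Ti can be slower than an older 2080 Ti for LLM decode:
    # decode is mostly memory-bandwidth bound, and bandwidth = bus width * data rate.
    # Approximate published figures; treat as ballpark.
    cards = {
        "RTX 2080 Ti": {"bus_bits": 352, "gbps": 14},  # GDDR6
        "RTX 4060 Ti": {"bus_bits": 128, "gbps": 18},  # GDDR6
    }
    for name, c in cards.items():
        print(f"{name}: ~{c['bus_bits'] / 8 * c['gbps']:.0f} GB/s")
    # RTX 2080 Ti: ~616 GB/s, RTX 4060 Ti: ~288 GB/s -- under half the
    # bandwidth despite being two generations newer.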
Absolutely wrong -- if you're not clever enough to think of any other reason to run an LLM locally, then don't condemn the rest of the world with "well, they're just using it for porno!"
-.-
I feel like _anyone_ who can pump out GPUs with 24GB+ of memory that are usable for py-stuff would benefit greatly.
Even if it's not as performant as the NVIDIA options - just to be able to get the models to run, at whatever speed.
They would fly off the shelves.