
12GB memory

-.-

I feel like _anyone_ who can pump out GPUs with 24GB+ of memory that are usable for py-stuff would benefit greatly.

Even if it's not as performant as the NVIDIA options - just to be able to get the models to run, at whatever speed.

They would fly off the shelves.




Would it though? How many people are running inference at home? Outside of enthusiasts I don't know anyone. Even companies don't self-host models and prefer to use APIs. Not that I wouldn't like a consumer GPU with tons of VRAM, but I think that the market for it is too small for companies to invest in building it. If you bother to look at Steam's hardware stats you'll notice that only a small percentage is using high-end cards.


This is the weird part, I saw the same comments in other threads. People keep saying how everyone yearns for local LLMs… but other than hardcore enthusiasts it just sounds like a bad investment? Like it’s a smaller market than gaming GPUs. And by the time anyone runs them locally, you’ll have bigger/better models and GPUs coming out, so you won’t even be able to make use of them. Maybe the whole “indoctrinate users into the Intel ecosystem, so when they go work for big companies they’ll vouch for it” would have merit… if others weren’t innovating and making their products better (like NVIDIA).


Intel sold their GPUs at negative margin, which is part of why the stock fell off a cliff. If they could double the VRAM they could raise the price into the green; even selling thousands of units (likely closer to 100k) would be far better than what they're doing now. The problem is Intel is run by incompetent people who guard their market segments as tribal fiefs instead of solving for the customer.


> which is part of why the stock fell off a cliff

Was it? Their GPU sales were insignificantly low, so I doubt that had a huge effect on their net income.


They spent billions at TSMC making Alchemist dies that sat in a warehouse for a year or two as they tried to fix the drivers.


That's a dumb "cart before the horse" management problem. I understand a few bugs in the driver, but they really should have gotten the driver working decently well before production. It would have even given them more time to tweak the GPU. This is exactly why Intel is failing and will continue to fail with that type of management.


Intel management is just brain dead. They could have sold the cards for mining when there was a massive GPU shortage and called it the developer edition but no. It's hard to develop a driver for games when you have no silicon.


By subsidizing it more they'll lose less money?


Increasing VRAM would differentiate Intel GPUs and allow driving higher ASPs (average selling prices), into the green.


I think you're massively underestimating the development cost, and overestimating the number of people who would actually purchase a higher-VRAM card at a higher price.

You'd need hundreds of thousands of units to really make much of a difference.


Well, IIUC it's a bit more "having more than 12GB of RAM and raising the price will let it run bigger LLMs on consumer hardware and that'll drive premium-ness / market share / revenue, without subsidizing the price"

I don't know where this idea is coming from, although it's all over these threads.

For context, I write a local LLM inference engine and have no idea why this would shift anyone's purchase intent. The models big enough to need more than 12GB VRAM are also slow enough on consumer GPUs that they'd be absurd to run. Like less than 2 tokens/s. And I have 64 GB of M2 Max VRAM and a 24 GB 3090 Ti.
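To put rough numbers on that, a back-of-envelope sketch (the bandwidth and model-size figures are illustrative assumptions, not benchmarks): single-stream decoding streams roughly all the weights per token, so throughput is memory-bandwidth bound, and spilling past VRAM into system RAM is what drags it down to a couple of tokens per second.

    # tokens/s ceiling ~= memory bandwidth / bytes of weights read per token
    def max_tokens_per_s(bandwidth_gb_s, model_size_gb):
        return bandwidth_gb_s / model_size_gb

    # Illustrative, assumed numbers:
    print(max_tokens_per_s(1008, 8))  # 8 GB model fully in 3090 Ti-class VRAM: ~126 tok/s ceiling
    print(max_tokens_per_s(60, 40))   # 40 GB model spilled to dual-channel DDR4: ~1.5 tok/s ceiling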


The future of local LLMs is not people running it on their PCs.

It's going to be a HomePod/AppleTV/Echo/Google Home -style box you set up in a corner and forget about it.

Then your devices in the ecosystem can offload some LLM tasks to that local system for inference, without having to do everything on-device.


This makes sense in some ways technologically, but just having a "centralized compute box" seems like a lot more complexity than many/most would want in their homes.

I mean, everything could have been already working that way for a lot of years right? One big shared compute box in your house and everything else is a dumb screen? But few people roll that way, even nerds, so I don't see that becoming a thing for offloaded AI compute.

I also think that the future of consumer AI is going to be models trained/refined on your own data and habits, not just a box in your basement running stock ollama models. So I have some latency/bandwidth/storage/privacy questions when it comes to wirelessly and transparently offloading it to a magic AI box that sits next to my wireless router or w/e, versus running those same tasks on-device. To say nothing of consumer appetite for AI stuff that only works (or only works best) when you're on your home network.


It most likely won't be a separate device. It'll get integrated into something like Apple TV or a HomePod that has an actual function and will be plugged in and networked all the time anyway. The LLM stuff would be just a bonus.

Both are currently used as the hub for HomeKit devices. Making the ATV into a "magic AI Box" won't need anything else except "just" upgrading the CPU from A-series to M-series. Actually the A18 Pro would be enough, it's already used for local inference on the iPhone 16 Pro.


The enthusiast/prosumer/etc. market is generally still highly influential in most markets even if the revenue is limited. E.g. if hobbyists/students/developers start using Intel GPUs, in a few years the enterprise market might become much less averse to buying Intel's datacenter chips.


    Would it though? How many people are running inference at home? 
I don't know how to quantify it, but it certainly seems like a lot of people are buying consumer nVidia GPUs for compute and the relatively paltry amounts of RAM on those cards seems to be the number one complaint.

So I would say that Intel's potential market is "everybody who is currently buying nVidia GPUs for compute."

nVidia's stingy consumer RAM choices also seem to be a fairly transparent ploy to create a protective moat around their insanely-high-profit-margin datacenter GPUs. So that just seems like kind of an obvious thing for Intel or AMD to consider tackling.

(Although, it has to be said, a lot of commenters have pointed out that it's not as easy as just slapping more RAM chips onto the GPU boards; you need wider data buses as well, etc.)
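To illustrate the bus-width point, a quick sketch (the configurations below are approximate GDDR6 examples, not anyone's spec sheet): bandwidth scales with bus width times data rate, and bandwidth is what caps token throughput once the model fits.

    # bandwidth (GB/s) = bus width (bits) / 8 * effective data rate (Gbps per pin)
    def bandwidth_gb_s(bus_width_bits, data_rate_gbps):
        return bus_width_bits / 8 * data_rate_gbps

    print(bandwidth_gb_s(128, 18))  # narrow 128-bit bus:  288 GB/s
    print(bandwidth_gb_s(256, 18))  # 256-bit bus:         576 GB/s
    print(bandwidth_gb_s(384, 21))  # 384-bit bus:        1008 GB/s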


It's a chicken and egg scenario. The main problem with running inference at home is the lack of hardware. If the hardware was there more people would do it. And it's not a problem if "enthusiasts" are the only ones using it because that's to be expected at this stage of the tech cycle. If the market is small just charge more, the enthusiasts will pay it. Once more enthusiasts are running inference at home, then the late adopters will eventually come along.


Mac minis are great for this. They're cheap-ish and they can run quite large models at a decent speed if you run them with an MLX backend.
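For anyone curious, running a model through MLX is only a few lines with the mlx-lm package; a minimal sketch (the model id is just an example of a quantized community build, swap in whatever you actually use):

    # pip install mlx-lm  (Apple Silicon only)
    from mlx_lm import load, generate

    # Example model id (an assumption, not a recommendation)
    model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")
    text = generate(model, tokenizer,
                    prompt="Explain KV caching in one paragraph.",
                    max_tokens=200)
    print(text)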


Mini _Pro_ models are great for this, the ones with large RAM upgrades.

If you get the base 16GB mini, it will have more or less the same VRAM but way worse performance than an Arc.

If you already have a PC, it makes sense to go for the cheapest 12GB card instead of a base mac mini.


100% - this could be Intel's ticket to capture the hearts of developers and then everything else that flows downstream. They have nothing to lose here -- just do it Intel!


They could lose a lot of money?


They already do... google $INTC and stare in disbelief at the "Financials" on the right side.

At some point they should make a stand, that's the whole meta-topic of this thread.


Sorry, you're right, they could lose a lot more money.


You can get that on a Mac mini and it will probably cost you less than an equivalent PC setup. It should also perform better than a low-end Intel GPU and be better supported. It will use less power as well.


You can just use a CPU in that case, no? You can run most ML inference on vectorized operations on modern CPUs at a fraction of the price.
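E.g. a minimal CPU-only sketch with llama-cpp-python (the model path is a placeholder and assumes you've downloaded a quantized GGUF file):

    # pip install llama-cpp-python
    from llama_cpp import Llama

    # Placeholder path to a quantized GGUF model (assumption)
    llm = Llama(model_path="./models/some-7b-q4_k_m.gguf", n_threads=8, n_ctx=4096)
    out = llm("Q: What does AVX-512 accelerate?\nA:", max_tokens=128)
    print(out["choices"][0]["text"])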


My 7800X says not really. Compared to my 3070 it feels so incredibly slow that it gets in the way of productivity.

Specifically, waiting ~2 seconds vs ~20 for a code snippet is much more detrimental to my productivity than the time difference would suggest. In ~2 seconds I don't get distracted, in ~20 seconds my mind starts wandering and then I have to spend time refocusing.

Make a GPU that is 50% slower than a two-generations-older mid-range GPU (in tokens/s) but on bigger models, and I would gladly shell out $1000+.

So much so that I am considering getting a 5090 if NVIDIA actually fixes the connector mess they made with the 4090s, or even a used V100.


I'm running a codeseeker 13B model on my MacBook with no perf issues and I get a response within a few seconds.

Running a specialist model makes more sense on small devices.


I don't understand, make it slower so it's faster?


My 2080Ti at half speed would still beat the crap out of my 5900X CPU for inference, as long as the model fits in VRAM.

I think that's what GP was alluding to.


Maybe that's not too bad for someone who wants to use pre-existing models. Their AI Playground examples require at minimum an Intel Core Ultra H CPU, which is quite low-powered compared to even these dedicated GPUs: https://github.com/intel/AI-Playground


I don't know a single person in real life that has any desire to run local LLMs. Even amongst my colleagues and tech friends, not very many use LLMs period. It's still very niche outside AI enthusiasts. GPT is better than anything I can run locally anyway. It's not as popular as you think it is.


I run a 12GB model on my 3060 and use it to help answer healthcare questions. I'm currently doing a medical residency. (No, I don't use it to diagnose.) It helps comply with HIPAA-style regulations. I sometimes use it to fix up my emails. Not sure why people are longing for a 128GB card, just download a quantized model and run it with LM Studio (https://lmstudio.ai/). At least two of my colleagues are using ChatGPT on a regular basis. LLMs are being used in the ER department. LLMs and speech models are being used in psychiatry visits.
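LM Studio also exposes an OpenAI-compatible local server, so wiring it into scripts is a few lines. A sketch (the port is LM Studio's default as far as I know, and the prompt is a placeholder; check the app's Server tab):

    # pip install openai  -- used only as a client against the local server
    from openai import OpenAI

    # Assumed default LM Studio endpoint
    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

    resp = client.chat.completions.create(
        model="local-model",  # LM Studio serves whichever model you've loaded
        messages=[{"role": "user", "content": "Rewrite this email to be more concise: ..."}],
    )
    print(resp.choices[0].message.content)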


A 128GB card could run Llama 3.1 70B with FP8. It would be a huge increase in quality over what the current 24GB cards can do.
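The arithmetic, roughly (a sketch; the overhead term is an assumption):

    # Rough VRAM estimate for a dense 70B model with 8-bit weights
    params = 70e9
    weight_gb = params * 1 / 1e9           # 1 byte per parameter at FP8 -> ~70 GB
    kv_and_overhead_gb = 15                # assumed: KV cache, activations, runtime overhead
    print(weight_gb + kv_and_overhead_gb)  # ~85 GB: fits in 128 GB, nowhere near 24 GB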


The only consumer demand for local AI models is for generating pornography


How about running your intelligent home with a voice assistant on your own computer? In privacy-oriented countries (Germany) that would be massive.


This is what I'm fiddling with. My 2080 Ti is not quite enough to make it viable. I find the small models fail too often, so I need larger Whisper and LLM models (rough sketch of the Whisper side below).

Like the 4060 Ti would have been a nice fit if it hadn't been for the narrow memory bus, which makes it slower than my 2080 Ti for LLM inference.

A more expensive card has the downside of not being cheap enough to justify idling in my server, and my gaming card is at times busy gaming.
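The speech side of that is only a few lines with the openai-whisper package; a sketch (the model size and file name are just examples):

    # pip install openai-whisper  (needs ffmpeg on the PATH)
    import whisper

    # "small" is an example; larger models misfire less but need more VRAM
    model = whisper.load_model("small")
    result = model.transcribe("kitchen_command.wav")  # placeholder audio file
    print(result["text"])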


Absolutely wrong -- if you're not clever enough to think of any other reason to run an LLM locally, then don't condemn the rest of the world with "well, they're just using it for porno!"


So you're saying that's a huge market?!


I want local copilot. I would pay for this.



