Interesting - do you need to take any special measures to get OSS genAI models to work on this architecture?
Can you use inference engines like Ollama and vLLM off-the-shelf (as Docker containers) there, with just the Radeon 8060S GPU? What token rates do you achieve?
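Naively I'd hope the stock ROCm container path just works, something like this sketch (untested on this hardware; the HSA_OVERRIDE_GFX_VERSION line is a workaround I've seen suggested for iGPUs that ROCm doesn't officially list, not something I've verified for the 8060S):

    # Standard Ollama ROCm container invocation (from the Ollama Docker docs),
    # passing the AMD GPU device nodes through to the container.
    # HSA_OVERRIDE_GFX_VERSION is an unverified guess for the 8060S; newer
    # ROCm releases may not need it at all.
    docker run -d \
      --device /dev/kfd --device /dev/dri \
      -e HSA_OVERRIDE_GFX_VERSION=11.0.0 \
      -v ollama:/root/.ollama -p 11434:11434 \
      --name ollama ollama/ollama:rocm

    # then pull and chat with a model inside the container
    docker exec -it ollama ollama run llama3.1

Does that actually land on the GPU, or does it silently fall back to CPU?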
That's the AMD Ryzen AI Max+ 395, right? Lots of those boxes popping up recently, but isn't that dog slow? And I can't believe I'm saying this - but maybe a Mac maxed out with RAM might be a better option?
A GMKtec or a Framework desktop with a Strix Halo/AI Max CPU is about the cheapest way to run a model that needs to fit into about 120GB of memory. Macs have twice the memory bandwidth of these units, so will run significantly faster, but they're also much more expensive. Technically, you could run these models on any desktop PC with 128GB of RAM, but that's a whole different level of "dog slow." It really depends on how much you're prepared to pay to run these bigger models locally.
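Back-of-envelope, if you assume decode is memory-bandwidth-bound and every token has to read the whole set of quantized weights once (a simplification, but roughly right for dense models):

    tokens/s ≈ memory bandwidth / bytes read per token
    dense 70B @ Q4 ≈ ~40 GB of weights
    Strix Halo (~256 GB/s): 256 / 40 ≈ 6 tok/s
    M4 Max    (~546 GB/s): 546 / 40 ≈ 13 tok/s

Which is why bandwidth, not compute, is the spec to compare for this use case.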
From the .de website I see €2000 for the 128GB, but looking at the shipping info, it sounds like it might still be shipped from .cn: "Please ensure you can handle customs clearance and taxes yourself."
Also, it ships with Windows 11, which is a big no from me.
But if this is the start of hardware capable of running big models locally, it looks quite hopeful. A 2nd-hand M2 Studio with 128GB (which I can run Asahi on) is currently ~€3600.
When I was looking it was more like 1.6k euros, but that's still a great price. A Mac Studio with the M4 Max 16/40/16 and 128GB is double that. That's all within the range of "affordable". Now, if the Mac is at least twice the speed, I don't see a reason not to. Even though my religion is against buying a Mac as well.
edit: just took a look at Amazon. The GMKtec EVO-X2 AI, which is the AMD Ryzen AI Max+ 395 with 128GB of RAM, is 3k euros. A Mac M4 Max with 16 cores and 128 gigs is 4.4k euros. Damn, Europe. If you go with the 14-core M4 Max, but still 16 cores of "Neural Engine"... ah, you can't get 128 GB of RAM then. Classic Apple :)
edit2: looked at the GMKtec site itself. The machine is 2k euros there. Damn, Amazon.
I have an M4 Max with 128 GB of memory. Even on that machine I would not consider 70B+ models to be usable. Once you go below 20 tokens/s it becomes more like having a pen pal than an AI assistant.
MoE models can still be pretty fast, as are smaller models.
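Rough illustration of why, with made-up but plausible numbers (same memory-bandwidth-bound back-of-envelope as upthread):

    dense 70B @ Q4: reads ~40 GB/token -> 546 / 40 ≈ 13 tok/s
    MoE, ~20B active params @ Q4: reads ~10 GB/token -> 546 / 10 ≈ 55 tok/s

A MoE only touches its active experts per token, so the effective bytes-per-token shrinks even though the full model still has to sit in memory.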
(This is mostly a warning for anyone enamored with the idea of running these things locally: test it before you spend a lot of money.)
Currently I'd probably say the Nvidia RTX Pro 6000 is a challenger if you want local models. It "only" has 96 GB of VRAM, but it's very fast (~1800 GB/s). If you can fit the model on it and it's good enough for your use case, then it's probably worth it even at $10k.
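Rough fit check, weights only (quantized sizes are approximate, and you still need headroom for KV cache and activations on top):

    70B   @ Q8 ≈ ~70 GB    -> fits, with modest headroom
    70B   @ Q4 ≈ ~40 GB    -> fits easily
    ~120B @ Q4 ≈ ~60-70 GB -> fits; at Q8 it doesn't

So it covers a lot of the interesting open-weight models, just not the ones that need the full 128 GB at higher precision.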
Looking closely, the shop does not seem to be located within the EU. And the €50 discount does not apply to the 128GB config.
Also, if you are interested, it might help to have a look at the user forum: https://de.gmktec.com/community/xenforum
I will admit that the price difference was a big value differentiator for me, since speed is not a priority (playing with big models at a reasonable price is).