I want to know if this is any different from all of the AMD AI Max PCs with 128 GB of unified memory? The spec sheet says "128 GB LPDDR5x", so how is this better?
The GPU is significantly faster and it has CUDA, though I'm not sure where it'd fit in the market.
At the lower price points you have the AMD machines, which are significantly cheaper even though they're slower and have worse software support. Then there's Apple with higher memory bandwidth, and even the NVIDIA AGX Thor is faster in GPU compute, at the cost of a worse CPU and networking. At the $3-4K price point a Threadripper system also becomes viable, and that can take significantly more memory.
> The GPU is significantly faster and it has CUDA,
But (non-batched) LLM processing is usually limited by memory bandwidth, isn't it? Any extra speed the GPU has is not used by current-day LLM inference.
I believe plain token generation is bandwidth limited; prompt processing and other tasks, on the other hand, need the compute. As I understand it, the workstation as a whole is also aimed at the local development process before readying things for the datacenter, not just at running LLMs. Rough back-of-the-envelope sketch below.
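A quick sketch of the arithmetic (the bandwidth and model-size numbers are illustrative assumptions, not specs of any particular machine):

    # Single-stream decode ceiling: every generated token has to stream the
    # full set of active weights through memory once, so bandwidth dominates.
    model_params = 70e9          # assumed 70B-parameter model
    bytes_per_param = 0.5        # assumed ~4-bit quantization
    weight_bytes = model_params * bytes_per_param   # ~35 GB of weights

    mem_bandwidth = 273e9        # assumed ~273 GB/s LPDDR5x-class bandwidth

    tokens_per_s = mem_bandwidth / weight_bytes
    print(f"~{tokens_per_s:.1f} tok/s decode ceiling")   # ~7.8 tok/s

    # Prompt processing is different: many tokens are pushed through the same
    # weights in one pass, so raw GPU compute matters much more there.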
Thanks for that - yes, I haven’t quite gotten on the “just use AI search for everything now” bandwagon, but of course it makes a lot of sense that it’d be in there somewhere.
Guess I’m gonna go to a local service place with this PDF and the TV and see what they can do. I’m filled with anticipation for the day that I can boot up a terminal on Sony’s first TV and include it in one of my exhibits.
I do retro computing exhibits, in case you were wondering why I have all this junk… ;)
Didn't someone back in the day write a library that let you import an arbitrary Python function from Github by name only? It obviously was meant as a joke, but with AIcolytes everywhere you can't really tell anymore...
Flask also started as an April 1st joke, in response to bottle.py but ever so slightly more sane. It gathered such a positive response that mitsuhiko basically had to make it into a real thing, and he later regretted some of the API choices (like global variables proxying per-request objects).
If you use a deterministic sampling strategy for the next token (e.g., always output the token with the highest probability) then a traditional LLM should be deterministic on the same hardware/software stack.
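A minimal sketch of what deterministic (greedy) decoding looks like with the Hugging Face transformers API, using gpt2 just as a small stand-in model:

    # Greedy decoding: always take the argmax token, so the output is a pure
    # function of the weights and the prompt (modulo floating-point
    # differences between kernels/hardware).
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    ids = tok("The capital of France is", return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=5, do_sample=False)  # no RNG used
    print(tok.decode(out[0]))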
Wouldn't seeding the RNG used to pick the next token be more configurable? How would changing the hardware/other software make a difference to what comes out of the model?
Honestly, if you haven't ever used git bisect I'd say you're missing out on a very powerful tool. Being able to isolate, without any knowledge of the code base, the exact commit that introduced a bug is incredibly powerful. A sketch of the automated version follows.
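For the automated flavor, `git bisect run` accepts any script that exits 0 on good commits and non-zero on bad ones (125 means "skip this commit"). A minimal sketch, where the module and expected result are made up for illustration:

    #!/usr/bin/env python3
    # check_bug.py -- used as: git bisect run python check_bug.py
    # Exit 0 = good commit, 1 = bad commit, 125 = commit can't be tested.
    import sys

    try:
        from mylib import compute  # hypothetical function with the regression
    except ImportError:
        sys.exit(125)  # e.g. the module doesn't exist yet at this commit

    sys.exit(0 if compute(2, 3) == 5 else 1)

After `git bisect start`, `git bisect bad`, and `git bisect good <known-good-ref>`, bisect binary-searches the range and runs the script at each step until it names the first bad commit.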
Everyone has their own preferences, but I'd look into uv if I were you. It allows you to specify the Python version, and for scripts you can even specify it as part of the shebang; see the sketch below.
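Roughly, a self-contained script with a uv shebang and PEP 723 inline metadata looks like this (requests is just an illustrative dependency):

    #!/usr/bin/env -S uv run --script
    # /// script
    # requires-python = ">=3.12"
    # dependencies = ["requests"]
    # ///
    # uv reads the inline metadata above, fetches a matching Python and the
    # dependencies, and runs the script in an isolated environment.
    import requests

    print(requests.get("https://example.com").status_code)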
uv is literally the goat, except I haven't been able to make vllm work in uv for some reason. Aside from that, I think I need to use the shebang trick more, because I don't reach for it much right now.
Much appreciated. The only donation I can think of is if you help spread the word. I learned a tough lesson this week that it is extraordinarily hard to market a product lol
P.S. If it wasn't for @dang putting it in the second-chance-pool, this post would have never been seen by more than 2 people.
https://nvdam.widen.net/s/tlzm8smqjx/workstation-datasheet-d...