
Copying helpers (gdrcopy) are about pumping data in and out of a single card. docker-nvidia and the rest of the stack are enablement for using the cards at all.

GPUDirect is about pumping data from storage devices straight to the cards, especially from high-speed storage systems across networks.

MIG partitions a single card into multiple instances, so many processes or VMs can share one card for smaller tasks.

Nothing I wrote in my previous comment is about inter-card or inter-server communication; it is all about disk-to-GPU, CPU-to-GPU, or RAM-to-GPU communication.
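To make the data-path point concrete, here is a minimal sketch (assuming PyTorch with CUDA and a hypothetical weights.pt file, neither of which is from the comment above) of the ordinary disk -> host RAM -> GPU route. GPUDirect Storage exists to cut out the host-RAM staging hop, and gdrcopy-style helpers speed up the RAM <-> GPU leg.

    import torch

    # Ordinary path: disk -> host RAM -> GPU. GPUDirect Storage is about
    # skipping the host-RAM staging step when reading from fast storage.
    state = torch.load("weights.pt", map_location="cpu")      # disk -> host RAM
    gpu_state = {k: v.to("cuda") for k, v in state.items()}   # host RAM -> GPU

    print(f"moved {len(gpu_state)} tensors onto the card")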

Edit: I know it's not OK to talk about downvoting, and downvote as you like, but I install and enable these cards for researchers. I know what I'm installing and what it does. C'mon now. :D



Mostly, I think, we don’t really understand your argument that Intel couldn’t easily replicate the parts needed only for inference.


Yeah, for example, llama.cpp runs on Intel GPUs via Vulkan or SYCL; the latter backend is actively maintained by Intel developers.

Obviously that is only one piece of software, but it's certainly a useful one if you are running one of the many LLMs it supports.
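As a sketch of what that looks like in practice (assuming the llama-cpp-python bindings built against the SYCL or Vulkan backend, and a hypothetical GGUF model path):

    from llama_cpp import Llama

    # Hypothetical GGUF file; with a SYCL/Vulkan build, the offloaded
    # layers run on the Intel GPU instead of the CPU.
    llm = Llama(
        model_path="models/llama-3-8b-instruct.Q4_K_M.gguf",
        n_gpu_layers=-1,   # offload every layer to the GPU backend
        n_ctx=4096,
    )

    out = llm("Briefly explain what SYCL is.", max_tokens=64)
    print(out["choices"][0]["text"])

The Python side is the same regardless of which backend the library was compiled with; only the build changes.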


I've run inference on Intel Arc and it works just fine, so I am not sure what you're talking about. I certainly didn't need Docker! I've never tried to do anything on AMD yet.

I had the 16GB Arc, and it ran inference at the speed I expected, but with twice as many sequences per batch as my 8GB card, which I think is about what you'd expect.

Once the model is on the card, there's no "disk" anymore: having enough VRAM to hold the model, the tokenizer, and whatever else means disk speed stops mattering, and realistically, when I'm running loads on my 24GB 3090, the CPU is maybe 4% over idle. My bottleneck for running large models is VRAM, not anything else.
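A back-of-envelope sketch of why VRAM is the limit (all the numbers here are illustrative assumptions, not measurements from my cards):

    # Rough VRAM budget for a quantized ~7B model (illustrative numbers only).
    params = 7e9                 # parameter count
    bytes_per_weight = 0.5       # ~4-bit quantization
    weights_gb = params * bytes_per_weight / 1e9

    layers, kv_heads, head_dim = 32, 8, 128      # assumed model shape
    ctx, batch, fp16_bytes = 4096, 1, 2
    kv_gb = 2 * layers * kv_heads * head_dim * ctx * batch * fp16_bytes / 1e9

    print(f"weights ~{weights_gb:.1f} GB, KV cache ~{kv_gb:.2f} GB per sequence")
    # Once the weights are resident, leftover VRAM mostly goes to KV cache,
    # which is why a 16GB card fits roughly twice the batch of an 8GB card.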

If I needed to train (from scratch or whatever) I'd just rent time somewhere, even with a 128GB card locally, because obviously more tensors is better.

And you're getting downvoted because there are literally LM Studio, llama.cpp, and sd-webui running just fine for inference on our non-datacenter, non-NVLink, 1/15th-the-cost GPUs.



