It's not an implementation of CUDA; it's an implementation of the CUDA runtime API. The runtime API is what you use to configure the card, allocate and copy memory, and launch kernels. Importantly, you cannot use this to write the actual kernels which run on the GPU!
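For context, a minimal sketch of what the runtime API covers on the host side (standard CUDA, nothing project-specific). Everything except the `__global__` kernel body is runtime-API territory; the kernel itself is the part no runtime shim can write for you:

```
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// The kernel: "CUDA the language", compiled for the GPU - out of scope
// for a runtime-API implementation.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float* h_a = (float*)malloc(bytes);
    float* h_b = (float*)malloc(bytes);
    float* h_c = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Runtime API: device memory and transfers.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // The <<<>>> launch is lowered to runtime-API calls under the hood.
    vecAdd<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);

    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", h_c[0]); // 3.0

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```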
I know AMD has a whole bunch of (related?) projects for GPU compute, but man - if they could just provide an interop layer that Just Works they'd get immediate access to so much more market share.
“Just works” in this context means executing the compiled CUDA or the PTX bytecode without recompiling. Nobody is ever going to utilize ROCm if it requires distributing as source and recompiling.
To make it even more insulting, simply installing ROCm is itself a massive burden, even on ostensibly supported hardware (as geohot discovered). And even "it works out of the box if you distribute the source and compile it locally" ignores that whole massive "draw the rest of the owl" stage of getting ROCm installed and building properly in your environment.
> “Just works” in this context means executing the compiled CUDA or the PTX bytecode without recompiling. Nobody is ever going to utilize ROCm if it requires distributing as source and recompiling.
Even a source-compatible layer that let you just recompile CUDA code for an AMD GPU would be a huge improvement. That alone would eliminate the CUDA lock-in.
Don't forget AMD doesn't seem to care much about ROCm themselves. Six months after launch, RDNA3 cards still don't support it. Can you imagine if Nvidia had launched RTX 40-series cards with no DLSS even though 30-series cards already had it, and six months later started boasting about how DLSS support was "coming this fall"?
The hardware that is officially supported is a subset of the hardware that works. You are correct that the RX 7900 XT is not officially supported, but I must point out that you are linking to a fork of the documentation from 2019. This is the official ROCm documentation: https://rocm.docs.amd.com/en/latest/release/gpu_os_support.h...
TLDR; If you provide even more functions through the overloaded headers, incl. "hidden ones", e.g., `__cudaPushCallConfiguration`, you can use LLVM/Clang as a CUDA compiler and target AMD GPUs, the host, and soon GPUs of two other manufacturers.
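Roughly, the mechanism (a hand-wavy sketch, not the project's actual wrapper headers; the signatures and the thread-local stand-in below are assumptions): Clang lowers a `foo<<<grid, block>>>(args)` launch into a call to `__cudaPushCallConfiguration`, followed by a stub that pops the configuration and launches, so overloaded headers that intercept those entry points can redirect the launch to an AMD or host backend:

```
#include <cstddef>

// Hypothetical interception layer. dim3 and the backend that eventually
// performs the launch are stand-ins; only the push/pop call-configuration
// protocol is what Clang actually emits calls to.
struct dim3 { unsigned x, y, z; };

namespace {
struct LaunchConfig { dim3 grid, block; size_t shmem; void* stream; };
thread_local LaunchConfig g_config; // one pending launch config per thread
}

extern "C" unsigned __cudaPushCallConfiguration(dim3 grid, dim3 block,
                                                size_t shmem, void* stream) {
    g_config = {grid, block, shmem, stream};
    return 0;
}

extern "C" int __cudaPopCallConfiguration(dim3* grid, dim3* block,
                                          size_t* shmem, void* stream) {
    *grid = g_config.grid;
    *block = g_config.block;
    *shmem = g_config.shmem;
    *(void**)stream = g_config.stream;
    return 0; // 0 standing in for cudaSuccess
}

// The generated kernel stub then calls the launch entry point (e.g.
// cudaLaunchKernel), which the overloaded headers can map onto an AMD
// or host-side launch instead of the NVIDIA runtime.
```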
Yes, though with caveats. The driver and parts of the extended API we used to lower CUDA calls are in upstream LLVM. The wrapper headers are not.
We will continue the process of getting it all to work in upstream/vanilla LLVM soon though. Help is always appreciated.
FWIW, we have some alternative ideas on how to get out of the vendor trap, as well as some existing prototypes to deal with things like CUBLAS and Thrust.
Feel free to reach out, or just keep an eye out.
1. This implements the clunky C-ish API; there's also the Modern-C++ API wrappers, with automatic error checking, RAII resource control etc.; see: https://github.com/eyalroz/cuda-api-wrappers (due disclosure: I'm the author)
2. Implementing the _runtime_ API is not the right choice; it's important to implement the _driver_ API, otherwise you can't isolate contexts, dynamically add newly-compiled JIT kernels via modules etc. (see the driver-API sketch after this list)
3. This is less than 3000 lines of code. Wrapping all of the core CUDA APIs (driver, runtime, NVTX, JIT compilation of CUDA-C++ and of PTX) took me > 14,000 LoC.
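To illustrate point 2, here's what working at the _driver_ API level looks like with raw `cuda.h` (a minimal sketch; "my_kernel" and the PTX string are placeholders): explicit contexts for isolation, and modules so you can load PTX you JIT-compiled at runtime, none of which the runtime API exposes:

```
#include <cuda.h>

int main() {
    cuInit(0);

    CUdevice dev;
    cuDeviceGet(&dev, 0);

    // Explicit context: the isolation unit the runtime API hides from you.
    CUcontext ctx;
    cuCtxCreate(&ctx, 0, dev);

    // Load a module from PTX produced at runtime (e.g. by NVRTC).
    const char* ptx = ""; // placeholder for real JIT output
    CUmodule mod;
    CUfunction kernel;
    if (cuModuleLoadData(&mod, ptx) == CUDA_SUCCESS &&
        cuModuleGetFunction(&kernel, mod, "my_kernel") == CUDA_SUCCESS) {
        void** args = nullptr; // kernel parameters would go here
        cuLaunchKernel(kernel,
                       1, 1, 1,    // grid dimensions
                       64, 1, 1,   // block dimensions
                       0, nullptr, // shared memory, stream
                       args, nullptr);
        cuCtxSynchronize();
        cuModuleUnload(mod);
    }

    cuCtxDestroy(ctx);
    return 0;
}
```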
What I _really_ like to receive, though, is feedback from using the wrappers, ideas for changes/improvements, and of course messages volunteering to QA new versions before their release :-P
How does this relate to the goals outlined by George Hotz to bring ML to AMD chips and break the Nvidia dominance?
I'm not an expert here, but this approach seems powerful and important. However, the system seems complex enough that I doubt an individual could build it alone. It seems like this would need a corporate sponsor to get off the ground. Perhaps AMD itself would be interested in paying engineers to iterate on this?
Hotz is talking about drivers too, not just user-space libraries.
> The software is terrible! There’s kernel panics in the driver. You have to run a newer kernel than the Ubuntu default to make it remotely stable. I’m still not sure if the driver supports putting two cards in one machine, or if there’s some poorly written global state. When I put the second card in and run an OpenCL program, half the time it kernel panics and you have to reboot.
He also talks about user-space stuff, but clearly he thinks the whole stack, above and below this kind of library, also needs a lot of work.
This is still so mind-boggling to me. AMD should be in a good financial position now that Zen has been such a success and their GPUs are catching up too. Why are their drivers still a clusterfuck across the board after all these years? Why not throw more manpower at the problem?
I'm sure even if their GPUs were twice as fast as Nvidia's, everybody would still buy team green because it's better to have a card that works than a broken piece of garbage. We tried to get an MI50 to work reliably at work with KVM, but that thing was a complete dumpster fire. A colleague just bought a 7900XTX for gaming and spent days getting it to work. This included three Windows reinstalls. And that use case is gaming on Windows, which supposedly is the best supported case. It only gets worse from there. Compute on Linux? Lol.
Now last time this topic came up, someone claimed that AMD is pretty much at their limits production wise, and there are a few unnamed large companies buying loads of their cards for compute and cloud gaming, and AMD basically has engineers dedicated to making sure things work exactly for their use case, so they don't have to really care about the rest. Sounds pretty wild, but not completely unrealistic...
My old nvidia 570's drivers went into severe bitrot. Basic stuff like screensavers and desktops broke badly, and games were flaky. The card is still more than powerful enough for what I used it for.
I switched to AMD, with open source drivers. I get Windows-level performance on AAA (and indie) games in Steam, and zero compatibility issues with the rest of the Linux ecosystem.
AMD GPU support in Linux is a bit of a flip-flop. Some generations seem to get a lot of love and work really damn well (sometimes with better and broader support than the Windows drivers), like RDNA2 and most Polaris cards. Others, such as Vega and especially now RDNA3, are a shitshow with a lot of things just broken.
I used a Vega64 for years and just bought a 6000 series. Both work great on my machines. I'm typically running the bleeding edge kernels though, so that might explain it a bit. I would think Ubuntu is probably the most supported if you opt for the OEM kernel.
I ran a Vega 64 in Linux for 3-4 years, and it was really nice. It also worked without bugs with Proton; the 3060 I have now gives me a lot of artifacts like incorrect lighting, and even X crashes once in a while.
I'm considering switching back to an AMD card due to this.
It really depends on the card. I have an old RDNA workstation card in my server and the driver was a real crapshoot. I eventually started delaying updates to newer kernel releases (which would fix other bugs!) because there would be regressions of various kinds. Graphics under Linux is still a bit painful after all these years.
At least Nvidia finally open sourced their driver too, I guess. And Intel is still open source. But it still sucks a bit I think unless you do research.
Why are you talking about screensavers in a CUDA thread though? You can't compute on a fancy screensaver animation; you need a working CUDA driver, which Nvidia provides.
> And that use case is gaming on Windows, which supposedly is the best supported case.
I’m being a little tongue-in-cheek here, but the best-supported case for AMD is gaming via console: AMD provides the CPU/GPU for the current generation of both the Xbox and PlayStation consoles.
Which suggests to me that they shouldn’t have too much trouble supporting their hardware on Windows or Linux. But that’s outside my area of expertise. Maybe they have to spend too much engineering effort and time supporting the consoles at what’s probably a pretty thin profit margin?
No, really. We've worked with Intel, Nvidia and AMD... well, for the latter we at least tried. We're not a big fish, but response time and quality of responses were stellar with Intel and Nvidia. AMD took weeks, and even when asking very precise questions with lots of technical background, there was a lot of "hmm, dunno, have to find someone who'd know" kind of answers, and it would often take one to two weeks for a single reply. And that's not even dev work; it's just tech support for your own damn stuff you're trying to sell.
You can't seriously tell me that's not something they could fix.
Par for the course with new kernel things: it's unusual for something new in the kernel to be stable in the distro kernels unless they've devoted a great deal of effort to backport things.
The way conservative distros define "stable" is part of the problem. For things less than 3 years old, going for stale versions often runs counter to "stable".
Just in case other people who have an AMD GPU and run Windows have the same needs as I do (that is, to train or run machine learning models), please check out torch-directml and tensorflow-directml.
I'm not sure this really makes any more sense than AMD chasing CUDA compatibility with ROCm/MiOpen/HIP. CUDA and DirectX seem too low level to be used as a compatibility API over widely divergent hardware (AMD vs NVidia) without giving up a lot of performance.
cuDNN, being higher level, offers more opportunity for compatibility without losing performance (i.e. different implementations of kernels fine-tuned for optimal performance on AMD vs NVidia hardware), but the trouble is that so much of what frameworks like PyTorch do is based on custom kernels, not just cuDNN.
It seems the best bet for AMD would be a rock-solid low-level API (not a moving target) plus support for high-level optimizing ML compilers, to reduce the level of effort for the framework vendors (PyTorch, TensorFlow, JAX ...) to provide framework-level support on top of that. Ultimately they'd need to work very closely with the framework vendors to provide this support, since they are the ones who would be benefiting from it.
It's odd how much of an afterthought ML support has seemed to be for AMD over the years... maybe the relative size of the consumer ML market vs graphics/gaming market didn't seem to make it worth their effort, but as NVidia has shown this is a path to gaining much more lucrative data center wins.
How does it work? Last time I tried DirectML it wasn't well supported and there was little software that could use it. Also, the performance seemed not too great. I am currently using a Linux install because with ROCm I can use popular tools like the AUTOMATIC1111 webui and oobabooga.
I trained a WGAN on torch-directml with no issues, so the software seems reasonably well supported. But I can’t speak to performance because I have nothing to compare against.
Does that work? I might be in the market for a new GPU if AMD had something that beats NVidia for ML (for a sane price)... I can't really justify buying an NVidia GPU; anything decent is too expensive.
>>> hipify-clang is a clang-based tool for translating CUDA sources into HIP sources. It translates CUDA source into an abstract syntax tree, which is traversed by transformation matchers. After applying all the matchers, the output HIP source is produced. [...]
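For a sense of what that translation amounts to, here's an illustrative before/after based on the documented CUDA-to-HIP API mapping (not actual hipify-clang output, so details may differ slightly):

```
// CUDA input:
//   #include <cuda_runtime.h>
//   cudaMalloc(&d_buf, bytes);
//   cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
//   scale<<<blocks, threads>>>(d_buf, n);
//   cudaFree(d_buf);

// HIP output: same structure, cuda* entry points mapped to hip*;
// the kernel body and the <<<>>> launch syntax are unchanged.
#include <hip/hip_runtime.h>

__global__ void scale(float* buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] *= 2.0f;
}

void run(float* h_buf, int n) {
    float* d_buf;
    size_t bytes = n * sizeof(float);
    hipMalloc(&d_buf, bytes);
    hipMemcpy(d_buf, h_buf, bytes, hipMemcpyHostToDevice);
    scale<<<(n + 255) / 256, 256>>>(d_buf, n);
    hipMemcpy(h_buf, d_buf, bytes, hipMemcpyDeviceToHost);
    hipFree(d_buf);
}
```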
I'm not even certain optimisation matters. I can crash my machine (AMD graphics) with a stock Debian install by letting something attempt BLAS on the GPU.
The situation is starting to improve though. Installed a bunch of libraries from https://repo.radeon.com/rocm/apt/5.4 jammy main and the crashes got less frequent. I don't have a lot of faith in AMD to deliver reliable BLAS libraries at this point, but it could happen. The hardware is there, I just don't think they're prioritising supporting the right places in the distribution chain or supporting consumer-level graphics.
I do find it strange that AMD is not allocating more resources to ROCm, given that that seems to be where the money is, at least from my viewpoint. I guess they have been able to sell more cards than they could manufacture, but that seems to be changing.
APIs weren't copyrightable before Oracle v Google. There was plenty of precedent saying that. For example, before they were called Oracle, they built a clone of IBM SEQUEL.
The main concern with Oracle v Google was that the court would ignore or misinterpret the existing precedent.
A secondary concern was that a Google employee had formerly worked on Java at Sun (and/or Oracle) and copy-pasted some implementation source code from Oracle to Google's code bases. There was a real possibility the "APIs aren't copyrightable" precedent would stand, but the courts would rule that Google couldn't continue distributing Dalvik.
> > VUDA only takes kernels in SPIR-V format. VUDA does not provide any support for compiling CUDA C kernels directly to SPIR-V (yet). However, it does not know or care how the SPIR-V source was created - may it be GLSL, HLSL, OpenCL.
So the answer is no, it can't be used with kernels that use cublas or cudnn, which excludes almost all ML use-cases.
Memory management in Vulkan is _very_ restricted. Nothing remotely like UVM. CUDA on Vulkan for these reasons will always stay a pet project at best, with no shot at usable quality whatsoever.
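For comparison, this is the kind of thing that has no Vulkan counterpart: CUDA's managed (unified) memory gives you one pointer that is valid on both host and device, with the driver migrating pages on demand (a minimal sketch):

```
#include <cuda_runtime.h>
#include <cstdio>

__global__ void increment(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    const int n = 1024;
    int* data;

    // One allocation, one pointer, usable from host and device code alike.
    cudaMallocManaged(&data, n * sizeof(int));
    for (int i = 0; i < n; ++i) data[i] = i;

    increment<<<(n + 255) / 256, 256>>>(data, n);
    cudaDeviceSynchronize(); // wait before touching the data on the host again

    printf("data[0] = %d\n", data[0]); // 1
    cudaFree(data);
    return 0;
}
```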