
Because the CPU has to stream the model weights from memory for every token it generates, so you're spending most of the time on memory I/O rather than compute, and that offsets the processing.
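
A rough back-of-envelope sketch (my own illustrative numbers, not from the thread): for single-stream inference the weights have to be read once per generated token, so tokens/s is roughly memory bandwidth divided by model size.

  # Back-of-envelope: single-stream LLM decoding is memory-bandwidth bound,
  # because every generated token requires reading all the weights once.
  # Model size and bandwidth figures below are assumed, not measured.

  def tokens_per_second(model_bytes: float, bandwidth_bytes_s: float) -> float:
      """Upper bound on decode speed: one full weight read per token."""
      return bandwidth_bytes_s / model_bytes

  model_70b_q8 = 70e9 * 1.0   # ~70B params at 8-bit ≈ 70 GB (assumption)
  cpu_ddr5 = 80e9             # dual-channel DDR5: ~80 GB/s (assumption)
  gpu_hbm3 = 3.35e12          # H100 SXM HBM3: ~3.35 TB/s (assumption)

  print(f"CPU (DDR5): ~{tokens_per_second(model_70b_q8, cpu_ddr5):.1f} tok/s")
  print(f"GPU (HBM3): ~{tokens_per_second(model_70b_q8, gpu_hbm3):.1f} tok/s")

With those assumed figures you get roughly 1 tok/s on the CPU versus ~48 tok/s on the GPU, which is why the I/O cost dominates on commodity CPU memory.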

You're talking about completely different things here.

It's fine if you're doing a few requests at home, but if you're actually serving AI models, CUDA is the only reasonable choice other than ASICs.



My comment was about Intel having a starter project: get an enthusiastic response from devs, build network effects, and iterate from there. They need a way to threaten Nvidia, and focusing only on what they can't do won't get them there. There is one route where they can disrupt Nvidia's high end over time, and that's a cheap basic GPU with lots of RAM. It's like first-gen Ryzen, whose single-core performance was two generations behind Intel's, yet it trashed Intel by providing twice as many cores for cheap.


It would be a good idea to start with some basic understanding of GPUs, and realize why this can't easily be done.


That's a question the M3 Max with its integrated GPU has already answered. I've done enough HPC and CUDA work in the past not to be completely clueless about how GPUs work, though I haven't written those libraries myself.


What have you implemented in CUDA?



