Hacker News

Compete with llama.cpp? Like transformers LLaMA [0], exllama [1] (really fast), or lit-llama [2]?

exllama is both really memory-efficient and really fast.

[0] https://huggingface.co/docs/transformers/main/model_doc/llam...

[1] https://github.com/turboderp/exllama

[2] https://github.com/Lightning-AI/lit-llama

EDIT: Or do you mean CUDA? Because yeah, it's such a shame AMD's ROCm is so bad that even geohot gave up. Its examples don't even run without crashing.

https://github.com/RadeonOpenCompute/ROCm/issues/2198#issuec...



Also https://github.com/kayvr/TokenHawk, a WebGPU implementation of LLaMA.

edit: Note that this is my project.


Thanks for the tip about exllama. I've been on the lookout for a readable Python implementation to play with that is also fast and supports quantized models.
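For anyone curious what "support for quantized models" buys you: exllama runs GPTQ-style 4-bit weights, where each group of weights is stored as 4-bit integers plus a per-group scale and zero point. Here's a minimal NumPy sketch of that storage scheme (this is just the quantize/dequantize round trip for illustration, not exllama's actual kernels, and the group size and function names are my own):

```python
import numpy as np

def quantize_4bit(w, group_size=8):
    # Per-group affine 4-bit quantization: map each group's [min, max]
    # range onto the integers 0..15, keeping a float scale and zero
    # point per group so the weights can be reconstructed.
    w = w.reshape(-1, group_size)
    zero = w.min(axis=1, keepdims=True)
    span = w.max(axis=1, keepdims=True) - zero
    scale = np.where(span == 0, 1.0, span) / 15.0
    q = np.clip(np.round((w - zero) / scale), 0, 15).astype(np.uint8)
    return q, scale, zero

def dequantize_4bit(q, scale, zero):
    # Reconstruct approximate float weights from the packed ints.
    return q.astype(np.float32) * scale + zero

np.random.seed(0)
w = np.random.randn(64).astype(np.float32)
q, s, z = quantize_4bit(w)
w_hat = dequantize_4bit(q, s, z).reshape(-1)
err = np.abs(w - w_hat).max()  # worst case ~half a quantization step
```

The reconstruction error is bounded by half a step (the group's range divided by 2·15), which is why the generated text stays close to the full-precision model while the weights take roughly a quarter of the memory of fp16.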




