exllama is really memory-efficient and really fast.
[0] https://huggingface.co/docs/transformers/main/model_doc/llam...
[1] https://github.com/turboderp/exllama
[2] https://github.com/Lightning-AI/lit-llama
EDIT: Or do you mean CUDA? Because yeah, it's such a shame AMD's ROCm is so bad that even geohot gave up. Its examples don't even run without crashing.
https://github.com/RadeonOpenCompute/ROCm/issues/2198#issuec...
edit: Note that this is my project.