I get about the same speed (35 tok/s) out of 13B Llama2 on a 4070 Ti, FWIW.

kilnr on Sept 13, 2023 \| on: Exllamav2: Inference library for running LLMs loca...