That’s true, I’ve only run up to 30B. My understanding is that the models are limited to a 2048-token context window by how they were trained, and tools like llama.cpp default to an even smaller input context (512 tokens, unless you raise it with -c, if I recall). You can blow past that quickly if you’re doing things like appending a result set to a complex prompt. But if others have working examples of using LLaMA models with large prompts, I’d be interested to see them.
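For what it’s worth, here’s a rough sketch of how I’d keep an appended result set inside that budget. The ~4 chars/token heuristic and all the names here are just placeholders for illustration; the model’s actual tokenizer would give exact counts:

```python
# Minimal sketch: trim appended result rows so the full prompt stays
# under the model's context window. Assumes ~4 characters per token,
# which is a crude approximation, not a real tokenizer.

CTX_TOKENS = 2048          # LLaMA's trained context window
RESERVED_FOR_REPLY = 512   # leave headroom for the model's output

def rough_token_count(text: str) -> int:
    return len(text) // 4  # crude heuristic only

def build_prompt(instructions: str, result_rows: list[str]) -> str:
    budget = CTX_TOKENS - RESERVED_FOR_REPLY - rough_token_count(instructions)
    kept = []
    for row in result_rows:
        cost = rough_token_count(row)
        if cost > budget:
            break  # stop appending once the budget is exhausted
        kept.append(row)
        budget -= cost
    return instructions + "\n" + "\n".join(kept)
```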