That’s true, I’ve only run up to 30B. My understanding is that the models are limited to a 2048-token context window by how they were trained, and tools like llama.cpp default to an even smaller input context (512 tokens, unless you raise it with -c, if I recall). You can blow past that quickly if you’re doing things like appending a result set to a complex prompt. But if others have working examples of using LLaMA models with large prompts, I’d be interested to see them.
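For what it’s worth, here’s a rough sketch of how I’d keep an appended result set inside that budget. The ~4 chars/token heuristic and all the names here are just placeholders for illustration; the model’s actual tokenizer would give exact counts:

```python
# Minimal sketch: trim appended result rows so the full prompt stays
# under the model's context window. Assumes ~4 characters per token,
# which is a crude approximation, not a real tokenizer.

CTX_TOKENS = 2048          # LLaMA's trained context window
RESERVED_FOR_REPLY = 512   # leave headroom for the model's output

def rough_token_count(text: str) -> int:
    return len(text) // 4  # crude heuristic only

def build_prompt(instructions: str, result_rows: list[str]) -> str:
    budget = CTX_TOKENS - RESERVED_FOR_REPLY - rough_token_count(instructions)
    kept = []
    for row in result_rows:
        cost = rough_token_count(row)
        if cost > budget:
            break  # stop appending once the budget is exhausted
        kept.append(row)
        budget -= cost
    return instructions + "\n" + "\n".join(kept)
```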