Write a couple of lore books, an in-universe cyclopedia, and some character sheets, and train exclusively on them. Maybe some out-of-game lore for cross-over universes!
The question that raises for me is how much writing you need for training before you can reasonably expect a generation system to produce something new and interesting, how much work it takes to make sure the right knowledge ends up in the right place, and whether all that is worth the cost over plain manual writing, given how you expect the player to interact with the game.
I doubt there's telemetry in the Elder Scrolls games, but I'd love to know how many players go around the world exploring everything the characters have to say or reading all the books, and how many get the lore from secondary media: wikis, or a retelling or summary on YouTube. On a certain level it's important that the books are there as an opt-in way to convey the 'secondary' world lore to the player without a "sit down and listen" info dump, and to give the impression that it was all written by someone, so that these objects exist organically in the world and certain characters talk about those topics. But I wonder how much of the illusion would still be there if each book were just a title.
Is that feasible? I was under the impression that fully training an LLM requires untold mountains of data, way more than a game dev company could reasonably create.
You are correct. The fact that so many people are saying “lol just train it on text about the game bro” reveals how little people understand about how these models work, how they are trained, etc.
Microsoft's Phi models are trained on a much smaller dataset. They generally aren't as amazing as the models that get talked about more, but they are more than enough to get the job done for NPC lines in a game.
For this to work you pretty much have to start from scratch, putting in "obvious" things like "the sun exists, and when it's out it casts light and shadow" and "water is a liquid (what's a liquid?) and flows downhill". Is there a corpus of information like this that is also free of facts that might be anachronistic in-universe?