I have a feeling this is somehow different. The tool is broad enough that I don't have to think for myself across a wide variety of tasks, not just one. Which hurts my intelligence way more.
The most critical piece of information I couldn’t find is: how many shots was this?
Could it tell by itself that the solution was correct (one-shot)? Or did it just have great math intuition and knowledge? How were the solutions validated if it was 10-100 shots?
The solutions were evaluated on their submitted output. You're allowed to use multiple 'shots' to produce the output, but just one submission per question. People are allowed this same affordance.
This is really interesting, because I do game jams from time to time - and every time I try to make it work, but I'm still quite a lot faster doing stuff myself.
This is especially visible under the extreme time pressure of producing a working game in 72 hours (our team consistently scores in the top 100 in Ludum Dare, which is a somewhat high standard).
We use Unity, a popular game engine all LLMs have a wealth of experience with (as with game development in general), but 80% of the output is so strangely "almost correct but not usable" that I can't afford the luxury of letting it figure things out, so I use it as fancy autocomplete. And I still check docs and Stack Overflow-style forums a lot, because of the stuff it plainly makes up.
One reason may be that our game mechanics are often a bit off the beaten path, though the last game we made was literally a platformer with rope physics (the LLM could not produce a good idea for stable, simple rope physics that fit our constraints and could be coded in 3 hours).
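For anyone curious, the usual textbook answer here is Verlet integration with a few distance-constraint relaxation passes. This is only a rough, engine-agnostic sketch (segment count, iteration count and gravity are made-up illustrative values), not the code we actually shipped:

```python
# Minimal Verlet rope sketch: points carry no explicit velocity; it is
# implicit in (pos - prev_pos). Distance constraints are relaxed a few
# times per frame, which is what keeps the rope stable and simple.
SEGMENTS = 20
SEG_LEN = 0.25           # rest distance between neighbouring points
GRAVITY = (0.0, -9.81)
ITERATIONS = 8           # more iterations -> stiffer, better-behaved rope

# each point is [x, y, prev_x, prev_y]
points = [[i * SEG_LEN, 0.0, i * SEG_LEN, 0.0] for i in range(SEGMENTS)]

def step(dt, anchor):
    # 1) Verlet integration.
    for p in points:
        vx, vy = p[0] - p[2], p[1] - p[3]
        p[2], p[3] = p[0], p[1]
        p[0] += vx + GRAVITY[0] * dt * dt
        p[1] += vy + GRAVITY[1] * dt * dt
    # 2) Constraint relaxation: push neighbours back to SEG_LEN apart.
    for _ in range(ITERATIONS):
        points[0][0], points[0][1] = anchor   # pin the first point
        for a, b in zip(points, points[1:]):
            dx, dy = b[0] - a[0], b[1] - a[1]
            dist = (dx * dx + dy * dy) ** 0.5 or 1e-6
            push = (dist - SEG_LEN) / dist * 0.5
            a[0] += dx * push; a[1] += dy * push
            b[0] -= dx * push; b[1] -= dy * push

step(1 / 60, anchor=(0.0, 0.0))   # one 60 fps frame, rope pinned at the origin
```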
But they definitely could, and were [0]. You just employ multiple of them and cross-check - with the ability of every single one to also double-check and correct errors.
LLMs cannot double-check, and multiples won't really help (I suspect ultimately for the same reason: exponential multiplication of errors [1]).
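To make the "employ multiple and cross-check" idea concrete, the simplest version is majority voting over independent samples. A toy sketch, where `ask_model` is a hypothetical stand-in for whatever model call you use:

```python
from collections import Counter

def ask_model(question: str, seed: int) -> str:
    """Hypothetical stand-in for a single model call (one 'shot')."""
    raise NotImplementedError

def cross_checked_answer(question: str, n_samples: int = 5) -> str:
    # Sample several independent answers and keep the most common one.
    # Note: this only helps if the errors are uncorrelated; if every
    # sample shares the same blind spot, voting reinforces the mistake.
    answers = [ask_model(question, seed=i) for i in range(n_samples)]
    best, _count = Counter(answers).most_common(1)[0]
    return best
```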
Not possible, because it can't be guardrailed with 100% accuracy. You'll ask it something outside of the Warcraft world (e.g. US politics), and it'll happily oblige. I imagine NPCs will generate really weird immersion-breaking stuff even if you can't freeform-interact with them anyway.
You can do that also while playing a traditional tabletop RPG. Players typically don't do it because why would they ruin immersion?
I understand that in multiplayer with strangers it would be a problem because you could affect other players' experiences, but in a single-player game I don't see this as a big issue, as long as the NPC doesn't spontaneously bring immersion-breaking topics into the conversation without the player starting it (which I suppose could be achieved with a suitable system prompt and some fine-tuning on in-lore text).
If it's the player that wants to troll the game and break immersion by "jailbreaking" the NPCs, it's on them, just like if they use a cheat code and make the game trivial.
It's still gonna be hallucinatory AI slop. For the same reasons it makes uninteresting quests and boring planets. It's lazy and it can't replace actual writing and art.
AI is great for getting tasks done where you can pull the information you need out of the slop. For quality immersive entertainment it's not there.
I’m not at all sure of this. You can use classifiers, fine tuning, and prompting to mitigate the issue both on user input and model output. And you’d probably want a bunch of fine tuning anyway to get their voice right.
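As a concrete, if crude, illustration of filtering both user input and model output: in the sketch below the keyword check stands in for a real trained classifier, and `npc_reply` is a hypothetical wrapper around whatever fine-tuned model you'd actually run.

```python
# Toy guardrail around an in-world NPC: screen both the player's input and
# the model's output, and fall back to a canned in-character line on a hit.
OFF_TOPIC_TERMS = {"president", "election", "iphone", "bitcoin"}  # stand-in for a classifier
FALLBACK = "I know nothing of such matters, traveler."

def looks_off_topic(text: str) -> bool:
    words = set(text.lower().split())
    return bool(words & OFF_TOPIC_TERMS)

def npc_reply(prompt: str) -> str:
    """Hypothetical call into the lore-fine-tuned NPC model."""
    raise NotImplementedError

def guarded_npc_reply(player_input: str) -> str:
    if looks_off_topic(player_input):
        return FALLBACK
    reply = npc_reply(player_input)
    return FALLBACK if looks_off_topic(reply) else reply
```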
> Not possible, because it can't be guardrailed with 100% accuracy. You'll ask it something outside of the Warcraft world (e.g. US politics), and it'll happily oblige. I imagine NPCs will generate really weird immersion-breaking stuff even if you can't freeform-interact with them anyway.
> Not to mention the current token cost.
You of course have to train the AI from the ground up, on material that is as far as possible related only to topics in the game world (i.e. don't include real-world events that have no in-universe implications in the training data).
You don't, for example, expect some ordinary farmer or tramp in the game world to know a lot about the (in-game) world or to be capable of deep conversations about complicated topics.
So I don't think the amount of text you need to train the AI on is as insanely large as you imagine (though a lot of text nevertheless has to be written - that's the price of having "much more dynamic" AI characters in the game).
Write a couple of lore books, an in-universe cyclopedia, some character sheets, and train exclusively on them. Maybe some out-of-game lore for crossover universes!
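If you did go down that road, the data-prep side is at least easy to picture. A rough sketch of gluing a lore-only corpus together with a crude screen for anachronistic terms (the directory names and the blocklist are made up for illustration):

```python
from pathlib import Path

# Assemble a training corpus exclusively from in-universe writing.
LORE_DIRS = [Path("lore_books"), Path("cyclopedia"), Path("character_sheets")]
ANACHRONISMS = {"internet", "television", "bitcoin", "usa"}  # illustrative blocklist

def in_universe(passage: str) -> bool:
    """Reject passages mentioning concepts the game world shouldn't know."""
    lowered = passage.lower()
    return not any(term in lowered for term in ANACHRONISMS)

corpus = []
for d in LORE_DIRS:
    for f in sorted(d.glob("*.txt")):
        passages = f.read_text(encoding="utf-8").split("\n\n")
        corpus.extend(p for p in passages if in_universe(p))

Path("train.txt").write_text("\n\n".join(corpus), encoding="utf-8")
```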
The question that raises for me is how much writing you need for training before you can reasonably expect a generation system to produce something new and interesting, how much work it takes to get the right knowledge into the right place, and whether that's worth the cost for how you expect the player to interact with the game, compared to just doing the manual work.
I doubt there's telemetry in the Elder Scrolls games, but I'd love to know how many players go around the world exploring everything the characters have to say, or reading all the books, and how many get the lore from secondary media, wikis, or a retelling or summary on YouTube. On a certain level it's important they're there as an opt-in way of conveying the 'secondary' world lore to the player without a "sit down and listen" info dump, plus they give the impression it was written by someone, so these objects would exist organically in the world or certain characters would talk about those topics. But I wonder how much of the illusion would still be there if each book just had a title.
Is that feasible? I was under the impression that fully training an LLM requires untold mountains of data, way more than a game dev company could reasonably create.
You are correct. The fact that so many people are saying “lol just train it on text about the game bro” reveals how little people understand how these models work, how they are trained, etc.
Microsoft's Phi models are trained on much smaller datasets. They generally aren't as impressive as the models that get talked about more, but they're more than enough to get the job done for NPC lines in a game.
For this to work you pretty much have to start from scratch, putting in "obvious" things like "the sun exists and when it's out it casts light and shadow" and "water is a liquid (what's a liquid?) and flows downhill". Is there a corpus of information like this, but also free of facts that might be anachronistic in-universe?
With unoptimized UE5 releases becoming the norm and the mentality of shipping badly broken games by default, only for them to reach a good state years later if at all, I'm not sure running an LLM on device would be a good idea.
I used to live near Japan, in a 5-story concrete building. We had so many earthquakes that things were getting thrown off the shelves multiple times a year. There is nowhere to leave for a week. And there is a very slim chance anything would collapse, because the buildings are specifically built to withstand up to around magnitude 9.
If we can create human-level intelligence in a computer, it would already be superintelligence. No human on Earth is capable of reading and remembering an Internet-scale corpus of data, or of doing math at GHz speeds, etc.
When it comes to speed, the comparison I like to use is that transistors are faster than synapses by roughly the same ratio by which a marathon runner is faster than continental drift.
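The orders of magnitude roughly work out if you plug in generous numbers; a back-of-the-envelope check, where every figure is a rough assumption rather than a measurement:

```python
# Back-of-the-envelope check of the analogy; all numbers are rough,
# order-of-magnitude assumptions.
transistor_hz = 1e11    # single-transistor switching, order of ~100 GHz
synapse_hz = 1e2        # neuron firing rates top out around ~100 Hz
runner_speed = 4.0      # marathon pace in m/s
drift_speed = 1e-9      # continental drift, ~3 cm/year, in m/s

print(f"transistor / synapse: {transistor_hz / synapse_hz:.0e}")  # ~1e+09
print(f"runner / drift:       {runner_speed / drift_speed:.0e}")  # ~4e+09
```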
Start with game jams. When you're constrained to make a complete game in 48/72 hours, suddenly you start to understand what's important and what's not. Granted, I haven't done any doors in my jam games yet =)
You could have a codebase subtly broken on so many levels that you cannot fix it without starting from scratch - losing months.
You could slowly lose your ability to think and judge.