
We already knew this was possible from AI Minecraft (Oasis: https://oasis.decart.ai). This is just a more impressive version of that, trained on a wider range of games and with more context frames (Oasis has about a second of context, this one a minute). Even the architecture seems to be about the same.
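Rough sketch of the shared recipe as I read it: an action-conditioned model autoregressively predicts the next frame from a sliding window of past frames, and the window size is basically the whole difference. The model and its predict() here are placeholders, not either system's actual API:

    from collections import deque

    FPS = 20
    CONTEXT_SECONDS = 60    # Genie 2-ish; Oasis would be ~1
    context = deque(maxlen=FPS * CONTEXT_SECONDS)  # sliding window of past frames

    def step(model, action):
        # Each new frame is predicted from the recent frames plus the player
        # action, then appended; anything older than the window is forgotten.
        frame = model.predict(frames=list(context), action=action)
        context.append(frame)
        return frame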

Had they released this two months earlier, it would have been incredibly impressive. Now it's still cool and inspiring, but no longer as groundbreaking. It's the cooler version that doesn't come with a demo or any hope of actually trying it out.

And given what we know from Oasis's demo, the agent-training use case the post pitches for Genie 2 is a hard sell. Any attempt to train an agent in such a world would likely look like an AI Minecraft speedrun: generate enough misleading context frames to trick the model into generating what you want.
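In made-up pseudocode (world, agent, and looks_like_goal are all hypothetical), the failure mode looks like this: reward is judged on what the generated frames show, so hallucinating the goal scores as well as reaching it:

    def rollout(world, agent, steps=1000):
        frame = world.reset()
        for _ in range(steps):
            action = agent.act(frame)
            frame = world.step(action)   # frame is *generated*, not simulated
            if looks_like_goal(frame):   # reward keys on appearance alone,
                return "success"         # so tricked/hallucinated goals count
        return "timeout"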



This is far beyond Oasis. Oasis had approximately zero continuity, and the generated world was a blurry mess. This one, on the other hand, actually approaches usability.


Oasis is playable, which means it is:

1. Non-cherry-picked in its consistency (the demonstrations in the Oasis blog post do show specific cases of consistency, but those are the anomaly rather than the norm)

2. Live-inferenced at 20fps. Compare Runway v3, a comparably larger and higher-quality model (in resolution and consistency), which might take a minute or two to generate 10 seconds of video.

3. Served (relatively) reliably at consumer scale (with queues of 5-10k concurrent players), which means that to save on GPU cost you increase batch size and decrease model size to “fit” more players on one GPU. (Rough numbers for this and point 2 are sketched after the list.)
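Back-of-envelope version of points 2 and 3; every number here is illustrative, not a measured figure:

    LIVE_FPS = 20                     # what Oasis serves in real time
    offline_wallclock_s = 90          # ~1-2 min to generate 10 s of video
    offline_video_s = 10
    offline_fps_equiv = offline_video_s * 24 / offline_wallclock_s  # ~2.7 fps vs 20

    # The batching trade-off: a GPU generating N frames/s in aggregate can
    # serve N / LIVE_FPS players, so a smaller model (bigger N) packs more
    # players onto one GPU at the cost of quality.
    gpu_frames_per_s = 400            # assumed aggregate throughput
    players_per_gpu = gpu_frames_per_s // LIVE_FPS  # -> 20 players per GPU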


And it works on a wide variety of games, instead of just a single one with a relatively consistent art style. On the other hand, Oasis was realtime, while this one is offline; IMO getting the inference speed down was their most impressive feat, as even most decent video gen models are slower than that.


I don't know what the pipeline looks like for these, but I assume that's due to the costs associated with training and running. Oasis had a context of only a couple of frames, while this Genie model apparently keeps a context of a couple of minutes. I guess they have a couple of tricks up their sleeve to optimize this, though.
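Rough intuition for why the context length is the expensive part; the tokens-per-frame figure and the quadratic-attention assumption are mine, not from the paper:

    FPS = 20
    TOKENS_PER_FRAME = 256            # assumed latent tokens per frame

    def attn_tokens(seconds):
        return FPS * seconds * TOKENS_PER_FRAME

    # Full self-attention scales roughly quadratically in sequence length,
    # so ~1 s -> ~60 s of context is a ~3600x blow-up unless you use tricks
    # (temporal striding, keyframe compression, a recurrent state, ...).
    cost_1s = attn_tokens(1) ** 2     # ~2.6e7
    cost_60s = attn_tokens(60) ** 2   # ~9.4e10
    print(cost_60s / cost_1s)         # 3600.0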


Here is a thread of videos from my tests of the recent DIAMOND model: https://x.com/chrisoffner3d/status/1845436198254227590

I really wonder how much more stable Genie 2 is.


That AI Minecraft feels like playing inside a dream, which is insanely cool.



