> All examples are already correlated because they are generated in the same way.
All examples of “document information extraction” would be correlated no matter where they come from because they all would be “document information extraction” examples…
The real question is whether or not the examples are representative of the broad “document information extraction” use-case.
The problem is the methodology they use to hold them out. For a truly independent validation set, they need to hold out the material before augmentation, not after.
If you hold out after augmentation, the validation set already carries the biases of the augmentation regimen, so you artificially boost your model's measured performance. That is not sufficient to demonstrate the model is generalizing properly.
By analogy: instead of taking leaves from different trees, they are taking leaves from different branches of the same tree.
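Concretely, the fix is just reordering two steps: split first, then augment only the training side. Here's a minimal sketch in Python; the `augment` callable and the document list are hypothetical stand-ins for whatever pipeline they actually use:

```python
import random

def split_then_augment(documents, augment, n_copies=3, holdout_frac=0.2, seed=0):
    # Hold out raw documents BEFORE augmentation.
    rng = random.Random(seed)
    docs = list(documents)
    rng.shuffle(docs)
    n_holdout = int(len(docs) * holdout_frac)
    val_set = docs[:n_holdout]        # raw originals only, never augmented
    train_docs = docs[n_holdout:]
    # Augmentation happens strictly after the split, so no augmented
    # variant of a validation document can leak into training.
    train_set = []
    for d in train_docs:
        train_set.append(d)
        train_set.extend(augment(d) for _ in range(n_copies))
    return train_set, val_set
```

The leaky version does these steps in the opposite order: augment everything, then sample the validation set from the augmented pool, so the "held-out" examples are near-duplicates of training examples.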
That would definitely make the evaluation more robust. My fear is that, with LLMs at hand, people have become allergic to preparing good human-labelled evaluation sets and will always, to some degree, use an LLM as a crutch.
I’m wondering how they really prevent uploads of other people’s faces if someone takes a clip from a video of another person. I’m sure Apple didn’t open up 3D Face ID scanning to them for verification.
No doubt they can create Hollywood-quality clips if the tools are good enough to keep objects consistent (for example, returning to the same scene with the same decor) and to keep actors emotionally consistent.
I think this is not nearly as important as most people think it is.
In Hollywood movies, everyone already knows about "continuity errors", like when the water level in a glass goes up over time because shots were spliced together. Sometimes a shot with a continuity error is explicitly chosen by the editor because that take had the most emotional resonance for the scene.
These types of things rarely affect our human subjective enjoyment of a video.
In terms of physics errors: current human-made CGI has physics errors too. People just accept it and move on. We know Superman can't lift an airplane, because the fuselage wouldn't hold all that weight at a single point, but, like, whatever.
Location consistency is important. Even something as simple and subtle as breaking the 180-degree rule [1] feels super uncanny to most audiences. Let alone changing the set the actor occupies, their wardrobe, props, etc.
There are lots of tools being built to address this, but they're still immature.
Well put. Honestly, the actor part is mostly solved by now; the tricky part is depicting any kind of believable, persistent space across different shots. Based on amateur outputs from places like https://www.reddit.com/r/aivideo/, at least!
This release is clearly capable of generating mind-blowingly realistic short clips, but I don't see any evidence that longer, multi-shot videos can be automated yet. With a professional's time and existing editing techniques, however...
I wonder if this stuff is trained on enough Hallmark movies that even AI actors will buy a hot coffee at a cafe and then proceed to flail the empty cup around like the humans do. Really takes me out of the scene every time - they can't even put water in the cup!?
No way man, this is why I loved Mr. Robot: they actually paid a real expert and worked the story around realism, instead of made-up gobbledygook that shuts my brain off entirely with its nonsense.
This is what happens when you let the AI run for 30 minutes. Ain’t no way you’ll read the code with much critique if it’s a one-hour-plus read. You have to generate compartmentalized code so you don’t need to check as much.
Maybe inside a social network made specifically for AI, but a concerning number of people don't realize images and videos are AI, even when it's bad AI. As it gets better, and starts integrating the poster's image (like Sora 2), that's going to get even worse.