
How are they trained to your preferences?

I think the definition isn't accurate; it's more like engineering using AI.

Is this referring to the double slit experiment?


They should call it an anti-aging finding

So they tested using training examples? Lmao

> held out

Actually in this case that's not exactly true:

> generation of 281,128 augmented examples

All examples are already correlated because they are generated in the same way.


> All examples are already correlated because they are generated in the same way.

All examples of “document information extraction” would be correlated no matter where they come from because they all would be “document information extraction” examples…

The real question is whether or not the examples are representative of the broad “document information extraction” use-case.


The problem is the methodology they use to hold them out. For a truly independent validation set, you need to hold out material before augmentation, not after. If you hold out after augmentation, the held-out examples already carry the biases of the augmentation pipeline, so you artificially boost your model's measured performance. That is not sufficient to demonstrate the model is generalizing properly.

By analogy: instead of taking leaves from different trees, they are taking leaves from different branches of the same tree.
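
A minimal sketch of the distinction in Python (augment() and the document names here are hypothetical stand-ins for illustration, not the paper's actual pipeline):

    import random

    def augment(example):
        # Hypothetical stand-in for an augmentation pipeline; in practice
        # this could be paraphrasing, layout perturbation, noise, etc.
        return [f"{example} (variant {i})" for i in range(3)]

    raw_documents = [f"doc_{i}" for i in range(100)]
    random.shuffle(raw_documents)

    # Independent validation: split the source documents first, then augment
    # each split separately, so no validation example shares an origin with
    # anything the model was trained on.
    train_docs, val_docs = raw_documents[:80], raw_documents[80:]
    train_set = [aug for d in train_docs for aug in augment(d)]
    val_set = [aug for d in val_docs for aug in augment(d)]

    # Leaky holdout: augment first, then split. Variants of the same source
    # document can land on both sides, so validation scores overstate how
    # well the model generalizes.
    all_augmented = [aug for d in raw_documents for aug in augment(d)]
    random.shuffle(all_augmented)
    leaky_train, leaky_val = all_augmented[:240], all_augmented[240:]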


That would definitely make the evaluation more robust. My fear is that with LLMs at hand, people have become allergic to preparing good human-labelled evaluation sets and will always, to some degree, use an LLM as a crutch.

I would agree with that

I’m eagerly awaiting the unexpected social problems this will throw up

I’m wondering how they really prevent uploads of other people’s faces if someone takes a clip from a video of another person. I’m sure Apple didn’t open up 3D Face ID scanning to them for verification

No doubt they can create Hollywood-quality clips if the tools are good enough to keep objects consistent: for example, coming back to the same scene with the same decor, and keeping actors’ emotions consistent

> keep objects consistent

I think this is not nearly as important as most people think it is.

In Hollywood movies, everyone already knows about "continuity errors" - like when the water level of a glass goes up over time due to shots being spliced together. Sometimes shots with continuity errors are explicitly chosen by the editor because they had the most emotional resonance for the scene.

These types of things rarely affect our human subjective enjoyment of a video.

In terms of physics errors - current human CGI has physics errors. People just accept it and move on.

We know that Superman can't lift an airplane because all of that weight on a single point of the fuselage wouldn't hold, but, like, whatever.


Water level in a glass changing between shots is one thing, the protagonist’s face and clothes changing is another.

Location consistency is important. Even something as simple and subtle as breaking the 180-degree rule [1] feels super uncanny to most audiences. Let alone changing the set the actor occupies, their wardrobe, props, etc.

There are lots of tools being built to address this, but they're still immature.

https://x.com/get_artcraft/status/1972723816087392450 (This is something we built and are open sourcing - still has a ways to go.)

ComfyUI has a lot of tools for this, they're just hard to use for most people.

[1] https://en.wikipedia.org/wiki/180-degree_rule


Well put. Honestly, the actor part is mostly solved by now; the tricky part is depicting any kind of believable, persistent space across different shots. Based on amateur outputs from places like https://www.reddit.com/r/aivideo/, at least!

This release is clearly capable of generating mind-blowingly realistic short clips, but I don't see any evidence that longer, multi-shot videos can be automated yet. With a professional's time and existing editing techniques, however...


People got used to James Bond actors changing between movies, but from scene to scene in the same movie would be a bit confusing.

It all depends on the quantity and "quality" of the continuity errors. There's even a job for it: https://en.wikipedia.org/wiki/Script_supervisor

I wonder if this stuff is trained on enough Hallmark movies that even AI actors will buy a hot coffee at a cafe and then proceed to flail the empty cup around like the humans do. Really takes me out of the scene every time - they can't even put water in the cup!?

No way man, this is why I loved Mr. Robot: they actually paid a real expert and worked the story around realism, not just made-up gobbledygook that shuts my brain off entirely to its nonsense

This is what happens when you let the AI run for 30 minutes. Ain’t no way you will read the code with much scrutiny if it’s a 1-hour-plus read. You have to generate compartmentalized code so you don’t need to check much

Which seems to level the playing field, at least virtually

Maybe inside a social network specifically for AI, but a concerning number of people don't realize images and videos are AI, even when it's bad AI. As it gets better, and starts integrating the poster's image (like Sora 2), that's going to get even worse.
