
From a quick glance it seems to be about spatial reasoning problems. I think there are good reasons why it's tricky for a model to become extremely good at these when it's trained only on text and static images. Future models that get further multimodal training on video, and then on physics simulators, should deal with this much better, I think.

There's a recent talk about this by Jim Fan from Nvidia https://youtu.be/_2NijXqBESI



