> This essentially turns the operation of a robot into a kind of video game, where inputs are only needed in a low-dimensional abstract form, such as "empty the dishwasher" or "repeat what I do" or "put your finger in the loop and pull the string"
I don't really understand: how is this like a video game? What about these inputs is "low-dimensional"? How does what you describe interact with "high-level control agents like SIMA 2"? Doesn't SIMA 2 translate inputs like "empty the dishwasher" into key presses or interaction with some other direct control interface?
Say you want to steer an android to walk forward. You need to provide angles or forces or voltages for all the actuators at every moment in time, so that's high-dimensional. If you already have certain control models, neural or not, you can instead just press forward on a joystick. So by low-dimensional input I mean someone steering a robot using a controller. That's got like, idk, 10-20 dimensions max. And my understanding is that SIMA 2, when it plays No Man's Sky or whatever, basically provides such low-dimensional controls, like a video game. Companies like Figure and Tesla are training models that can do tasks like folding clothes or emptying the dishwasher given low-dimensional inputs like "move in this direction and tidy up". SIMA has the understanding to provide these inputs. A rough sketch of the dimensionality gap is below.
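To make the contrast concrete, here is a minimal, purely illustrative sketch. The joint count, control rate, command fields, and the `learned_controller` function are all made-up placeholders, not anyone's actual API; the point is just the size difference between the two interfaces.

```python
import numpy as np

# Assumed numbers, purely illustrative.
NUM_JOINTS = 30   # a humanoid with ~30 actuated joints
CONTROL_HZ = 200  # low-level control rate

# High-dimensional interface: one target (angle / torque / voltage) per joint,
# emitted every control tick, i.e. 30 numbers, 200 times per second.
low_level_command = np.zeros(NUM_JOINTS)

# Low-dimensional interface: roughly what a joystick (or a high-level agent
# like SIMA 2, on this view) would emit -- a handful of numbers plus an
# abstract task token.
high_level_command = {
    "move": np.array([1.0, 0.0]),  # "press forward on the joystick"
    "turn": 0.0,
    "action": "tidy_up",           # abstract task, handled by the robot's own policy
}

def learned_controller(command, joint_state):
    """Stand-in for a trained locomotion/manipulation policy: maps the
    ~5-dimensional command plus proprioception to per-joint targets.
    A real system would run a neural network here; we return a dummy vector."""
    return np.zeros(NUM_JOINTS)

# The low-level loop still produces 30 targets per tick, but the human
# (or the high-level agent) only ever supplies the small command above.
joint_state = np.zeros(NUM_JOINTS)
targets = learned_controller(high_level_command, joint_state)
print(len(targets))  # 30
```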