
Wow. Ok. I did not know that. I thought there was depth information embedded in the diff between images taken at different focal lengths.

I'm still wondering. As a photographer, you learn that you always want to use a focal length of 50mm+ for portraits. Otherwise, the face will look distorted. And even a non-photographer can often intuitively tell a professional photo from an iPhone selfie. The wider angle of the iPhone selfie lens changes the geometry of the face. It is very subtle. But if you took both images and overlaid them, you would see that there are differences.

But, of course, I'm overlooking something here. Because if you take the same portrait at 50mm and at, say, 20mm, it's not just the focal length of the camera that differs. What also differs is the position of each camera. The 50mm camera will be positioned further away from the subject, whereas the 20mm camera has to be positioned much closer to achieve the same "shot".

So while there are differences in the geometry of the picture, they arise not from the difference in the lenses being used, but from the difference in the camera-subject distance.
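To make that concrete, here's a toy pinhole-projection sketch in Python (all numbers are made up for illustration): project a "nose" that sits about 10 cm closer to the camera than the "ears", once from close up with a wide lens and once from further back with a longer lens. The nose/ear size ratio depends only on the shooting distance; the focal length cancels out.

    # Toy pinhole model: perspective "distortion" comes from camera-subject
    # distance, not focal length. All numbers are made up for illustration.
    def apparent_size(real_size, distance, focal_length):
        return focal_length * real_size / distance

    nose_offset = 0.10  # nose tip ~10 cm closer to the camera than the ears (assumed)
    feature = 0.05      # 5 cm feature, same physical size for nose and ears

    for cam_dist, f in [(0.6, 0.020), (1.5, 0.050)]:  # (subject distance in m, focal length in m)
        nose = apparent_size(feature, cam_dist - nose_offset, f)
        ear = apparent_size(feature, cam_dist, f)
        print(f"{f*1000:.0f}mm at {cam_dist}m: nose/ear ratio = {nose / ear:.2f}")

    # 20mm at 0.6m: nose/ear ratio = 1.20
    # 50mm at 1.5m: nose/ear ratio = 1.07
    # The ratio is just cam_dist / (cam_dist - nose_offset); focal length cancels.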

So now I'm wondering, too, why Tesla decided against stereo vision.

It does seem, though, that they are getting that depth information through other means:

Tesla 3D point cloud: https://www.youtube.com/watch?v=YKtCD7F0Ih4

Tesla 3D depth perception: https://twitter.com/sendmcjak/status/1412607475879137280?s=6...

Tesla 3D scene reconstruction: https://twitter.com/tesla/status/1120815737654767616

Perhaps it helps that the vehicle moves? That is, after all, very close to having the same scene photographed by cameras positioned at different distances. Except that Tesla uses the same camera and has it moving.
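A rough sketch of how that could work in principle (this is just textbook two-frame geometry with assumed numbers, not Tesla's actual pipeline): a single camera moves forward by d between two frames, and a static off-axis point projects to x1 = f*X/Z in the first frame and x2 = f*X/(Z - d) in the second. Those two equations are enough to solve for depth.

    # Two frames from one forward-moving camera (toy model, not Tesla's method).
    # A static point at lateral offset X and depth Z projects to x1 = f*X/Z;
    # after the car moves forward by d, it projects to x2 = f*X/(Z - d).
    f = 0.005          # focal length in metres (assumed)
    d = 1.0            # forward travel between frames in metres (assumed)
    X, Z = 2.0, 20.0   # ground truth: 2 m to the side, 20 m ahead

    x1 = f * X / Z
    x2 = f * X / (Z - d)

    # Eliminate the unknown X from the two projection equations:
    Z_est = d * x2 / (x2 - x1)
    print(Z_est)  # 20.0: depth recovered from motion alone, no second camera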

Also, among the front-facing cameras, the two outermost are at least a few centimeters apart. I haven't measured it, but it looks like a distance not unlike that between a human's eyes [0]. Maybe that's already enough?

[0] https://www.notateslaapp.com/images/news/2022/camera-housing...



> But, of course, I'm overlooking something here. Because if you take the same portrait at 50mm and at, say, 20mm, it's not just the focal length of the camera that differs. What also differs is the position of each camera. The 50mm camera will be positioned further away from the subject, whereas the 20mm camera has to be positioned much closer to achieve the same "shot".

Yep, totally.

> Perhaps it helps that the vehicle moves? That is, after all, very close to having the same scene photographed by cameras positioned at different distances.

I think you're right; they must be taking advantage of this to get the kind of results they are getting. That point cloud footage is impressive; it's hard to imagine getting that kind of detail and accuracy just from individual 2D stills.

Maybe this also gives some insight into the situations where the system seems to struggle. When moving forward in a straight line, objects in the periphery will shift noticeably in relative size, position and orientation within the frame, whereas objects directly in front will only change in size, not position or orientation. You can see this effect just by moving your head back and forth.

So it might be that the net has less information to go on when considering stationary objects directly in, or slightly adjacent to, the vehicle's path -- which seems to be one of the scenarios where it makes mistakes in the real world, e.g. with stationary emergency vehicles. I'm just speculating here though.
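Using the same toy two-frame model as above (assumed numbers again), you can see how the parallax signal falls off toward the centre of the image: the displacement between frames shrinks to zero for a point dead ahead, leaving only the change in apparent size as a depth cue.

    # Image displacement between the two frames as a function of lateral offset
    # (same toy model and assumed numbers as above).
    f, d, Z = 0.005, 1.0, 20.0
    for X in [5.0, 2.0, 0.5, 0.0]:            # metres off the direction of travel
        shift = f * X / (Z - d) - f * X / Z   # displacement on the image plane
        print(f"X = {X:4.1f} m -> shift = {shift * 1e6:6.2f} micrometres")
    # Roughly 66 um of parallax at 5 m off-axis, about 7 um at 0.5 m,
    # and exactly 0 for a point dead ahead.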

> Also, among the front-facing cameras, the two outermost are at least a few centimeters apart. I haven't measured it, but it looks like a distance not unlike that between a human's eyes [0]. Maybe that's already enough?

Maybe. From memory, the distance between the cameras is pretty small; less than between human eyes, I would say. It would also only work over a smaller section of the forward view due to the difference in focal length between the cameras. I can't help but think that if they really wanted to take advantage of binocular vision, they would have used more optimal hardware. So I guess that implies that the engineers are confident that what they have should be sufficient, one way or another.
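For what it's worth, here's a back-of-envelope stereo sketch (the focal length, baseline and pixel pitch below are all assumed, not the actual hardware specs) showing why a baseline of a few centimetres wouldn't buy much at driving distances: depth uncertainty grows roughly as Z^2 * pixel_size / (f * B) per pixel of disparity error.

    # Back-of-envelope stereo depth resolution for a small baseline.
    # Assumed (not actual spec): f = 6 mm, baseline B = 5 cm, 3.75 um pixels.
    f = 0.006        # focal length in metres
    B = 0.05         # distance between the two cameras in metres
    pixel = 3.75e-6  # pixel pitch in metres

    for Z in [10, 30, 60, 100]:                # target distance in metres
        disparity_px = f * B / Z / pixel       # how many pixels apart the two views are
        dZ = Z**2 * pixel / (f * B)            # depth error per pixel of matching error
        print(f"Z = {Z:3d} m: disparity = {disparity_px:4.1f} px, "
              f"error ~ {dZ:6.1f} m per pixel")
    # At 60-100 m the disparity is only a pixel or so, and matching noise
    # swamps the depth estimate, which may be why they don't lean on it.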



