
Wow. Ok. I did not know that. I thought there was depth information embedded in the diff between images taken at different focal lengths.

I'm still wondering. As a photographer, you learn that you always want to use a focal length of 50mm+ for portraits. Otherwise, the face will look distorted. And even a non-photographer can often intuitively tell a professional photo from an iPhone selfie. The wider angle of the iPhone selfie lens changes the geometry of the face. It is very subtle. But if you took both images and overlaid them, you would see that there are differences.

But, of course, I'm overlooking something here. Because if you take the same portrait at 50mm and at, say, 20mm, it's not just the focal length of the camera that differs. What also differs is the position of each camera. The 50mm camera will be positioned further away from the subject, whereas the 20mm camera has to be positioned much closer to achieve the same "shot".

So while there are differences in the geometry of the picture, they arise not from the difference in the lenses being used, but from the difference in the camera-subject distance.
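To make that concrete, here's a toy pinhole-projection sketch in Python (all numbers are made up for illustration): project a "nose" that sits about 10 cm closer to the camera than the "ears", once from close up with a wide lens and once from further back with a longer lens. The nose/ear size ratio depends only on the shooting distance; the focal length cancels out.

    # Toy pinhole model: perspective "distortion" comes from camera-subject
    # distance, not focal length. All numbers are made up for illustration.
    def apparent_size(real_size, distance, focal_length):
        return focal_length * real_size / distance

    nose_offset = 0.10  # nose tip ~10 cm closer to the camera than the ears (assumed)
    feature = 0.05      # 5 cm feature, same physical size for nose and ears

    for cam_dist, f in [(0.6, 0.020), (1.5, 0.050)]:  # (subject distance in m, focal length in m)
        nose = apparent_size(feature, cam_dist - nose_offset, f)
        ear = apparent_size(feature, cam_dist, f)
        print(f"{f*1000:.0f}mm at {cam_dist}m: nose/ear ratio = {nose / ear:.2f}")

    # 20mm at 0.6m: nose/ear ratio = 1.20
    # 50mm at 1.5m: nose/ear ratio = 1.07
    # The ratio is just cam_dist / (cam_dist - nose_offset); focal length cancels.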

So now I'm wondering, too, why Tesla decided against stereo vision.

It does seem, though, that they are getting that depth information through other means:

Tesla 3D point cloud: https://www.youtube.com/watch?v=YKtCD7F0Ih4

Tesla 3D depth perception: https://twitter.com/sendmcjak/status/1412607475879137280?s=6...

Tesla 3D scene reconstruction: https://twitter.com/tesla/status/1120815737654767616

Perhaps it helps that the vehicle moves? That is, after all, very close to having the same scene photographed by cameras positioned at different distances. Except that Tesla uses the same camera and has it moving.
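A rough sketch of how that could work in principle (this is just textbook two-frame geometry with assumed numbers, not Tesla's actual pipeline): a single camera moves forward by d between two frames, and a static off-axis point projects to x1 = f*X/Z in the first frame and x2 = f*X/(Z - d) in the second. Those two equations are enough to solve for depth.

    # Two frames from one forward-moving camera (toy model, not Tesla's method).
    # A static point at lateral offset X and depth Z projects to x1 = f*X/Z;
    # after the car moves forward by d, it projects to x2 = f*X/(Z - d).
    f = 0.005          # focal length in metres (assumed)
    d = 1.0            # forward travel between frames in metres (assumed)
    X, Z = 2.0, 20.0   # ground truth: 2 m to the side, 20 m ahead

    x1 = f * X / Z
    x2 = f * X / (Z - d)

    # Eliminate the unknown X from the two projection equations:
    Z_est = d * x2 / (x2 - x1)
    print(Z_est)  # 20.0: depth recovered from motion alone, no second camera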

Also, among the front-facing cameras, the two outermost are at least a few centimeters apart. I haven't measured it, but it looks like a distance not unlike that between a human's eyes [0]. Maybe that's already enough?

[0] https://www.notateslaapp.com/images/news/2022/camera-housing...



> But, of course, I'm overlooking something here. Because if you take the same portrait at 50mm and at, say, 20mm, it's not just the focal length of the camera that differs. What also differs is the position of each camera. The 50mm camera will be positioned further away from the subject, whereas the 20mm camera has to be positioned much closer to achieve the same "shot".

Yep, totally.

> Perhaps it helps that the vehicle moves? That is, after all, very close to having the same scene photographed by cameras positioned at different distances.

I think you're right; they must be taking advantage of this to get the kind of results they are getting. That point cloud footage is impressive; it's hard to imagine getting that kind of detail and accuracy just from individual 2D stills.

Maybe this also gives some insight into the situations where the system seems to struggle. When moving forward in a straight line, objects in the periphery will shift noticeably in relative size, position and orientation within the frame, whereas objects directly in front will only change in size, not position or orientation. You can see this effect just by moving your head back and forth.

So it might be that the net has less information to go on when considering stationary objects directly in, or slightly adjacent to, the vehicle's path -- which seems to be one of the scenarios where it makes mistakes in the real world, e.g. with stationary emergency vehicles. I'm just speculating here though.
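Using the same toy two-frame model as above (assumed numbers again), you can see how the parallax signal falls off toward the centre of the image: the displacement between frames shrinks to zero for a point dead ahead, leaving only the change in apparent size as a depth cue.

    # Image displacement between the two frames as a function of lateral offset
    # (same toy model and assumed numbers as above).
    f, d, Z = 0.005, 1.0, 20.0
    for X in [5.0, 2.0, 0.5, 0.0]:            # metres off the direction of travel
        shift = f * X / (Z - d) - f * X / Z   # displacement on the image plane
        print(f"X = {X:4.1f} m -> shift = {shift * 1e6:6.2f} micrometres")
    # Roughly 66 um of parallax at 5 m off-axis, about 7 um at 0.5 m,
    # and exactly 0 for a point dead ahead.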

> Also, among the front-facing cameras, the two outermost are at least a few centimeters apart. I haven't measured it, but it looks like a distance not unlike that between a human's eyes [0]. Maybe that's already enough?

Maybe. From memory, the distance between the cameras is pretty small; less than between human eyes, I would say. It would also only work over a smaller section of the forward view due to the difference in focal length between the cameras. I can't help but think that if they really wanted to take advantage of binocular vision, they would have used more optimal hardware. So I guess that implies that the engineers are confident that what they have should be sufficient, one way or another.
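For what it's worth, here's a back-of-envelope stereo sketch (the focal length, baseline and pixel pitch below are all assumed, not the actual hardware specs) showing why a baseline of a few centimetres wouldn't buy much at driving distances: depth uncertainty grows roughly as Z^2 * pixel_size / (f * B) per pixel of disparity error.

    # Back-of-envelope stereo depth resolution for a small baseline.
    # Assumed (not actual spec): f = 6 mm, baseline B = 5 cm, 3.75 um pixels.
    f = 0.006        # focal length in metres
    B = 0.05         # distance between the two cameras in metres
    pixel = 3.75e-6  # pixel pitch in metres

    for Z in [10, 30, 60, 100]:                # target distance in metres
        disparity_px = f * B / Z / pixel       # how many pixels apart the two views are
        dZ = Z**2 * pixel / (f * B)            # depth error per pixel of matching error
        print(f"Z = {Z:3d} m: disparity = {disparity_px:4.1f} px, "
              f"error ~ {dZ:6.1f} m per pixel")
    # At 60-100 m the disparity is only a pixel or so, and matching noise
    # swamps the depth estimate, which may be why they don't lean on it.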



