Lack of focus is a major problem for companies, and we all know that tech debt leads to increased bug counts.
Team focus on vision, which is by far the highest accuracy and bandwidth sensor, allows for a faster rate of safety innovation given a constant team size.
Tesla's cameras often get blocked by rain, blinded by the sun, or can't see that well in the dark. It's really hard to imagine those cameras replacing the ultrasonic sensors, which do a pretty good job of telling you where you are when you're parking, etc. I can't see how a camera is going to detect an object in pitch dark and estimate the distance to it better than an ultrasonic sensor. But hey, if people ding their cars, it's more revenue.
The bottom line seems to be that it was part shortages, which would have slowed production, plus cost cutting. The rest of the story seems like a fable to me. It was pretty clear Tesla removed the radar because it couldn't get enough radars.
The interview didn't really impress me. I'm sure Andrej is bound by NDA and doesn't want to sour his relationship with Tesla/Elon, but a lot of the answers were weak (on Tesla and on some of the other topics, like AGI).
One interesting side effect of only using visual sensors is that the failure modes will be more likely to resemble human ones, so people will say "yeah, I would have crashed in that situation too!" With ultrasonic and radar and lidar it may make far fewer mistakes, but they might not be the same ones people make, so people will say "how did it mess that up?"
Sadly, that’s the worst way to actually design the system. I’d rather have two different technologies working together, with different failure modes. Not using radar (especially in cars that are already equipped) might make economic sense to Tesla, but I’d feel safer if visual processing was used WITH radar as opposed to instead of radar.
I also expect an automated system to be better than the poor human in the driver's seat.
You have to eventually decide to trust one or the other, in real-time, so having multiple failure modes doesn't solve the problem entirely. This is called sensor fusion: you have to fuse the information coming from multiple sensors together. There are trade-offs because while you gain different views of the environment from different sensors, the fusion becomes more complicated and has to be sorted out in software reliably in real-time.
> There are trade-offs because while you gain different views of the environment from different sensors, the fusion becomes more complicated and has to be sorted out in software reliably in real-time.
If you're against having multiple sensors though, the rational conclusion would be to just have one sensor, but Tesla would be the first to tell you that one of the advantages their cars have over human drivers is they have multiple cameras looking at the scene already.
You already have a sensor fusion problem. Certainly more sensors add some complexity. However, if you have one sensor that is uncertain about what it is seeing, having multiple other sensors, particularly ones with different modalities that might not have problems in the same circumstance, sure makes it a lot easier to reliably get to a good answer in real-time. Sure, in unusual circumstances you could get increased confusion, but you're far more likely to get increased clarity.
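To make that concrete, here's a minimal sketch (in Python, with made-up sensor names, readings and noise figures) of the simplest kind of fusion: weighting two distance estimates by how much each sensor currently trusts itself. Real stacks are vastly more involved, but it shows why a second modality with different failure modes tends to add clarity rather than confusion:

    # Minimal illustration of fusing two distance estimates via inverse-variance
    # weighting (the 1-D special case of a Kalman update). All numbers are invented.
    def fuse(estimates):
        # estimates: list of (value, variance) pairs, one per sensor
        weights = [1.0 / var for _, var in estimates]
        value = sum(w * v for w, (v, _) in zip(weights, estimates)) / sum(weights)
        return value, 1.0 / sum(weights)

    camera = (14.2, 4.0)   # metres; high variance: it's dark and the camera is unsure
    radar  = (12.1, 0.25)  # metres; radar doesn't care that it's dark

    distance, variance = fuse([camera, radar])
    print(f"fused distance ~ {distance:.1f} m (variance {variance:.2f})")
    # The fused answer lands near the radar reading because the camera is
    # reporting low confidence -- no hard "trust one or the other" switch needed.

The hard part, of course, is getting honest uncertainty estimates out of each sensor in the first place; that's where the real-time software complexity mentioned above comes from.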
This is one side of the argument. The other side is that what matters more than the raw sensor data is constructing an accurate representation of the actual 3D environment. So an argument could be made (and this is what this guy and Tesla are gambling on and have designed the company around) that the construction & training of the neural net outweighs the importance of the actual sensor inputs, in the sense that even two eyes (for example) are enough when combined with the brain's ability to infer the actual position and significance of real objects for successful navigation. So as a company with limited R&D & processing bandwidth, you might want to devote more resources to machine learning rather than sensor processing. I personally don't know what the answer is, just saying there is this view.
The whole point of the sensor data is to construct an accurate representation of the actual environment, so yes, if you can do that, you don't need any sensors at all. ;-)
Yes, in machine learning, pruning down to higher signal data is important, but good models are absolutely amazing at extracting meaningful information from noisy and diffuse data; it's highly unusual to find that you want to dismiss a whole domain of sensor data. In the cases where one might do that, it tends to be only AFTER achieving a successful model that you can be confident that is the right choice.
Tesla's goal is self-driving that consumers can afford, and I think in that sense they may well be making the right trade-offs, because a full sensor package would substantially add to the costs of a car. Even if you get it working, most people wouldn't be able to afford it, which means they're no closer to their goal.
However, I think for the rest of the world the priority is something that is deemed "safe enough", and in that sense it seems very unlikely (more specifically, we're lacking the telltale evidence you'd want) that we're anywhere close to the point where you wouldn't be safer with a better sensor package. That means, in effect, they're effectively sacrificing lives (both in terms of risk and time) in order to cut costs. Generally when companies do that, it ends in lawsuits.
> You have to eventually decide to trust one or the other, in real-time.
More or less. You can take that decision on other grounds - e.g. "what would be safest to do if one of them is wrong and I don't know which one?"
The system is not making a choice between two sensors, but determining a way to act given unreliable/contradictory information. If both sensors allow for going to the emergency lane and stopping, maybe that's the best thing to do.
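As a toy illustration of that idea (the actions, hypotheses and safety checks below are invented, not how any real stack decides), you can pick the least disruptive action that stays safe under every hypothesis the sensors leave open, rather than betting on one sensor being right:

    # Toy "act safely under disagreement" policy; everything here is hypothetical.
    ACTIONS = ["continue", "slow_down", "pull_over_and_stop"]  # least to most disruptive

    def safe_under(action, hypothesis):
        # Pretend safety check: is this action safe if this hypothesis is true?
        if hypothesis == "obstacle_ahead":
            return action in ("slow_down", "pull_over_and_stop")
        if hypothesis == "road_clear":
            return True
        return action == "pull_over_and_stop"

    def choose_action(hypotheses):
        # Keep only actions safe no matter which sensor turns out to be right,
        # then take the least disruptive of those.
        ok = [a for a in ACTIONS if all(safe_under(a, h) for h in hypotheses)]
        return ok[0] if ok else "pull_over_and_stop"

    # Camera says the road is clear, radar says something is ahead:
    print(choose_action({"road_clear", "obstacle_ahead"}))  # -> slow_down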
It's far from the worst way, because if humans are visually blinded by the sun or snow or rain, they will generally slow down and expect the cars around them to do the same.
Predictability especially around failure cases is a very important feature. Most human drivers have no idea about the failure modes of lidar/radar.
A car typically doesn't have lights shining in all directions. My Tesla doesn't, at any rate. At night, backing into my driveway, I can barely see anything on the back-up camera unless the brake lights come on. If it's raining heavily, it's much worse. But the ultrasonic sensors are really good at detecting obstacles pretty much all around.
Interesting. I find the rear camera in my Tesla is outright amazing in the dark. I can see objects so much more clearly with it than with the rear-view mirror. It feels like I'm cheating... almost like driving in the day.
Reverse lights are literally mandated by law. Your Tesla has them, and if they're not bright enough that's a fairly cheap and easy problem to fix relative to the alternatives.
The sensors also detect obstacles on the sides of the car, where there's no lighting. Every problem has some sort of solution, but removing the ultrasonic sensors on the Tesla is going to result in poorer obstacle detection. Sure, if they add 360° lighting and more cameras they can make up for that.
EDIT: Also, I'm not quite sure why the image is so dark when I reverse at night, but it is. The slope and surface of the driveway might have something to do with that... Still, I wouldn't trust that camera. The ultrasonic sensors otoh seem to do a pretty good job. That's just my experience.
EDIT2: I love the Tesla btw. The ultrasonic sensors seem to work pretty reliably and are pretty much their own system, so the argument about complexity doesn't really seem to hold water, and on the face of it the cameras won't easily replace them...
You are greatly overestimating the functionality of the sensors and underestimating the importance of the rest of the system. Sensors are important, but the majority of the work, effort and expense goes into post-sensor processing. You can't just bolt a lidar onto the car and improve the quality of the results. Andrej and other engineers working on these problems are telling everyone the same story. The perfect solution is not obvious to anyone, and they have chosen one path. Engineers aren't trying to scam people out of a few dollars so they can weasel out of making high quality technology. This has Nothing to do with cost-cutting.
"The perfect solution is not obvious to anyone, and they have chosen one path. Engineers aren't trying to scam people out of a few dollars so they can weasel out of making high quality technology. This has Nothing to do with cost-cutting."
Lidar vs. Stereo camera vs. multiple cameras vs. ultrasound is a separate problem that engineers are trying to solve, not how can we sell cheaper mops. The decision not to use lidar, as he says, is part of the common debate among people working on autonomous driving: whether it makes more sense to focus on stereo image sensors with highly integrated machine learning, or to use lidar or other sensors and include data-fusion processing. Both methods have trade-offs.
"Lidar vs. Stereo camera vs. multiple cameras vs. ultrasound is a separate problem that engineers are trying to solve, not how can we sell cheaper fucking mops."
Okay? Tesla is a car company and they are absolutely trying to sell a cheaper car. That's obvious to anyone that's been in one.
"Both methods have trade-offs."
Right, isn't that why most other systems use both?
Both methods have trade-offs, as in there are positive and negative merits to both approaches. Using both systems requires the sensor data to be fused together to make real-time decisions. This is the whole point; it's the part people are trivializing, and it's why it is easy to believe that they are just trying to scam people by going cheap on multiple sensors. If you want to argue that it is better to use lidar, then explain why, apart from 'others do it'.

The podcast, and previous explanations by this guy and others who agree with him (which came well before any shortage issues), is about what is the best way to solve autonomous driving. You don't solve it by simply adding more sensors. There are multiple hours of technical information about why Andrej thinks this way is best. Others make arguments for why multiple sensors and fusion make more sense. No one knows the correct answer; it will be played out in the future.

Maybe what some people care about is cheaper cars. That is not what the podcast was about, and that is not how the lidar + stereo camera vs. stereo-camera-only decision was made. In terms of the advancement of human civilization, whether Tesla has good or bad quarterly results is less interesting to me than what is the best way to solve the engineering problems, the advancement of AI, etc. I don't really care very much, but it is slightly offensive when people dismiss engineers who are putting in tons of effort to legitimately solve complicated problems as if they are just scam artists trying to lie to make quick money. That is also a stupid argument. No company is going to invest billions(?) of dollars and tons of engineering hours into an idea they secretly know is inferior and will eventually lose out, just so they can have a good quarter. That is not a serious argument.
I am an engineer working on autonomous vehicles. Nothing personal, just responding to the thread as a whole. I don't believe this guy is conspiring to trick anyone. Business decisions, of course. I think they are, in good faith, gambling on this one approach. So I am interested to see if their idea will win, or if someone else figures out a better way.
The problem is not that he was wrong; the problem is that he's made a motherhood statement in response to a very specific question.
He's not conspiring to trick people per se, but he's also not being super clear. His position obviously makes it difficult to answer this question. It's possible he really believes this is better, but even if he didn't, he wouldn't exactly tell us something that makes him and his previous employer look bad. And his belief here may or may not be correct.
Is it a coincidence that the technical stance changed at the same time that part shortages meant cars could not be built and shipped for lack of radars?
More likely there was some brainstorming as a result of the shortages, and the decision was made at that point to pursue the idea of removing the additional sensors and shipping vehicles without them. This external constraint makes it a little difficult to believe the claims that this is actually all-around better, especially while hearing some (anecdotal) reports of increases in ghost braking. It's not clear there was enough data at the time to prove this, and even Andrej himself sort of acknowledges that it's worse by some small delta (but has other advantages; being able to ship cars comes to mind).
So yes, sensors have to be fused, it's complicated, it's not clear what the best combination of sensors is, the software might be larger with more moving parts, the ML model might not fit, a larger team is hard to manage, entropy - whatever. It still seems suspicious. I'm not sure what Tesla can do at this point to erase that; they can say whatever they want, and we have no way of validating it.
Maybe you're right, I don't care about Tesla drama.
Here is one possible perspective from an engineering standpoint:
Same amount of $$, same amount of software complexity, same size of engineering teams, same amount of engineering hours, same amount of moving parts. One company focuses on multiple different sensors and complex fusion with some reliance on AI. Another company focuses on limited sensors and more reliance on AI. Which is better? I don't think the answer is clear.
The other point I am arguing is that many people are overstating the importance of the sensors. They are important, but far more important is the post-processing. Any raw sensor data is a poor representation of the real environment. It is not about the sensors, but about everything else. The brain, or the post-sensor processing, is responsible for reconstructing an approximation of the environment; we have to infer from previously learned experience of the 3D world to navigate successfully. There is no 3D information coming in from the sensors - no objects, no motion, no corners, no shadows, no faces, etc. That is all constructed later. So whoever does a better job at the post-processing will probably outperform, regardless of the choice of sensors.
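Taking cameras as the obvious example: even something as basic as depth isn't in the raw data. A stereo pair only gives you pixel offsets, and distance is computed on top of that (a minimal pinhole-model sketch, with made-up focal length and baseline):

    # Depth from stereo disparity under a simple pinhole-camera model: Z = f * B / d.
    # Focal length, baseline and disparities are invented for illustration.
    def depth_from_disparity(focal_px, baseline_m, disparity_px):
        return focal_px * baseline_m / disparity_px

    f_px = 1000.0     # focal length, in pixels
    baseline = 0.12   # distance between the two cameras, in metres
    for d in (40.0, 10.0, 2.0):  # how far a matched feature shifts between the two views
        print(f"disparity {d:4.0f} px -> depth {depth_from_disparity(f_px, baseline, d):5.1f} m")
    # 40 px -> 3 m, 10 px -> 12 m, 2 px -> 60 m: a one-pixel matching error barely
    # matters up close and is huge at range, and the matching itself is exactly
    # the kind of post-processing that does the heavy lifting.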
People absolutely get that. Their issue is that Tesla is relying only on visual data and then, on a disingenuous basis, insisting that this is okay because humans "only need eyes", or some similar sort of strawman argument.
Okay, so they are "good faith" gambling? I don't want to drive in a car that involves any gambling... I don't get how it being in good faith (generous on your part) makes it less of a gamble?
Uhh, highest accuracy and bandwidth for what? You can have a camera that can see a piece of steak at 100K resolution and 1000 FPS, but that doesn't mean you can use a camera to replace a thermometer. It blows my mind how people eat up the idea that cameras can replace every sensor in existence without even entertaining basic physics. ML is not omnipotent.
For the specific task of (for example) cooking a steak, it's not hard to envision a computer vision algorithm coupled with a model with some basic knowledge of the system (ambient temperature, oven/stove temperature, time cooking, etc.) doing an excellent job.
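For what it's worth, a crude version of what's being imagined might look like the sketch below: a lumped heating model driven by cook time and pan temperature, with a (hypothetical) vision-derived surface brownness as a cross-check. Every constant and the browning heuristic are invented for illustration; this is not a claim that it would actually match a probe:

    import math

    # Newton's-law-of-heating style guess at core temperature; constants invented.
    def estimated_core_temp_c(minutes, pan_c=200.0, start_c=5.0, k=0.08):
        return pan_c - (pan_c - start_c) * math.exp(-k * minutes)

    def doneness(core_c, surface_brownness):
        # surface_brownness in [0, 1] would come from a camera/CV model.
        if core_c >= 71 or surface_brownness > 0.9:
            return "well done (or burning the outside)"
        return "medium" if core_c >= 57 else "rare"

    for minutes in (2, 5, 8):
        core = estimated_core_temp_c(minutes)
        print(minutes, "min ->", round(core, 1), "C,", doneness(core, 0.5))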
No, I can't envision this. Surface texture alone will not tell you if meat is cooked. There is no getting around the temperature probe.
Now, simple color matching models are used in some fancy toasters on white bread to determine brownness. That's the most I've ever seen in appliances...
I don't think it was your intent, but your statement makes it seem like all Tesla engineers are looking at Twitter code. I bet this number is closer to 4.
Tesla has ca. 1000 software engineers working in various capacities. The ca. 300 that work on car firmware and autonomous driving are probably not participating in the Twitter drama.
I don't think the goal is to review all Twitter source. That should be the job of the (new?) development team. I think the goal was to look at the last 6 months of code, especially the last few weeks, for anything devious.
> "Team focus on vision which is by far the highest accuracy and bandwidth sensor allows for a faster rate of safety innovation given a constant team size."
By hiding the ball that you are starting from a much more unsafe position.
> vision which is by far the highest accuracy and bandwidth
They are literally the least accurate of all sensors.
Radar tells you distance and velocity of each object. Lidar tells you size and distance of each object. Ultrasonic tells you distance. Cameras? They tell you nothing!
Everything has to be inferred. Have you tried image recognition algorithms? I can recognise a dog from 6 pixels; image recognition needs hundreds, and has colossal failures.
We have no grip on the results AI will produce and no grasp on its spectacular failures.