I have seen nothing that indicates that Mercedes stands a chance to catch up with Tesla.
All I have seen Mercedes do could have been accomplished by manual programming. But manual programming does not scale. Look at the progress Tesla made from FSD 11 to FSD 12. Nothing indicates that Mercedes is capable of this type of progress.
I believe Mercedes takes full responsibility for its self-driving cars, while Tesla will still disengage Autopilot like 1 second before impact and claim it's the driver's fault? Essentially, drivers legally have to be on full alert at all times in a Tesla, while they can kick back and watch a movie during the ride in a Mercedes.
Right, even when FSD plows right into a solid object, Tesla will try to pin 100% of the blame on the human driver. On top of all the other reasons not to take Tesla's marketing BS seriously, their total refusal to take ownership of their own autonomous system should prevent anyone from taking their claims seriously.
That and the fact that Elon has spent the last decade saying they're 6-12 months from actual full self driving.
Yes, now it will no longer kill 40% of school children crossing the road, but only 39%.
LOOK AT THE PROGRESS!!!1
And with FSD 34 it will be able to take the easiest highway exit ramp there is on this planet, Mountain View on HWY101, without killing you.
And what does "manual programming" even mean? Do you REALLY believe the Xbox 360-grade hardware in the Tesla is doing AI inferencing? It's not. It's a heuristic.
They are doing inferencing on the vehicle for lane keeping, traffic sign detection, emergency braking, etc.
The biggest problem is really how you get to 10^n miles per disengagement for n >= 5. Waymo is kinda getting there; Tesla isn't anywhere near that today.
Getting there is really hard, because that's when you get all of the long tail events like bears, moose, wild turkeys, horse mounted police officers, costume conventions, pickup trucks carrying traffic cones and road signs, flooded streets, construction pilot cars, vehicles driving the wrong way on the highway, downed electric poles, NYC steam plumes, and tons of other scenarios. Highway driving in nice and sunny conditions is easy compared to that.
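For a rough sense of what n >= 5 demands of a validation program (a back-of-the-envelope sketch of mine, using the standard "rule of three" bound for zero-event data, not anyone's published test plan):

    # Back-of-the-envelope: miles of disengagement-free testing needed
    # to support a claim of 10^n miles per disengagement at ~95%
    # confidence. Rule of three: with zero events over N miles, the 95%
    # upper bound on the event rate is roughly 3/N, so you need about
    # N = 3 * 10^n clean miles before the claim is statistically defensible.
    for n in range(5, 8):
        claimed = 10 ** n          # claimed miles per disengagement
        needed = 3 * claimed       # disengagement-free miles required
        print(f"10^{n} mi/disengagement -> ~{needed:,} clean test miles")

And that is per software version; every significant update effectively resets the clock on those miles.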
I am not sure how much is really done via inferencing, if anything at all. The way "Tesla Vision" behaves in a parking garage simply does not look like what I would expect to come out of inferencing. It looks very, very, very much like a pretty bad heuristic. Just look at what it makes of blind spots, the parts the cameras can't see. There is absolutely nothing like "according to my model, there should be X in this spot". The same goes for their distance sensing in these situations. "Oh, there is a pipe on that wall, which is likely at a different distance from me than the wall itself. I might not wanna crash into that" is trivial on a level that nobody would even use it as a Captcha these days. A model that does not "know" what the third dimension is?
Do you know of any reverse engineering that proves anything inference-related is actually running on the NPUs?
Also, just as you said: there are tons of corner cases in the real world, especially once you aren't on a 10-lane US highway designed for monster trucks driven by 16-year-olds (no offence) but in one of the roundabouts of hell in Paris.
Where would the training data be coming from?
So, I have my doubts.
During summer, there is a red flower growing near the entrance of my parking garage. It is constantly seen as a red light, and the entrance of my garage is often mistaken for a huge truck suddenly magically appearing. Again: nobody would even use this as a Captcha these days: "Is this a red flower or a traffic light?".
Again, it smells like a heuristic: "amount of red pixels in a certain shape and spot".
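To make concrete what that kind of heuristic would look like (a caricature I'm making up for illustration, not anything reverse engineered from the car), a few lines of thresholding are enough to reproduce exactly the flower failure mode:

    import numpy as np

    def naive_red_light_detector(frame: np.ndarray) -> bool:
        """Caricature heuristic: 'enough red pixels in the upper part
        of the frame means red light'. Purely illustrative."""
        upper = frame[: frame.shape[0] // 3]        # only the top third
        r, g, b = upper[..., 0], upper[..., 1], upper[..., 2]
        red = (r > 150) & (g < 80) & (b < 80)       # crude redness threshold
        # Fires on anything sufficiently red and high up: a traffic
        # light, a brake light, or a flower over a garage entrance.
        return red.mean() > 0.001

Whether the real failure comes from a threshold like this or from a misfiring model is exactly what you can't tell from the driver's seat.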
Typically, inference in a machine learning context means feeding a model some input and looking at its output. I'm pretty sure that they are running some model on the vehicle that takes pixels as input and says "this part of the image is a car/truck/traffic sign/lane line/etc.". It might be misclassifying things (e.g. the flower as a red light), but it would still be running some kind of model.
As you point out though, the model only seems to do some simple object detection, but doesn't have much of an understanding of what it sees (e.g. does it make sense that there would be a traffic light at this location?). There are plenty of videos of it getting confused by all kinds of situations (e.g. this one from a few years ago: https://www.businessinsider.com/tesla-fsd-full-self-driving-... ).
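For concreteness, "inference" in this sense is just pixels in, labeled boxes out. A minimal sketch using an off-the-shelf torchvision detector as a stand-in (nobody outside Tesla knows what network the car actually runs):

    import torch
    from torchvision.models.detection import (
        FasterRCNN_ResNet50_FPN_Weights,
        fasterrcnn_resnet50_fpn,
    )

    # Load a generic pretrained object detector and put it in eval mode.
    weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
    model = fasterrcnn_resnet50_fpn(weights=weights).eval()

    # Stand-in for one camera frame: 3 channels, values in [0, 1].
    frame = torch.rand(3, 480, 640)

    with torch.no_grad():
        out = model([frame])[0]   # dict with "boxes", "labels", "scores"

    # Class IDs map to COCO labels (car, truck, traffic light, ...).
    print(out["labels"][:5], out["scores"][:5])

Note that a detector like this answers "what does this patch look like?", never "does a traffic light make sense here?", which is the gap the parent comment is pointing at.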