I see this sentiment, but I think there's a flawed, tacit assumption underlying it. Image classifiers might not be very good at labeling what's in a static photograph, but an AV doesn't need to classify an object to avoid striking it. For AV purposes, uncertainty about part of a scene is itself a useful input: if the system can't tell what something is, it can still treat that region as something to avoid.