Large-scale vehicle classification (aqnichol.com)
92 points by naillo on Dec 31, 2022 | 25 comments


>I was surprised to find that around 20% of listings were over $50k and around 6% were over $70k. I expected these high prices to be more rare.

This seems to be because you scraped new car prices instead of used. The used market has wild variance in prices, even if you're only looking at CPO vehicles directly from dealerships. That in itself could be a whole other fascinating week spent training on that data. It could also be that Kelley Blue Book's data is wrong, as always. KBB pumps up the prices on used cars well beyond their actual market value, and I've yet to figure out what benefit they get from doing so. It might be used dealerships gaming things. A Chevrolet Cobalt, regardless of year or location, sells for about $3,500 from private sellers, but KBB says they go for $6,600, almost twice that.

I'd also be interested in seeing market location weighting for the data. I expect vehicles to be more expensive in places like California and New York compared to Florida or Oklahoma for example.

>After some Googling, I found out that this was a limitation of EXT4 when creating directories with millions of files in them.

I've experienced this myself with faulty Flatpaks spamming my drives with log files, and also with Blender, interestingly. If you split a frame into six layers (AO, diffuse, glossy, alpha, shadows, Z-depth) and the animation is about 66,000 frames, that's 396,000 images for one 45-minute animation. Oftentimes you split the scene into three pieces: foreground, focus, and background, which triples that output to 1,188,000 images. When those are put into separate folders for each scene or shot, and with multiple renders to test or send for approval, it's very easy to run into the EXT4 hash table limitation with just three or four animations over twenty minutes.
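
The usual workaround is to shard the output into nested subdirectories instead of one flat folder. A rough Python sketch of the idea (the directory names and hashing scheme here are just for illustration):

    import hashlib
    import shutil
    from pathlib import Path

    def shard_path(root: Path, filename: str, depth: int = 2) -> Path:
        # Nest files by the first characters of a hash of their name so no
        # single directory accumulates millions of entries.
        digest = hashlib.md5(filename.encode()).hexdigest()
        target_dir = root.joinpath(*digest[:depth])
        target_dir.mkdir(parents=True, exist_ok=True)
        return target_dir / filename

    # Example: move a flat render directory into sharded folders.
    flat_dir = Path("renders_flat")        # hypothetical source folder
    sharded_root = Path("renders_sharded")
    for f in flat_dir.glob("*.exr"):
        shutil.move(str(f), shard_path(sharded_root, f.name))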


OP here. Interesting, I had not researched the used car market enough to know about the price inflation. Can you think of any other sites that have more realistic prices?


From experience, no online used marketplace seems to be immune from price manipulation or inflation. KBB is used as a reference tool to find a starting point for vehicle prices, but the only way to really know how much they cost is to go through multiple websites and compare. I'd suggest scraping data from Cars.com, eBay Motors, Craigslist, Facebook Marketplace, Autolist, and CarGurus.

For a starting point, I'd have it look at vehicles from 2012 or newer. This should prevent "classic" or newly collectible vehicles from spiking the data. Secondly, be aware that the prices on the front of a Craigslist ad are often fake numbers such as "$0" or "$12345", with the actual soft or hard price contained in the ad's body text. Facebook Marketplace can be much worse about this, and much more manipulative with the number of fake listings, so check for duplicates or suspiciously low prices to keep them from skewing the data. eBay Motors has a quirk shared with eBay, where listings have both an auction price and a Buy It Now price; the only reliable way to gauge prices there is to use the Buy It Now listings.
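
To make that concrete, here's a rough sketch of the kind of filtering I mean; the placeholder values and the price floor are just examples, so tune them for your data:

    import re

    PLACEHOLDER_PRICES = {0, 1, 123, 1234, 12345, 123456}  # example junk values
    MIN_PLAUSIBLE_PRICE = 500                               # example floor

    def extract_price(headline_price, body_text):
        # Prefer the headline number, but fall back to the first dollar
        # amount in the ad body when the headline looks fake.
        if headline_price and headline_price not in PLACEHOLDER_PRICES \
                and headline_price >= MIN_PLAUSIBLE_PRICE:
            return float(headline_price)
        match = re.search(r"\$\s?([\d,]{4,})", body_text)
        if match:
            candidate = float(match.group(1).replace(",", ""))
            if candidate not in PLACEHOLDER_PRICES and candidate >= MIN_PLAUSIBLE_PRICE:
                return candidate
        return None  # drop the listing rather than poison the dataset

    def deduplicate(listings):
        # Cheap proxy for cross-posted/fake listings: same title and price.
        seen, unique = set(), []
        for listing in listings:
            key = (listing["title"].lower().strip(), listing["price"])
            if key not in seen:
                seen.add(key)
                unique.append(listing)
        return unique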


This was very well done and comprehensive. I liked the analysis of when it works well and when it doesn't, and the testing with nonsense images. We need more of this kind of rigor in ML.


I noticed in the first set of screenshots/tests (the photo roll) that the predicted price was nowhere near the most likely price bucket (e.g. the predicted price fell in the second or third highest-confidence bucket). I expected the post to go into that, but was a bit disappointed it didn't.

If OP is here, I would love to hear your thoughts, and which of the two was more accurate.

(Also, I could be wrong, but the Audi is an A5/S5, and the model should be able to predict it's not an A4/S4 quite easily from the rear view?)


OP here. Yeah, it's quite odd that the median price prediction doesn't fall into the most likely bucket, even when that bucket has more than a 50% confidence. I'm not sure how to interpret this. One theory: maybe the median prediction head hasn't converged yet and naturally has a bias towards lower values due to the network's initialization.

I'll know more in a week or two once all the models have converged, and can update here!
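
In the meantime, the mismatch is easy to quantify directly from the two heads' outputs; a quick sketch (the bucket edges here are made up, not the ones I actually use):

    import torch

    # Hypothetical price-bucket edges in dollars, purely for illustration.
    BUCKET_EDGES = torch.tensor([0., 10_000., 20_000., 30_000., 50_000., 70_000., 1e9])

    def median_in_top_bucket(bucket_logits, median_pred):
        # For each example, check whether the regression head's median price
        # lands inside the classification head's most-confident bucket.
        top = bucket_logits.argmax(dim=-1)                  # (N,)
        lo, hi = BUCKET_EDGES[top], BUCKET_EDGES[top + 1]
        return (median_pred >= lo) & (median_pred < hi)     # (N,) bool mask

    # agreement_rate = median_in_top_bucket(logits, medians).float().mean()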


So cool that you’ve put this together. As a big car nut my whole life, I’ve always wanted something like this.

One possible commercial application: evaluating a shopping mall's customer demographics from its parking lot. Another is spotting real estate market shifts (a possible hypothesis being that vehicle values in an area go up before commercial or residential property values do). At that point, you would need to include used cars in the database.

Second, I realize make and model are the best indicators, but as someone interested in vehicle design and history: is there anything about the innate design or shape that looks "expensive" or "modern"/current? Are there patterns here even without knowing the badge? Relatedly, do the new BMWs look any more expensive than a Genesis if you didn't know they were BMWs? If there were a way to remove make/model from the model, which brands would it say look more expensive than they are, and vice versa? Which brands' design language looks old?


> In some of my earlier examples, the model correctly predicts the make/model of cars that are only partially visible in the photo. To study how far this can be pushed, I panned a crop along a side-view of two different cars to see how the model’s predictions changed as different parts of the car became visible. In these two examples, the model was most accurate when viewing the front or back of the car, but not when only the middle of the car was visible. Perhaps this is a shortcoming of the model, or perhaps automakers customize the front and back shapes of their car more than the sides. I’d be happy to hear other hypotheses as well!

I would hypothesize this is an artifact of the training data being photos taken for the purpose of selling the cars.

As a seller, it doesn’t make a lot of sense to take a photo of the middle of the car, unless a) you’re trying to hide something or b) you’re being lazy (implying the car isn’t worth much of your time). I would imagine that better photography in general - composition, framing, lighting, setting - would be correlated with higher car prices.
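
Tangentially, the crop-panning experiment quoted above is easy to reproduce on your own photos. A minimal sketch, assuming a trained PyTorch classifier `model` and its list of `class_names` (both hypothetical here):

    import torch
    from PIL import Image
    from torchvision import transforms

    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ])

    def pan_crops(image_path, model, class_names, crop_frac=0.5, steps=10):
        # Slide a window of width crop_frac * image_width from left to right
        # and record the model's top prediction at each position.
        img = Image.open(image_path).convert("RGB")
        w, h = img.size
        crop_w = int(w * crop_frac)
        results = []
        model.eval()
        with torch.no_grad():
            for i in range(steps):
                left = (w - crop_w) * i // (steps - 1)
                crop = img.crop((left, 0, left + crop_w, h))
                logits = model(preprocess(crop).unsqueeze(0))
                results.append((left, class_names[logits.argmax(-1).item()]))
        return results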


These are good ideas! It should be possible to test how photographic quality correlates with price, though one complication is that I train with data augmentation (a commonly used technique to make better use of the training data). Some of the augmentations, such as random cropping and color jitter, might already simulate some level of bad photography.
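
For reference, a typical torchvision version of those augmentations looks something like this; the exact parameters are illustrative, not the ones I actually trained with:

    from torchvision import transforms

    train_transform = transforms.Compose([
        # Random crops mimic tight or off-center framing.
        transforms.RandomResizedCrop(224, scale=(0.6, 1.0)),
        # Color jitter mimics bad lighting and white balance.
        transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])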


I was satisfied to see the top three longtime best selling vehicles in North America as the three most common vehicles in the dataset:

1. Ford F-150

2. Chevrolet Silverado

3. Ram 1500

It is a bit mindblowing to me that these are so popular. Smart of their manufacturers to evolve them into luxury vehicles with thick profit margins.


Nice write-up. From my experience, the finding that "predicting additional details about each car (e.g. make and model) improved the accuracy of price predictions" is indeed surprising and possibly a red flag that something is not quite right. An adequate model should be able to figure out these intermediate features by itself if they're truly helpful.


Really? It's in line with my experience. I'd think that making the model predict more stuff acts as a kind of regularization that forces the model to focus on real predictive features rather than memorize some shortcut. I've generally found CV models to be bad at picking the "right" features (those that generalize to new examples) during training, and making them predict more stuff is a good way of helping them along.


It's definitely not just a regularizer in my case, because the gap appears even before a single epoch. The gap does also appear for two very different model architectures.

One explanation is that price labels are super noisy. If there is enough noise in the primary labels, you could imagine that adding in the more predictable target variables could help reduce gradient noise and speed up training. That's my current hypothesis, but I'm very open to others. If I had more time I'd try to do more experiments on this.
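
For concreteness, the setup being discussed is roughly a shared backbone with a price head plus auxiliary make/model heads. This sketch is illustrative (the head sizes and loss weights are made up), not my actual training code:

    import torch.nn as nn
    import torchvision

    class MultiHeadCarModel(nn.Module):
        def __init__(self, n_makes=60, n_models=1200):
            super().__init__()
            backbone = torchvision.models.resnet50(weights=None)
            feat_dim = backbone.fc.in_features
            backbone.fc = nn.Identity()                      # keep pooled features
            self.backbone = backbone
            self.price_head = nn.Linear(feat_dim, 1)         # primary target
            self.make_head = nn.Linear(feat_dim, n_makes)    # auxiliary target
            self.model_head = nn.Linear(feat_dim, n_models)  # auxiliary target

        def forward(self, x):
            h = self.backbone(x)
            return self.price_head(h).squeeze(-1), self.make_head(h), self.model_head(h)

    def loss_fn(outputs, price, make_id, model_id, aux_weight=0.1):
        price_pred, make_logits, model_logits = outputs
        loss = nn.functional.l1_loss(price_pred, price)      # noisy primary label
        loss = loss + aux_weight * nn.functional.cross_entropy(make_logits, make_id)
        loss = loss + aux_weight * nn.functional.cross_entropy(model_logits, model_id)
        return loss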


That's very interesting. Do the train and val set losses both show that behavior? I did a very similar experiment earlier this year - in my case it was a classifier where images could be categorized in different ways, and my takeaway was that making it predict more classes improved performance. I'll have to go back and look at the loss curves during training and see if the improvement is immediate, as in your case.


Before one epoch, both the train and eval curves look pretty much identical. Quite curious


I've seen that happen before, but it was always a temporary artifact of the chosen architecture, optimizer, loss function, or training details; once those were improved, the benefit would disappear and the extra targets offered no further help. I have little experience with CV compared to other tasks though, so that might be the reason. Have you seen any papers about this phenomenon?


I don't know that I've seen a paper that says this explicitly; I'm just saying it based on experience. You may have seen papers where they obscure parts of the image and find that the classifier is basing its decision on the background - I think that's well known. Likewise, I can't cite a paper off the top of my head, but the usual augmentations like random cropping, flipping, color jitter, etc. are pretty well known to practitioners as a way of preventing overfitting. I see additional prediction targets as an extension of that: they likewise incentivize the model to learn the "right" features by making it harder to latch on to some spurious pattern in the data. And I've had success with it practically, which is why I made my original comment. YMMV of course.


I haven't seen a paper on this either, but I think that if it does work, somebody should definitely write one about it, because it seems quite consequential. I am not sure the underlying reason is regularization, though; it seems more like a case of indirect feature engineering.


I'd argue that both are the same thing, but yes I agree - one way or another it's a way of bringing additional domain knowledge to the problem - and I suppose it entails the downside that if you do it wrong, you can reduce predictive power


Before writing this post, I asked ChatGPT for examples of positive transfer from auxiliary losses in the literature. It pointed me to this paper:

https://arxiv.org/abs/1705.07115
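
The core idea there is to let the model learn per-task loss weights from homoscedastic uncertainty instead of hand-tuning them. A simplified sketch of that weighting (the paper uses slightly different factors for regression vs. classification losses):

    import torch
    import torch.nn as nn

    class UncertaintyWeightedLoss(nn.Module):
        # Learn one log-variance per task; each task loss is scaled by
        # exp(-log_var) and penalized by log_var so the weights can't all
        # collapse to zero.
        def __init__(self, n_tasks):
            super().__init__()
            self.log_vars = nn.Parameter(torch.zeros(n_tasks))

        def forward(self, task_losses):
            total = 0.0
            for loss, log_var in zip(task_losses, self.log_vars):
                total = total + torch.exp(-log_var) * loss + 0.5 * log_var
            return total

    # usage: total = UncertaintyWeightedLoss(3)([price_loss, make_loss, model_loss])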


I liked this article because it showed that ML classification is not magic, but a series of design choices from a well known (to the author) design space. And yet, he found surprising things about the emergent behavior of the classifiers even in this restricted domain.


Fun read! I'm curious about the part where you discussed changing the surroundings while keeping the car the same - are KBB images user-uploaded, or taken in a garage/controlled setting like Carvana's?


He used DALL-E inpainting (quite clever).


Nice post. Instead of using inpainting for model explainability, why not use attention maps to understand where the model is focusing to get a particular prediction?
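
For a CNN backbone, something like Grad-CAM gives a similar saliency map without needing attention layers. A minimal sketch, assuming a ResNet-style `model` whose final conv block is `model.layer4` (that attribute name is an assumption):

    import torch

    def grad_cam(model, image, target_class):
        # Hook the last conv block, backprop the target logit, and weight
        # the activations by their pooled gradients (Grad-CAM).
        acts, grads = {}, {}
        h1 = model.layer4.register_forward_hook(
            lambda m, i, o: acts.update(a=o))
        h2 = model.layer4.register_full_backward_hook(
            lambda m, gi, go: grads.update(g=go[0]))

        model.zero_grad()
        logits = model(image.unsqueeze(0))        # image: (3, H, W) tensor
        logits[0, target_class].backward()

        weights = grads["g"].mean(dim=(2, 3), keepdim=True)
        cam = torch.relu((weights * acts["a"]).sum(dim=1))
        cam = cam / (cam.max() + 1e-8)            # normalize to [0, 1]
        h1.remove(); h2.remove()
        return cam[0]                             # (h, w) heatmap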


Not really related, but happy new year everyone: http://www.merzo.net/indexSD.html



