Pretty interesting. Mr. Ng claims that for some applications, a small set of quality data can be as good as a huge set of noisy data.
I wonder whether, assuming the data is of the highest quality with minimal noise, having more data still matters for training. And if it matters, to what degree?
In general you want to add more varied data, but not so much that the network stops learning from it. Typical practice is to find images whose inclusion causes high variation in final accuracy (under k-fold validation, i.e. removing or adding the image causes a big difference) and prefer more of those.
Now, why not simply add everything? Because in general it takes too long to train.
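Here's a minimal brute-force sketch of that idea, using sklearn with synthetic data as stand-ins for a real model and image set: retrain a small model with and without each training example and record how much the validation accuracy moves.

```python
# Leave-one-out influence sketch: retrain a small model with and without
# each training example and see how much validation accuracy changes.
# sklearn + synthetic data are placeholders for a real model/dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.5, random_state=0
)

def val_accuracy(X_tr, y_tr):
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return model.score(X_val, y_val)

baseline = val_accuracy(X_train, y_train)

# Influence of example i ~ accuracy with all data minus accuracy without it.
# A large |influence| means the example matters a lot -- these are the
# "high variation" examples worth collecting more of.
influences = []
for i in range(len(X_train)):
    mask = np.arange(len(X_train)) != i
    influences.append(baseline - val_accuracy(X_train[mask], y_train[mask]))

most_influential = np.argsort(np.abs(influences))[::-1][:10]
print("Most influential training examples:", most_influential)
```

This is O(n) retrainings, which is why it's usually done with small proxy models rather than the full network.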
> Typical practice is to find images whose inclusion causes high variation in final accuracy (under k-fold validation, i.e. removing or adding the image causes a big difference)
How do you identify these images? It sounds like I'd need to train a lot of small models to measure the variance, but I'm hoping there's a more scientific way?