> Like picking hyperparameters - time and time again I've asked experts/trainers/colleagues: "How do I know what type of model to use? How many layers? How many nodes per layer? Dropout or not?" etc etc And the answer is always along the lines of "just try a load of stuff and pick the one that works best".
> To me, that feels weird and worrying. It's like we don't understand ML well enough yet to definitively say, for a given data set, what sort of model we'll need.
This embodies the very fundamental difference between science and engineering. With science, you make a discovery, but rarely do we ask "what was the magical combination that let me find the needle in the haystack today?" We instead just pass on the needle and show everyone we found it.
Should we work on finding out the magic behind hyperparameters? In bioinformatics, the brilliant mathematician Lior Pachter once attacked the problem of sequence alignment using the tools of tropical algebra: which parameters to the alignment algorithms produced which regimes of solutions? It was beautiful. It was great to understand. But I'm not sure it ever got published (though it likely did). Having reasonable parameters is more important than understanding how to pick them from first principles: even if you know all the possible output regimes for every segment of the hyperparameter space, the only thing we really care about is getting a functionally trained model at the end.
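(For concreteness, the "just try a load of stuff" answer from the quote usually amounts to something like the sketch below: a random search over an arbitrary space. The model family, parameter ranges, and synthetic data are illustrative assumptions on my part, not a recommendation.)

```python
# A minimal sketch of "try a load of stuff and pick the one that works best":
# random search over an arbitrary hyperparameter space with scikit-learn.
from scipy.stats import loguniform, randint
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data; substitute your real dataset.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

search = RandomizedSearchCV(
    MLPClassifier(max_iter=500, random_state=0),
    param_distributions={
        # "How many layers? How many nodes per layer?"
        "hidden_layer_sizes": [(32,), (64,), (64, 64), (128, 64, 32)],
        # MLPClassifier has no dropout, so L2 regularisation stands in for it here.
        "alpha": loguniform(1e-6, 1e-1),
        "learning_rate_init": loguniform(1e-4, 1e-1),
        "batch_size": randint(16, 257),
    },
    n_iter=20,
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

None of this tells you why the winning configuration wins; it just finds one that does.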
Sometimes deeper understanding provides deeper insight into the problem at hand. But often it doesn't, even when that understanding is beautiful. If the hammer works when you hold it a certain way, that's great, but understanding all possible ways to hold a hammer doesn't always help get the nail in better.
I do a lot of model tuning and I’m almost ashamed to say I tell GPT what performance I’m aiming for and have it generate the hyperparameters (as in, just literally give me a code block). Then I see what works, tell GPT, and try again.
I’m deeply uncomfortable with such a method…but my models perform quite well. Note I spend a TON of time generating the right training data, so it’s not random.
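Roughly, the loop looks like the sketch below. I’m assuming the OpenAI Python client here; the prompt wording, the model name, and the train_and_eval stub are placeholders for whatever your actual setup is.

```python
# Rough sketch of the "ask GPT for hyperparameters, try them, report back" loop.
# The prompt wording, model name, and train_and_eval stub are placeholders.
import json
from openai import OpenAI

client = OpenAI()

def train_and_eval(params):
    """Placeholder: train your model with `params` and return a validation score."""
    raise NotImplementedError  # plug in your real training code here

target = 0.92   # validation metric I'm aiming for
history = []    # (params, score) pairs fed back to the model each round

for _ in range(5):
    prompt = (
        f"I'm tuning a model and want a validation score of at least {target}. "
        f"Previous attempts as (params, score) pairs: {json.dumps(history)}. "
        "Reply with a single JSON object of hyperparameters to try next."
    )
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # ask for parseable JSON back
    )
    params = json.loads(resp.choices[0].message.content)
    score = train_and_eval(params)
    history.append((params, score))
    if score >= target:
        break
```

The human part is still deciding the target metric and getting the training data right, which is where my time goes anyway.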
Well, I do want to know more about how it works. Anything important I’ll teach myself; it’s just hard to justify the time investment during work hours when the robot does it. Which I think is also important: these tools save time, but with downsides.
> Sometimes deeper understanding provides deeper insight into the problem at hand. But often it doesn't, even when that understanding is beautiful. If the hammer works when you hold it a certain way, that's great, but understanding all possible ways to hold a hammer doesn't always help get the nail in better.
Is it true? I mean, in mathematics having a proof of something is way stronger than having a conjecture. And in engineering, proving that your solution is optimal is way stronger than saying "hey look, I tried many things and finally it works!".
Worse, in statistics, if you run a bunch of tests and pick the one that "works", you can end up with false conclusions all the time (there's a toy simulation of this below). And AI is statistics.
Sure, it works to test out 10 datasets and however many different machine learning models, but it takes time and money and might be suboptimal from an engineering POV.
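Here's the toy simulation I mean: fifty "models" that are pure coin flips, each scored on a small validation set. Picking the best one looks well above chance on that set and reverts to ~50% on held-out data. (numpy only; the sizes are arbitrary.)

```python
import numpy as np

rng = np.random.default_rng(0)
n_val, n_test, n_candidates = 100, 10_000, 50

# Binary labels; every "model" below just flips a coin, so none has real skill.
y_val = rng.integers(0, 2, n_val)
y_test = rng.integers(0, 2, n_test)

val_acc, test_acc = [], []
for _ in range(n_candidates):
    val_acc.append((rng.integers(0, 2, n_val) == y_val).mean())
    test_acc.append((rng.integers(0, 2, n_test) == y_test).mean())

best = int(np.argmax(val_acc))
print(f"best-of-{n_candidates} validation accuracy: {val_acc[best]:.2f}")   # typically ~0.60
print(f"same candidate on held-out test data:       {test_acc[best]:.2f}")  # ~0.50
```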
> This embodies the very fundamental difference between science and engineering.
Not really though. In engineering, you have heuristics, even if you don't know why they work. In the case of deep learning / AI, there seems to be very little in the way of built up heuristic knowledge - it's just "try stuff and see what works for every problem".
I think if this were truly the case, then wouldn't ML algorithm development be a solved problem with AutoML? I don't think AutoML is close to ubiquitous, which means there must still be value in heuristics and a deeper understanding of our tools.
I think that's also the difference between science and engineering: has the tool/technology been around enough to learn heuristics, or is everything still in the "fuck around and find out" phase?
The hammer analogy doesn't make much sense because for a hammer we can actually use our scientific knowledge to compute the best possible way to hold the tool, and we can make instruments that are better than hammers, like pneumatic hammers, pile drivers, etc.
With your argument, we would be stuck with the good old, but basic hammer for the rest of time.
That seems like a different analogy; making better hammers is a different thing from understanding why holding a hammer a certain way works well. We did eventually invent enough physics to understand why we hold hammers where we do, but we got really far just experimenting without first principles. And even if we use first principles, we're going to discover a lot more by actually using the modified hand-held hammer and testing it than by hitting it out of the park with great physical modeling of the hammer and the biomechanics of the human body.
And in any case, I'm not saying we shouldn't search for a deep understanding of which hyperparameters work on the first try; I'm just saying there's a good chance that, even if the principles are fully discovered, calculating from those principles will be more expensive than a bunch of experimentation and won't matter in the end.
That's the trick about science: it's more about finding the right question to answer than about how to find answers, and oftentimes the best questions only become apparent afterwards.