> Like picking hyperparameters - time and time again I've asked experts/trainers/colleagues: "How do I know what type of model to use? How many layers? How many nodes per layer? Dropout or not?" etc etc And the answer is always along the lines of "just try a load of stuff and pick the one that works best".
> To me, that feels weird and worrying. It's like we don't understand ML well enough yet to definitively say, for a given data set, what sort of model we'll need.
This embodies the very fundamental difference between science and engineering. With science, you make a discovery, but rarely do we ask "what was the magical combination that let me find the needle in the haystack today?" We instead just pass on the needle and show everyone we found it.
Should we work on finding out the magic behind hyperparameters? In bioinformatics, the brilliant mathematician Lior Pachter once attacked the problem of sequence alignment using the tools of tropical algebra: which parameters to the alignment algorithms produced which regimes of solutions? It was beautiful. It was great to understand. But I'm not sure it ever got published (though it likely did). Having reasonable parameters is more important than understanding how to pick them from first principles: even if you know all the possible output regimes for every segment of the hyperparameter space, the only thing we really care about is getting a functionally trained model at the end.
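(For concreteness, the "just try a load of stuff" answer from the quote usually amounts to something like the sketch below: a random search over an arbitrary space. The model family, parameter ranges, and synthetic data are illustrative assumptions on my part, not a recommendation.)

```python
# A minimal sketch of "try a load of stuff and pick the one that works best":
# random search over an arbitrary hyperparameter space with scikit-learn.
from scipy.stats import loguniform, randint
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data; substitute your real dataset.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

search = RandomizedSearchCV(
    MLPClassifier(max_iter=500, random_state=0),
    param_distributions={
        # "How many layers? How many nodes per layer?"
        "hidden_layer_sizes": [(32,), (64,), (64, 64), (128, 64, 32)],
        # MLPClassifier has no dropout, so L2 regularisation stands in for it here.
        "alpha": loguniform(1e-6, 1e-1),
        "learning_rate_init": loguniform(1e-4, 1e-1),
        "batch_size": randint(16, 257),
    },
    n_iter=20,
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

None of this tells you why the winning configuration wins; it just finds one that does.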
Sometimes deeper understanding provides deeper insight into the problem at hand. But often it doesn't, even when that understanding is beautiful. If the hammer works when you hold it a certain way, that's great, but understanding all possible ways to hold a hammer doesn't always help get the nail in better.
I do a lot of model tuning and I’m almost ashamed to say I tell GPT what performance I’m aiming for and have it generate the hyperparameters (as in, just literally give me a code block). Then I see what works, tell GPT, and try again.
I’m deeply uncomfortable with such a method…but my models perform quite well. Note I spend a TON of time generating the right training data, so it’s not random.
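Roughly, the loop looks like the sketch below. I’m assuming the OpenAI Python client here; the prompt wording, the model name, and the train_and_eval stub are placeholders for whatever your actual setup is.

```python
# Rough sketch of the "ask GPT for hyperparameters, try them, report back" loop.
# The prompt wording, model name, and train_and_eval stub are placeholders.
import json
from openai import OpenAI

client = OpenAI()

def train_and_eval(params):
    """Placeholder: train your model with `params` and return a validation score."""
    raise NotImplementedError  # plug in your real training code here

target = 0.92   # validation metric I'm aiming for
history = []    # (params, score) pairs fed back to the model each round

for _ in range(5):
    prompt = (
        f"I'm tuning a model and want a validation score of at least {target}. "
        f"Previous attempts as (params, score) pairs: {json.dumps(history)}. "
        "Reply with a single JSON object of hyperparameters to try next."
    )
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # ask for parseable JSON back
    )
    params = json.loads(resp.choices[0].message.content)
    score = train_and_eval(params)
    history.append((params, score))
    if score >= target:
        break
```

The human part is still deciding the target metric and getting the training data right, which is where my time goes anyway.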
Well, I do want to know more about how it works. Anything important I’ll teach myself; it’s just hard to justify the time investment during work hours when the robot does it. Which I think is also important: these tools save time, but with downsides.
> Sometimes deeper understanding provides deeper insight into the problem at hand. But often it doesn't, even when that understanding is beautiful. If the hammer works when you hold it a certain way, that's great, but understanding all possible ways to hold a hammer doesn't always help get the nail in better.
Is it true? I mean, in mathematics having a proof of something is way stronger than having a conjecture. And in engineering, proving that your solution is optimal is way stronger than saying "hey look, I tried many things and finally it works!".
Worse, in statistics, if you run a bunch of tests and pick the one that "works", you can end up with false conclusions all the time (there's a toy simulation of this below). And AI is statistics.
Sure, it works to test out 10 datasets and however many different machine learning models, but it takes time and money and might be suboptimal from an engineering POV.
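Here's the toy simulation I mean: fifty "models" that are pure coin flips, each scored on a small validation set. Picking the best one looks well above chance on that set and reverts to ~50% on held-out data. (numpy only; the sizes are arbitrary.)

```python
import numpy as np

rng = np.random.default_rng(0)
n_val, n_test, n_candidates = 100, 10_000, 50

# Binary labels; every "model" below just flips a coin, so none has real skill.
y_val = rng.integers(0, 2, n_val)
y_test = rng.integers(0, 2, n_test)

val_acc, test_acc = [], []
for _ in range(n_candidates):
    val_acc.append((rng.integers(0, 2, n_val) == y_val).mean())
    test_acc.append((rng.integers(0, 2, n_test) == y_test).mean())

best = int(np.argmax(val_acc))
print(f"best-of-{n_candidates} validation accuracy: {val_acc[best]:.2f}")   # typically ~0.60
print(f"same candidate on held-out test data:       {test_acc[best]:.2f}")  # ~0.50
```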
> This embodies the very fundamental difference between science and engineering.
Not really though. In engineering, you have heuristics, even if you don't know why they work. In the case of deep learning / AI, there seems to be very little in the way of built up heuristic knowledge - it's just "try stuff and see what works for every problem".
I think if this were truly the case, then wouldn't ML algorithm development be a solved problem with AutoML? I don't think AutoML is close to ubiquitous, which means there must still be value in heuristics and a deeper understanding of our tools.
I think that's also the difference between science and engineering: has the tool/technology been around enough to learn heuristics, or is everything still in the "fuck around and find out" phase?
The hammer analogy doesn't make much sense because for a hammer we can actually use our scientific knowledge to compute the best possible way to hold the tool, and we can make instruments that are better than hammers, like pneumatic hammers, pile drivers, etc.
With your argument, we would be stuck with the good old, but basic hammer for the rest of time.
That seems like a different analogy; making better hammers is a different thing from understanding why holding a hammer a certain way works well. We did eventually invent enough physics to understand why we hold hammers where we do, but we got really far just experimenting without first principles. And even if we use first principles, we're going to discover a lot more by actually using the modified hand-held hammer and testing it than by hitting it out of the park with great physical modeling of the hammer and the biomechanics of the human body.
And in any case, I'm not saying we shouldn't search for a deep understanding of which hyperparameters work on the first try; I'm just saying there's a good chance that, even if the principles are fully discovered, calculating from those principles will be more expensive than a bunch of experimentation and won't matter in the end.
That's the trick about science: it's more about finding the right question to answer than about how to find answers, and oftentimes the best questions only become apparent afterwards.