For real. Editing prompts bears no resemblance to engineering at all, there is n...

echelon · 2025-06-04T20:52:02 1749070322

> Will your change to the prompt make the benchmark go up? Down? Why? Can you predict? No, it is not a science at all.

Many prompt engineers do measure and quantitatively compare.

morkalork · 2025-06-04T21:05:25 1749071125

Me too, but it's after the fact. I make a change, then measure; if it doesn't improve, I roll back. But it's as good as witchcraft or alchemy. Will I get gold with this adjustment? Nope, still lead. Tries variation #243 next.

a2dam · 2025-06-04T21:28:30 1749072510

This is literally how the light bulb filament was discovered.

MegaButts · 2025-06-04T21:53:53 1749074033

And Tesla famously described Edison as an idiot for this very reason. Then Tesla revolutionized the way we use electricity while Edison was busy killing elephants.

SrslyJosh · 2025-06-05T17:01:52 1749142912

Lots of things have been discovered by guess-and-check, but it's a shit method for problem solving. You wouldn't use guess-and-check unless A) you don't know enough about the problem to employ a better method or B) the problem space can be searched quickly enough that using a better method doesn't really matter.

yowlingcat · 2025-06-06T05:08:09 1749186489

I think the only time guess-and-check fails is when you don't know enough to actually check.

Maybe this is why we disagreed on a previous comment. I think guess-and-check is generally the best way to solve problems, so long as your checks are well-designed (which, to be fair, does require understanding the problem), you incorporate the results from each check back into further guesses, and you can afford to brute force it (which has been true of most of the problems, big and small, that I've had to solve).

In a lot of ways, there's nothing fancy about that, it's just the scientific method -- the whole point of which is that you don't really need to "know" about the problem a priori to get to the truth, and that you can recompile all dependent knowledge from scratch through experimentation to get to it.

In practice it feels really expensive but it's also where the best insights come from -- experimentally re-deriving common knowledge often exposes where its fractures and exceptions are in clear enough detail to translate it into novel discovery.

yawnxyz · 2025-06-05T14:42:06 1749134526

updating benchmarks and evals is something closer to test engineering / QA engineering work though