Interesting, but it essentially concluded with “GPT3.5 is materially worse than GPT4,” which is a bit of a letdown, since another conclusion could have been “and I had a grad student feed the same questions into GPT4 to compare.” Which I’ll be doing later today :-) I’ve seen enough of my own comparisons to guess the outcomes, but these are a storied man’s prompts, so it’s worth seeing the results.