My own version made a first pass with an LLM whose only job was to assign a 0-100 complexity rating, and then scaled the allocated thinking budget more or less linearly with that rating.
The OP effort here is obviously higher grade, and I'm really tickled to see quantitative results. Well done.
https://github.com/NiloCK/autothink
https://www.paritybits.me/think-toggles-are-dumb/
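
For anyone curious what "more or less linear" means in practice, here is a rough sketch of the rating-to-budget step. It is not the code from the repo above: the function name `thinking_budget` and the token bounds are made up for illustration, and the first-pass LLM call that produces the rating is left out.

```python
def thinking_budget(rating: float, min_tokens: int = 0, max_tokens: int = 8192) -> int:
    """Map a 0-100 complexity rating to a thinking-token budget.

    Hypothetical helper: the bounds are illustrative assumptions,
    not values taken from any particular model or project.
    """
    # Clamp first, so an out-of-range rating from the first-pass model
    # cannot produce a negative or oversized budget.
    clamped = max(0.0, min(100.0, rating))
    # Linear interpolation between the minimum and maximum budget.
    return round(min_tokens + (max_tokens - min_tokens) * clamped / 100.0)


if __name__ == "__main__":
    for rating in (0, 25, 50, 100):
        print(rating, thinking_budget(rating))
```

The clamp is there because the rating comes from a model and may not respect the 0-100 range it was asked for.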