What is the baseline performance of the LLM on those programming tasks? And did you test the students in the Codex group at the end of the course without allowing them to use Codex? Essentially, I'm asking how you can conclude that these students didn't just learn to call an LLM, but actually learned to code independently.
https://austinhenley.com/blog/learningwithai.html