I know nothing about LLM training, but do you mean there is a solution to the issue of LLMs gaslighting each other? Sure, this is a proven way of getting training data, but you cannot get theorems and axioms right by generating different versions of them.
In their approach, the LLM generates inputs (images to be transformed) and solutions (Python programs that do the image transformations). The output images are created by applying the programs to the inputs.
So there's a constraint on the synthetic data here that keeps it honest -- the Python interpreter.
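To make that concrete, here is a rough sketch of the idea, not their actual pipeline. I'm assuming grids are lists of lists of ints and that each generated program defines a function called `transform`; those names, plus `run_candidate` and `build_examples`, are made up for illustration.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]


def run_candidate(program_src: str, grid: Grid) -> Optional[Grid]:
    """Run model-written source and apply its `transform` to the grid.

    Returns None if the program does not compile, crashes, or returns
    something that is not a list of lists.
    """
    namespace: dict = {}
    try:
        # NOTE: plain exec with no sandboxing; fine for a sketch only.
        exec(program_src, namespace)
        out = namespace["transform"](grid)
    except Exception:
        return None
    if not isinstance(out, list) or not all(isinstance(r, list) for r in out):
        return None
    return out


def build_examples(program_src: str, inputs: List[Grid]) -> List[Tuple[Grid, Grid]]:
    """Keep only (input, output) pairs the interpreter actually produced."""
    examples = []
    for grid in inputs:
        out = run_candidate(program_src, grid)
        if out is not None:
            examples.append((grid, out))
    return examples


# Hypothetical stand-ins for what an LLM might emit.
good_program = "def transform(grid):\n    return [row[::-1] for row in grid]\n"
bad_program = "def transform(grid):\n    return grid[0][99]\n"

inputs = [[[1, 2], [3, 4]]]
print(build_examples(good_program, inputs))
print(build_examples(bad_program, inputs))
```

The model never writes the output image itself, so it can't hallucinate one: the output is whatever the program computes, and programs that crash get dropped. Note this only guarantees that each output is consistent with its program, not that the program is an interesting or sensible transformation.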