Yes, that's exactly the problem with the current approach based on a "valuation" function.
They are not aiming for perfection, and therefore cannot make progress anymore.
To make progress you must precisely define the frontier: an evaluation of 0.1 is never resolved to one of "white wins", "draw", or "white loses", even though in theory it must be one of those. The engine is not "committing" to anything.
To train such a network to perfection you must not train it only on the "average" game state, but also on hard-mined examples: the game states which define the frontier.
Find a candidate, find a violation, add it to the dataset of training examples, retrain to perfection on the growing dataset (or on a generator of hard positions), use the retrained network to find a new candidate, and loop.
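
A minimal sketch of what that loop could look like, assuming a PyTorch-style setup. ValueNet, find_violation, candidate_gen and the feature size are hypothetical names chosen for illustration, and the ground-truth label for a violation would in practice come from a tablebase or a deep search, which is not shown here:

    # Hypothetical sketch of the counterexample-guided loop described above.
    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset

    N_FEATURES = 768   # assumed board-encoding size
    OUTCOMES = 3       # white wins / draw / white loses: the net must commit

    class ValueNet(nn.Module):
        """Classifies a position into one of the three outcomes
        instead of emitting a scalar evaluation like 0.1."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(N_FEATURES, 256), nn.ReLU(),
                nn.Linear(256, OUTCOMES),
            )

        def forward(self, x):
            return self.net(x)

    def find_violation(model, candidate_gen):
        """Scan candidate frontier positions for one where the model's
        committed class disagrees with the ground truth (tablebase,
        deep search, ...). Returns (features, true_class) or None."""
        for features, true_class in candidate_gen():
            pred = model(features.unsqueeze(0)).argmax(dim=1).item()
            if pred != true_class:
                return features, true_class
        return None

    def retrain(model, dataset, epochs=50, lr=1e-3):
        """Retrain on the whole accumulated set of hard positions."""
        loader = DataLoader(dataset, batch_size=64, shuffle=True)
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(epochs):
            for x, y in loader:
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()

    def counterexample_loop(model, candidate_gen, rounds=1000):
        """Find a violation, add it to the dataset, retrain, loop."""
        xs, ys = [], []
        for _ in range(rounds):
            hit = find_violation(model, candidate_gen)
            if hit is None:
                break  # no frontier position currently fools the net
            x, y = hit
            xs.append(x)
            ys.append(torch.tensor(y))
            retrain(model, TensorDataset(torch.stack(xs), torch.stack(ys)))
        return model

The only point of the sketch is the shape of the loop: the dataset of frontier positions never shrinks, and the network is retrained on all of them before the next counterexample is hunted.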
So what makes you think it is possible to precisely define such a frontier? And why should such a thing, if it is possible at all, be 1. doable by humans and 2. doable with the amount of energy and computing power available to us within the coming couple of decades?