Yes, that's exactly the problem with the current approach based on a "valuation" function.
They are not aiming for perfection, and therefore cannot make progress anymore.
To make progress you must precisely define the frontier: an evaluation of 0.1 is never resolved to one of "white wins", "draw", or "white loses", even though in theory it must be one of those. The engine is not "committing" to anything.
To train such a network to perfection you must not train it only on the "average" game state, but also on hard-mined examples: the game states which define the frontier.
Find a candidate, find a violation, add it to the dataset of training examples, retrain to perfection on the growing dataset (or on a generator of hard positions), use the retrained network to find a new candidate, and loop.
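
A minimal sketch of what that loop could look like, assuming a PyTorch-style setup. ValueNet, find_violation, candidate_gen and the feature size are hypothetical names chosen for illustration, and the ground-truth label for a violation would in practice come from a tablebase or a deep search, which is not shown here:

    # Hypothetical sketch of the counterexample-guided loop described above.
    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset

    N_FEATURES = 768   # assumed board-encoding size
    OUTCOMES = 3       # white wins / draw / white loses: the net must commit

    class ValueNet(nn.Module):
        """Classifies a position into one of the three outcomes
        instead of emitting a scalar evaluation like 0.1."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(N_FEATURES, 256), nn.ReLU(),
                nn.Linear(256, OUTCOMES),
            )

        def forward(self, x):
            return self.net(x)

    def find_violation(model, candidate_gen):
        """Scan candidate frontier positions for one where the model's
        committed class disagrees with the ground truth (tablebase,
        deep search, ...). Returns (features, true_class) or None."""
        for features, true_class in candidate_gen():
            pred = model(features.unsqueeze(0)).argmax(dim=1).item()
            if pred != true_class:
                return features, true_class
        return None

    def retrain(model, dataset, epochs=50, lr=1e-3):
        """Retrain on the whole accumulated set of hard positions."""
        loader = DataLoader(dataset, batch_size=64, shuffle=True)
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(epochs):
            for x, y in loader:
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()

    def counterexample_loop(model, candidate_gen, rounds=1000):
        """Find a violation, add it to the dataset, retrain, loop."""
        xs, ys = [], []
        for _ in range(rounds):
            hit = find_violation(model, candidate_gen)
            if hit is None:
                break  # no frontier position currently fools the net
            x, y = hit
            xs.append(x)
            ys.append(torch.tensor(y))
            retrain(model, TensorDataset(torch.stack(xs), torch.stack(ys)))
        return model

The only point of the sketch is the shape of the loop: the dataset of frontier positions never shrinks, and the network is retrained on all of them before the next counterexample is hunted.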
So what makes you think it is possible to precisely define such a frontier? And why should such a thing, if it is possible at all, be 1. doable by humans and 2. doable with the amount of energy and computing power available to us within the coming couple of decades?