But while keeping the data constant, adding more and more parameters is a strate...

But while keeping the data constant, adding more and more parameters is a strategy that works, so what gives? Are the functions getting somehow regularized during training so effectively you could get away with fewer parameters, it's just that we don't have the right model just yet?