The way the brain does it is by giving users a largely untrained model that they themselves have to train over the next 20 years for it to be of any use.
I suspect there may be a trade-off under evolutionary selection here: for organisms where a behaviour matters from the outset, it's worth encoding more of that behaviour into genes. At what cost, I wonder?
It's also possible there is some other mechanism going on at an embryonic stage, a kind of pre-training.
I suspect some of the division is also defined by how complex the task is, or how sensitive the model is to its own neurons (kind of like PNN). I don't have a well-rounded argument, but my instinct is that encoding or pre-training walking is far easier than seeing. Not to mention that basic quadrupedal walking/standing is far easier than bipedal; quadrupeds can learn the more complex coordinated movements afterwards.