The network must not have enough capacity to memorize all the training data. Its capacity should be proportional to the number of classes (rather than the number of samples).
Another way to arrive at this might be: take a trained network and run inference on the training set. Group the network's nodes into equally sized groups. As inference runs, train a smaller new group of nodes for each original group, looking only at the inputs and outputs that group actually exercises. Then wire the new subnetworks together using only the edges that connected the original subnetworks. The new, smaller network is now constructed.
I have not built this. But would something like this work?
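A rough sketch of what this might look like for a plain feed-forward MLP, where the "groups" are simply chunks of consecutive layers. Every module size and name below is made up for illustration: each smaller "student" group is trained to mimic the input/output behaviour of its corresponding original group on the training set, and the pieces chain back together because the seams keep the same widths.

    # Illustrative sketch only -- shapes, sizes and names are hypothetical.
    import torch
    import torch.nn as nn

    teacher_groups = [  # a trained network, split into two groups of layers
        nn.Sequential(nn.Linear(784, 512), nn.ReLU(), nn.Linear(512, 512), nn.ReLU()),
        nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)),
    ]
    student_groups = [  # smaller counterparts with the same input/output widths
        nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 512), nn.ReLU()),
        nn.Sequential(nn.Linear(512, 64), nn.ReLU(), nn.Linear(64, 10)),
    ]

    def distill_groups(teacher_groups, student_groups, loader, epochs=1):
        for k, (t, s) in enumerate(zip(teacher_groups, student_groups)):
            opt = torch.optim.Adam(s.parameters(), lr=1e-3)
            loss_fn = nn.MSELoss()
            for _ in range(epochs):
                for x, _ in loader:  # loader yields flattened (batch, 784) inputs; labels unused
                    with torch.no_grad():
                        for prev in teacher_groups[:k]:
                            x = prev(x)      # the input exactly as the teacher group sees it
                        target = t(x)        # the teacher group's exercised output
                    loss = loss_fn(s(x), target)  # fit the small group to that mapping
                    opt.zero_grad()
                    loss.backward()
                    opt.step()
        # "put the subnetworks together": the student groups chain directly,
        # since each was built with the same widths at the seams
        return nn.Sequential(*student_groups)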
People have demonstrated that similar ideas are effective for reducing network size: after training a large, highly redundant model, it's often possible to shrink it to roughly 1/10 of the parameters without significantly impacting performance by doing things like this (even simpler approaches work too; I think pruning is often effective).
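For instance, a minimal sketch of magnitude pruning using PyTorch's built-in torch.nn.utils.prune utilities; the model here is a toy example and the 90% figure is just the ballpark mentioned above, not something any particular model guarantees.

    # Zero out the smallest-magnitude weights, then make the pruning permanent.
    import torch.nn as nn
    import torch.nn.utils.prune as prune

    model = nn.Sequential(nn.Linear(784, 512), nn.ReLU(), nn.Linear(512, 10))
    # ... train the model as usual ...

    for module in model.modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=0.9)  # zero 90% of weights
            prune.remove(module, "weight")  # bake the mask into the weight tensor
    # A brief fine-tune afterwards typically recovers most of any lost accuracy.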