
I am glad they evaluated this hypothesis with weight decay, which is primarily thought to induce a structured representation. My first thought was that the entire paper would be useless if they hadn't run this experiment.

I find it rather interesting that the structured representations go from sparse to full to sparse as a function of layer depth. I have also noticed that scaling the weight decay penalty exponentially with layer depth gives better results than using a single global weight decay.
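A minimal sketch of what depth-scaled weight decay could look like, assuming the per-layer coefficient is lambda_l = base * rate**l. The comment doesn't say whether the penalty grows or shrinks with depth, so `base` and `rate` are hypothetical hyperparameters: rate > 1 penalizes deeper layers more, rate < 1 penalizes them less. Weights are plain Python lists here just to keep it dependency-free.

```python
from typing import List


def depth_scaled_decay(base: float, rate: float, num_layers: int) -> List[float]:
    """Per-layer decay coefficients lambda_l = base * rate**l, l = 0..num_layers-1.

    `base` and `rate` are hypothetical hyperparameters, not values from the comment.
    """
    return [base * rate ** l for l in range(num_layers)]


def weight_decay_penalty(layers: List[List[float]], coeffs: List[float]) -> float:
    """L2 penalty summed over layers: sum_l lambda_l * ||W_l||^2 (weights flattened)."""
    if len(layers) != len(coeffs):
        raise ValueError("need exactly one decay coefficient per layer")
    return sum(c * sum(w * w for w in layer) for layer, c in zip(layers, coeffs))


def decay_step(layers: List[List[float]], coeffs: List[float], lr: float) -> List[List[float]]:
    """Decoupled (AdamW-style) shrink: each weight in layer l is scaled by (1 - lr * lambda_l)."""
    if len(layers) != len(coeffs):
        raise ValueError("need exactly one decay coefficient per layer")
    return [[w * (1.0 - lr * c) for w in layer] for layer, c in zip(layers, coeffs)]


if __name__ == "__main__":
    coeffs = depth_scaled_decay(base=1e-4, rate=2.0, num_layers=3)
    print(coeffs)  # coefficients double with each layer
```

With a global weight decay every layer gets the same lambda; the only change here is that each layer gets its own coefficient. In a framework like PyTorch the same idea would typically be expressed by putting each layer's parameters in a separate optimizer parameter group with its own `weight_decay` value.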


