
Well, it's more like fruits and vegetables. The author proposed a normalized inner product as a replacement for the standard inner product.

It's not an activation function, because it combines the learnable weights of a linear projection (a matrix-vector multiplication) and the clamping behavior of an activation function in one operation.
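For intuition, here's a minimal layer-level sketch, assuming the normalization divides by both the weight-row and input norms (cosine-similarity style); the paper's exact Yat-product may differ. The point is that the output is already bounded, so no separate activation is applied.

```python
import numpy as np

def normalized_linear(x, W, eps=1e-8):
    """Illustrative sketch only, not the paper's exact formula.
    Output lies in [-1, 1] by Cauchy-Schwarz, so the 'clamping' is
    built into the projection itself."""
    num = W @ x                                              # learnable linear part
    den = np.linalg.norm(W, axis=1) * np.linalg.norm(x) + eps
    return num / den                                         # bounded, no extra activation

# usage: a bounded output without a separate nonlinearity
W = np.random.randn(4, 8)
x = np.random.randn(8)
print(normalized_linear(x, W))   # all values in [-1, 1]
```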

My personal issue with the proposal is that it essentially doubles the amount of memory needed on-chip.

A Yat-product GEMV now needs to store both the running total of the inner product and the running norm of the input vector. That's a big cost increase for something that might not improve performance all that much.
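To make the memory point concrete, here's a sketch of what the inner loop of such a GEMV might look like (again assuming a norm-based definition, which may not match the paper exactly). A standard GEMV only carries the dot-product accumulator; this one also carries a running squared norm of the input.

```python
import numpy as np

def yat_like_gemv(W, x, eps=1e-8):
    """Naive loop sketch to show the extra per-loop state; not the
    paper's kernel. Weight-row norms can be precomputed once."""
    w_norms = np.linalg.norm(W, axis=1)
    n_out, n_in = W.shape
    y = np.empty(n_out)
    for i in range(n_out):
        dot = 0.0    # accumulator a standard GEMV already needs
        x_sq = 0.0   # extra accumulator: running squared norm of x
        for j in range(n_in):
            dot += W[i, j] * x[j]
            x_sq += x[j] * x[j]
        y[i] = dot / (w_norms[i] * np.sqrt(x_sq) + eps)
    return y
```

In a tiled or streamed kernel, that second accumulator is extra per-thread state, which is where the on-chip memory increase comes from.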



That's a great point, but the goal of this research paper is not to improve performance; it's to show that you can train deep neural networks without activation functions or normalization layers.

One simple use case is physics-informed neural networks (PINNs) and neural ODEs, where standard activation functions are discouraged, mainly because most of them aren't infinitely differentiable; tanh or sin are used most of the time. The kernel I introduced works better than neurons followed by a tanh when solving different PDEs.
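As a quick illustration of why smoothness matters here (a symbolic check, not from the paper): a tanh neuron has nonzero derivatives of every order, while a ReLU neuron's second derivative vanishes almost everywhere, so PDE residuals involving terms like u_xx collapse for ReLU networks.

```python
import sympy as sp

x, w, b = sp.symbols('x w b', real=True)

# A tanh neuron is smooth: derivatives of every order exist, so PDE
# residuals involving u_xx (and higher) stay informative.
tanh_neuron = sp.tanh(w * x + b)
print(sp.diff(tanh_neuron, x, 2))   # -2*w**2*(1 - tanh(w*x+b)**2)*tanh(w*x+b)

# A ReLU neuron is piecewise linear: its second derivative is zero on
# each piece (and undefined at the kink), so second-order PDE terms
# computed from a ReLU network vanish almost everywhere.
relu_neuron = sp.Piecewise((w * x + b, w * x + b > 0), (sp.Integer(0), True))
print(sp.diff(relu_neuron, x, 2))
```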



