Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

That would be super cool if it works! I’ve also wondered the same thing about activation functions. Why not let the algorithm learn the activation function?


This idea exists (the broad field is called neural architecture search), although you have to parameterize it somehow to allow gradient descent to happen.

Here are examples:

https://arxiv.org/abs/2009.04759

https://arxiv.org/abs/1906.09529


Mostly because of computational efficiency irrc, the non linearity doesn’t seem to have much impact, so picking one that’s fast is a more efficient use of limited computational resources.




Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: