That would be super cool if it works! I’ve also wondered the same thing about ac...

porridgeraisin · 2025-02-27T16:12:21 1740672741

This idea exists (the broad field is called neural architecture search), although you have to parameterize it somehow to allow gradient descent to happen.

Here are examples:

https://arxiv.org/abs/2009.04759

https://arxiv.org/abs/1906.09529
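
To make "parameterize it somehow" concrete, here is a rough sketch of one possible relaxation. It is not taken from either linked paper: the activation is a softmax-weighted mix of a few candidate functions, so the choice among them becomes a set of continuous logits that plain gradient descent can tune. The candidate set, the toy target (ReLU samples), and all names (`CANDIDATES`, `mixed_activation`, `fit`) are made up for illustration.

```python
import math


def relu(x):
    return max(0.0, x)


def identity(x):
    return x


# Hypothetical candidate set; a real search space would be chosen per task.
CANDIDATES = [relu, math.tanh, identity]
NAMES = ["relu", "tanh", "identity"]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_activation(x, weights):
    # Continuous relaxation: a convex combination of the candidates.
    return sum(w * g(x) for w, g in zip(weights, CANDIDATES))


def loss_and_grad(logits, xs, ys):
    # Mean squared error and its gradient with respect to the logits.
    weights = softmax(logits)
    n = len(xs)
    loss = 0.0
    grad_w = [0.0] * len(weights)
    for x, y in zip(xs, ys):
        r = mixed_activation(x, weights) - y
        loss += r * r / n
        for i, g in enumerate(CANDIDATES):
            grad_w[i] += 2.0 * r * g(x) / n
    # Chain rule through softmax: dL/da_j = w_j * (dL/dw_j - sum_i w_i * dL/dw_i)
    avg = sum(w * gw for w, gw in zip(weights, grad_w))
    grad_logits = [w * (gw - avg) for w, gw in zip(weights, grad_w)]
    return loss, grad_logits


def fit(xs, ys, steps=2000, lr=1.0):
    logits = [0.0] * len(CANDIDATES)  # start from a uniform mix
    history = []
    for _ in range(steps):
        loss, grad = loss_and_grad(logits, xs, ys)
        history.append(loss)
        logits = [a - lr * g for a, g in zip(logits, grad)]
    return softmax(logits), history


# Toy target: samples of ReLU on [-2, 2], so the mix should drift toward relu.
xs = [-2.0 + 4.0 * i / 40 for i in range(41)]
ys = [relu(x) for x in xs]
weights, history = fit(xs, ys)
for name, w in zip(NAMES, weights):
    print(f"{name}: {w:.3f}")
```

In a real network the logits would be trained jointly with the layer weights, and the mix is usually collapsed to the single highest-weighted candidate afterwards so inference only pays for one function.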

FuckButtons · 2025-02-27T16:38:04 1740674284

Mostly because of computational efficiency, iirc: the choice of nonlinearity doesn’t seem to have much impact, so picking one that’s fast is a more efficient use of limited computational resources.