One interesting thing to notice is that you can turn XOR into a linear function by using u + v as input 1 and u * v as input 2, which means it can be represented in a NN without a hidden layer. And it's not just XOR: that transformation keeps all the other logic gates simple too. So just by transforming the inputs, you can reduce network complexity. Perhaps a field ripe for research.
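A quick sketch of why this works: with s = u + v and p = u * v as the transformed inputs, XOR is exactly s - 2p, i.e. a single linear layer with weights (1, -2) and zero bias (and AND is just p, OR is s - p):

```python
def xor_linear(u, v):
    # transformed inputs: sum and product
    s, p = u + v, u * v
    # xor(u, v) = s - 2p, linear in (s, p) -- no hidden layer needed
    return s - 2 * p

# check against the truth table
for u in (0, 1):
    for v in (0, 1):
        assert xor_linear(u, v) == (u ^ v)
```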
indeed,
there is extensive work on kernel learning that is fascinating
and one of the applications that still does these transformations is satellite/multispectral imagery: you can get more information just by calculating the NDVI from the different bands of your image, which makes it much easier for your models to make decisions
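for reference, NDVI is just (NIR - Red) / (NIR + Red), a nonlinear combination of two bands that a linear model can't learn on its own. a minimal sketch with NumPy (the band values here are made-up reflectances, not real data):

```python
import numpy as np

def ndvi(nir, red, eps=1e-12):
    """Normalized Difference Vegetation Index, ranges over [-1, 1].

    nir, red: arrays of reflectance values for the near-infrared
    and red bands. eps avoids division by zero on dark pixels.
    """
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    return (nir - red) / (nir + red + eps)

# hypothetical 2x2 patch: high NDVI suggests vegetation
nir = np.array([[0.60, 0.50], [0.10, 0.40]])
red = np.array([[0.10, 0.20], [0.30, 0.10]])
print(ndvi(nir, red))
```

feeding this single derived channel to a model often works better than handing it the raw bands and hoping it learns the ratio itself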