
PCA (essentially SVD) is the one that makes the fewest assumptions. It still works really well if your data is (locally) linear and more or less Gaussian. PLS is the regression version of PCA.
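The PCA-via-SVD connection can be sketched in a few lines of NumPy. This is a minimal illustration under simple assumptions (data already numeric, no scaling or missing values handled), not a production implementation:

```python
# Minimal PCA via SVD sketch -- center the data, take the SVD, project.
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components via SVD."""
    X_centered = X - X.mean(axis=0)            # PCA requires centered data
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]             # principal directions (rows)
    scores = X_centered @ components.T         # coordinates in reduced space
    explained_var = S[:n_components] ** 2 / (len(X) - 1)
    return scores, components, explained_var

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                  # toy data for illustration
scores, components, var = pca(X, 2)
```

Because the singular values come back sorted, the components are automatically ordered by explained variance.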

There are also nonlinear techniques. I’ve used UMAP and it’s excellent (particularly if your data approximately lies on a manifold).

https://umap-learn.readthedocs.io/en/latest/

The most general purpose deep learning dimensionality reduction technique is of course the autoencoder (easy to code in PyTorch). Unlike the above, it makes very few assumptions, but this also means you need a ton more data to train it.
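A minimal PyTorch autoencoder along those lines might look like this (layer sizes, latent dimension, and training loop are illustrative assumptions, not a recommended architecture):

```python
# Autoencoder sketch in PyTorch: compress to a low-dimensional bottleneck,
# reconstruct, and train on reconstruction error.
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim, latent_dim):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 64), nn.ReLU(),
            nn.Linear(64, latent_dim),          # bottleneck = reduced dims
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(),
            nn.Linear(64, input_dim),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder(input_dim=20, latent_dim=2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
X = torch.randn(256, 20)                        # toy data for illustration
for _ in range(5):                              # a few illustrative steps
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X), X)  # reconstruction error
    loss.backward()
    opt.step()
latent = model.encoder(X)                       # the reduced representation
```

The encoder output is the dimensionality-reduced data; unlike PCA there is no linearity assumption, which is exactly why it needs far more training data.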



> PCA (essentially SVD) the one that makes the fewest assumptions

Do you mean it makes the *strongest* assumptions? "your data is (locally) linear and more or less Gaussian" seems like a fairly strong assumption. Sorry for the newb question as I'm not very familiar with this space.


You’re correct in a mathematical sense: linearity and Gaussianity are restrictive assumptions.

However, I meant it colloquially: those assumptions are trivially satisfied by many generating processes in the physical and engineering world, and beyond them there aren’t many other requirements that need to be met.



