When I try to get this point across about techniques like the PCA, I like to show that the measurement units strongly affect the inference.

Really, if your conclusions change depending on whether you measure in inches or centimeters, there’s something wrong with the analysis!



I would disagree and here is why:

> When I try to get this point across about techniques like the PCA, I like to show that the measurement units strongly affect the inference.

In such a case the problem is not with PCA but with its application. PCA is just a rotation of the original coordinate system that projects the data onto new axes aligned with the directions of highest variability. It is not the job of PCA to parse out the origin of that variability (whether it comes from different units or from different effects).
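A rough sketch of that point, assuming NumPy and made-up height/weight data (the helper name `principal_axes` and all the numbers are hypothetical): PCA computed from the covariance matrix is an orthogonal rotation, and the direction it picks simply follows whichever variable carries the most variance in the units supplied.

```python
import numpy as np

def principal_axes(X):
    """Eigenvalues (descending) and eigenvectors (as columns) of the sample covariance of X."""
    Xc = X - X.mean(axis=0)                 # center each column
    cov = np.cov(Xc, rowvar=False)          # sample covariance matrix
    evals, evecs = np.linalg.eigh(cov)      # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]         # reorder to descending variance
    return evals[order], evecs[:, order]

rng = np.random.default_rng(0)

# Hypothetical data: height in inches, weight in kg, mildly correlated.
height_in = rng.normal(68.0, 3.0, size=500)
weight_kg = 70.0 + 2.0 * (height_in - 68.0) + rng.normal(0.0, 8.0, size=500)

X_in = np.column_stack([height_in, weight_kg])
X_cm = X_in * np.array([2.54, 1.0])         # same people, height now in cm

_, axes_in = principal_axes(X_in)
_, axes_cm = principal_axes(X_cm)

pc1_in = axes_in[:, 0]                      # first principal direction, inches
pc1_cm = axes_cm[:, 0]                      # first principal direction, cm

# The axes form an orthonormal basis, i.e. a rotation (possibly with a reflection).
print(np.allclose(axes_in.T @ axes_in, np.eye(2)))

# The loading on height grows once height is expressed in a smaller unit.
print(abs(pc1_in[0]), abs(pc1_cm[0]))
```

Nothing in the decomposition knows about units; it only sees the numbers, so the choice of scale has to be made before PCA is applied.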

> Really, if your conclusions change depending on whether you measure in inches or centimeters, there’s something wrong with the analysis!

To get a statistical distance one should: subtract the mean if the measurements differ in origin; divide by the standard deviation if the measurements differ in scale; rotate (or, equivalently, compute the Mahalanobis distance) if the measurements are dependent (co-vary). PCA itself is closely related to the Mahalanobis distance: Euclidean distance on PCA-transformed data, with each component scaled to unit variance, is equivalent to Mahalanobis distance on the original data. So saying that something is wrong with PCA because it doesn't take units of measurement into account is close to saying that something is wrong with dividing by the standard deviation because it doesn't subtract the mean.
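To make the PCA/Mahalanobis relationship concrete, here is a minimal numerical check, assuming NumPy and synthetic correlated data (the mixing matrix `A` and the helper `mahalanobis` are made up for illustration). It compares the Mahalanobis distance between two points with the Euclidean distance between their PCA scores, both with and without scaling each component to unit variance.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical correlated 3-D data: independent normals mixed by a fixed matrix.
A = np.array([[2.0,  0.0, 0.0],
              [1.0,  1.0, 0.0],
              [0.5, -0.3, 0.5]])
X = rng.normal(size=(200, 3)) @ A.T + np.array([10.0, -5.0, 2.0])

mu = X.mean(axis=0)
cov = np.cov(X, rowvar=False)

# PCA via eigendecomposition of the sample covariance.
evals, evecs = np.linalg.eigh(cov)

scores = (X - mu) @ evecs               # center, then rotate onto principal axes
whitened = scores / np.sqrt(evals)      # scale each component to unit variance

def mahalanobis(x, y, cov):
    """Mahalanobis distance between x and y under covariance cov."""
    d = x - y
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))

i, j = 0, 1
d_mahal = mahalanobis(X[i], X[j], cov)
d_euclid = float(np.linalg.norm(X[i] - X[j]))
d_rotated = float(np.linalg.norm(scores[i] - scores[j]))
d_whitened = float(np.linalg.norm(whitened[i] - whitened[j]))

# Rotation alone preserves Euclidean distance.
print(np.isclose(d_rotated, d_euclid))
# Rotation plus per-component scaling reproduces the Mahalanobis distance.
print(np.isclose(d_whitened, d_mahal))
```

The rotation on its own leaves distances unchanged; it is the division by the square root of each eigenvalue that turns Euclidean distance in PCA space into Mahalanobis distance in the original space.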


Is the effect of measurement units eliminated by applying something like zero mean unit variance normalization prior to dimensionality reduction?
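For a linear change of units (inches to centimeters, kilograms to pounds) this can be checked directly. A small sketch, assuming NumPy and made-up data (the helper names `standardize` and `pca_axes` are hypothetical): after centering and dividing by the standard deviation, the standardized values are the same whichever units went in, so the PCA that follows is the same too.

```python
import numpy as np

def standardize(X):
    """Zero-mean, unit-variance scaling of each column."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pca_axes(Z):
    """Eigenvalues (descending) and eigenvectors (as columns) of the covariance of Z."""
    evals, evecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order]

rng = np.random.default_rng(2)

# Hypothetical data: height (inches), weight (kg), age (years).
height_in = rng.normal(68.0, 3.0, size=500)
weight_kg = 70.0 + 2.0 * (height_in - 68.0) + rng.normal(0.0, 8.0, size=500)
age_yr = 40.0 + 0.2 * (weight_kg - 70.0) + rng.normal(0.0, 12.0, size=500)

X = np.column_stack([height_in, weight_kg, age_yr])
X_alt = X * np.array([2.54, 2.20462, 1.0])   # height in cm, weight in lb

Z = standardize(X)
Z_alt = standardize(X_alt)

evals, axes = pca_axes(Z)
evals_alt, axes_alt = pca_axes(Z_alt)

# Standardized data and the resulting PCA agree across unit choices
# (eigenvectors are compared up to sign, which is arbitrary).
print(np.allclose(Z, Z_alt))
print(np.allclose(evals, evals_alt))
print(np.allclose(np.abs(axes), np.abs(axes_alt)))
```

This is the same as running PCA on the correlation matrix instead of the covariance matrix. It only covers linear rescaling and shifts of origin; a nonlinear change of measurement (a log scale, say) is not undone by standardizing.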



