
This article mentions the words accuracy/accurate/accurately 10 times, e.g.

"when using his own dating-site-sourced dataset, was accurate at predicting the sexuality of males with 68 per cent accuracy – better than a coin flip"

But accuracy seems like a poor measure for something like this, when the population is highly unbalanced. It's trivial to create a classifier with high accuracy: just outputting 'heterosexual' every time would yield ~90% accuracy on faces of the general population.
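To make the baseline concrete, here is a minimal sketch. It assumes a hypothetical population of 1,000 faces with a 10% minority class (the split implied by the ~90% figure above) and a "classifier" that ignores its input entirely. The numbers are illustrative, not taken from the article or the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels: (TP + TN) / (P + N)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted as positive."""
    positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in positives) / len(positives)


# Hypothetical population (an assumption): 100 minority-class faces (label 1)
# and 900 majority-class faces (label 0).
y_true = [1] * 100 + [0] * 900

# A constant "classifier" that always outputs the majority class.
y_pred = [0] * len(y_true)

baseline_accuracy = accuracy(y_true, y_pred)
baseline_recall = recall(y_true, y_pred)

print(baseline_accuracy)  # 0.9
print(baseline_recall)    # 0.0
```

So the constant classifier scores 90% accuracy while never finding a single positive, which is why accuracy alone says little here.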



This is true, which is why machine learning practice has long since stopped treating raw accuracy on an unbalanced population as a meaningful measure. If you look at the linked paper [0], you'll find that the author uses the "ROC AUC" metric [1]:

>The ROC AUC score represents the probability that when given one randomly chosen positive instance and one randomly chosen negative instance, the classifier will correctly identify the positive instance

[0] https://arxiv.org/pdf/1902.10739.pdf

[1] https://en.wikipedia.org/wiki/Receiver_operating_characteris...
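The quoted definition can be computed directly by comparing every positive/negative pair. This is a rough sketch in plain Python, not the paper's code; the labels and scores are made up for illustration, and ties are counted as half a win, which is the usual convention.

```python
from itertools import product


def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Brute-force pairwise version: O(P * N), fine for small examples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")

    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0   # positive ranked above negative
        elif p == n:
            wins += 0.5   # a tie counts as a coin flip
    return wins / (len(pos) * len(neg))


# Made-up unbalanced data (an assumption): 2 positives, 8 negatives.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.8, 0.3, 0.2, 0.2, 0.1, 0.1, 0.1, 0.0]

print(roc_auc(labels, scores))        # 0.9375
print(roc_auc(labels, [0.0] * 10))    # 0.5
```

The second call is the point: a classifier that outputs the same thing for everyone gets an AUC of 0.5 no matter how unbalanced the classes are, so the trivial "always heterosexual" strategy gains nothing under this metric.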


Thanks. That makes more sense.

The article didn't mention AUC, so I assumed they were talking about accuracy in the sense people normally mean it, which also matches the definition in the sidebar of the Wikipedia link you shared:

(TP + TN) / (P + N)


IIRC in the original study they tested with a mix of 50%-50% gay/non-gay to avoid this problem. I guess this other study did the same sampling.
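For what that kind of 50/50 sampling does to the trivial baseline, here is a small sketch. The population size, the 10% base rate, and the `balanced_sample` helper are all assumptions for illustration; this is not how either study necessarily built its test set.

```python
import random


def balanced_sample(labels, seed=0):
    """Return indices of a 50/50 subsample by downsampling the larger class."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(pos), len(neg))
    chosen = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(chosen)
    return chosen


# Hypothetical unbalanced population (an assumption): 10% positives.
labels = [1] * 100 + [0] * 900

idx = balanced_sample(labels)
balanced = [labels[i] for i in idx]

# Accuracy of a classifier that always predicts the majority class (0).
always_majority_acc = sum(y == 0 for y in balanced) / len(balanced)

print(len(balanced), always_majority_acc)  # 200 0.5
```

On the balanced subsample the constant classifier drops to 50%, so an accuracy figure measured this way can be read against a coin flip.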


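The baseline would be much higher than 90%. The most recent poll showed the numbers at their highest yet: 4.5% of males, which would put an always-'heterosexual' classifier at roughly 95% accuracy.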



