
Yes it's old, but worse, it is not a well-argued review. Yes, Bayesian statistics are slowly gaining the upper hand at higher levels of statistics, but you know what should be taught to first-year undergrads in science? Exploratory data analysis! One of the first stats books I voluntarily read was Mosteller and Tukey's Data Analysis and Regression. A gem. Another great book is Judea Pearl's Book of Why.


On the subject of prioritizing EDA:

I need to look this up, but I recall that in the 90s a social psychology journal briefly had a policy of "if you show us you're handling your data ethically, you can just show us a self-explanatory plot for simple comparisons instead of NHST". That came after some early discussions about statistical reform in the 90s - Cohen's "The Earth is round (p < .05)" kick-started things, I believe.


Definitely. It always amazes me that in many situations, I'm applying some stats algorithm just to conclude: let's look at these data some more...


Yes. And the same goes for DS/ML people, please. The number of ML people who can meaningfully drill down and actually understand the data is surprisingly low sometimes. Even worse for being able to understand a phenomenon _using data_.


When you have a lot of fancy metrics/models/bootstraps to throw at the wall, people will just see what sticks.


Happens all the time. Problems come quickly when the datasets used for evaluation are not clean, or the evaluation is incorrect - data leakage, problematic imbalance between groups, distribution shifts vs. the actual production data. Or people check only the average performance, not the typical or worst case. I've seen many people run in circles chasing metrics that are meaningless to the task they are supposed to be solving.
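To make the average-vs-worst-case point concrete, here's a minimal sketch (entirely hypothetical data and group labels) showing how a healthy-looking overall accuracy can hide a badly-served subgroup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical evaluation set: labels, predictions, and a group id per sample
# (e.g. data source or customer segment).
y_true = rng.integers(0, 2, size=1000)
group = rng.integers(0, 5, size=1000)

# Simulate a model that is ~95% accurate everywhere except group 3 (~60%).
acc_by_group = np.where(group == 3, 0.60, 0.95)
correct = rng.random(1000) < acc_by_group
y_pred = np.where(correct, y_true, 1 - y_true)

# The headline number vs. what each group actually sees.
overall = (y_true == y_pred).mean()
per_group = {g: (y_true[group == g] == y_pred[group == g]).mean()
             for g in np.unique(group)}
worst = min(per_group.values())

print(f"overall accuracy: {overall:.3f}")
print(f"worst-group accuracy: {worst:.3f}")
```

The worst-group accuracy is always at or below the overall average (the average is a weighted mean of the per-group values), so reporting only the average is guaranteed to look at least as good as the group that's being failed.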





