Programmers Need to Learn Statistics or I Will Kill Them All (2015) (zedshaw.com)
27 points by gedrap on July 7, 2016 | 9 comments


As a programmer I'll admit to you right now - I don't know shit about statistics. I took the course during my degree - it was like a shaman was cursing me. Never gonna catch me talking shit about stats. This post just validated everything I felt about it.


(I think?) I agree, but this post didn't make me want to learn more about stats.

This might be because, for me, it feels like 20% of the knowledge will work for 80% of situations. Maybe it's a fundamental disconnect: am I going to mess around with R when Python works and is more general purpose? (To be fair, there's loads of brilliant Python stats code out there.)

At least the standard deviation tip makes sense to me. Yay?

Edit: Man, tried to read up on some stuff and stats is hard. I'm definitely going to go easy on the performance guys in future. Any more tips for getting better at this kind of analysis?


This could use a (2015) tag.

The article is a bit rambling, but it achieves its objective of convincing me that (1) the hand-waving that sometimes passes for statistical analysis is not good enough and (2) it's realistic to do better and expect better.


> This could use a (2015) tag.

Whoops. Edited.

> The article is a bit rambling

Sort of, yeah. The tone is certainly distracting some readers from the message.

But I believe the message is very important, especially as data-* is becoming more and more popular. More and more tools measuring performance or platforms claiming amazing performance, spread of analytical tools, data science going mainstream, etc.

Often enough, fundamentally flawed analysis brings more harm than benefit. It doesn't mean that you shouldn't try anything unless you are an expert in stats. The issue often is with overconfidence and ignorance.


This article was written many years before 2015. I would guess 2007.

But I enjoy it every time.


>Oh, and you wonder why I say, “he”? I never have this problem with female programmers.

Programmers need to stop virtue signaling with gender in absolutely everything or I will kill them all.


I think Zed Shaw should get off of his high horse. He knows a little about statistics, and thinks he knows plenty.

A distribution that cannot go negative (as in a real-world system) cannot be a normal distribution, because normal distributions extend infinitely in both directions. If the standard deviation is small enough relative to the mean, though, a normal distribution can still be a fitting model.
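
To put a rough number on "small enough", here is a minimal sketch that computes how much probability a normal model places below zero. The helper name `mass_below_zero` and the mean/stdev pairs are made up for illustration; they are not measurements from the article.

```python
from math import erfc, sqrt

def mass_below_zero(mean: float, stdev: float) -> float:
    """P(X < 0) for X ~ Normal(mean, stdev); erfc keeps small tails accurate."""
    return 0.5 * erfc(mean / (stdev * sqrt(2.0)))

# Hypothetical response-time models in seconds, chosen only for illustration.
for mean, stdev in [(1.0, 0.1), (1.0, 0.5), (1.0, 60.0)]:
    share = mass_below_zero(mean, stdev)
    print(f"mean={mean} stdev={stdev}: P(X < 0) = {share:.3g}")
```

With the mean ten standard deviations above zero, the impossible negative region is negligible; at two standard deviations it is already about 2%; with a 1 second mean and a 60 second standard deviation, a normal model would put nearly half of its mass below zero.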

> Almost all of the queries performed great, except one query that had sub-second response on average, but a 60 second standard deviation!

If the average response is < 1 sec, then a 60 second standard deviation means that many responses are being made before the query is sent.

Moving from just using averages to thinking about measuring properly is a good start, but assuming that all distributions are normal is nearly as bad.


> If the average response is < 1 sec, then a 60 second standard deviation means that many responses are being made before the query is sent.

Imagine if 99.99% of queries take 1 millisecond, and the remaining 0.01% of queries take 6000 seconds. Then the mean is 0.6010 seconds and the standard deviation is 59.997 seconds.
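
The arithmetic can be checked with a short sketch. It treats the example as a two-point distribution and uses the population standard deviation; the variable names are only illustrative.

```python
from math import sqrt

# Two-point mixture from the example: (probability, response time in seconds).
outcomes = [(0.9999, 0.001), (0.0001, 6000.0)]

# Mean and population variance of a discrete distribution.
mean = sum(p * x for p, x in outcomes)
variance = sum(p * (x - mean) ** 2 for p, x in outcomes)
stdev = sqrt(variance)

print(f"mean  = {mean:.4f} s")
print(f"stdev = {stdev:.3f} s")
```

Every response time here is positive, yet the standard deviation is about a hundred times the mean, which is only possible because the distribution is heavily skewed rather than normal.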

(Muphry's law strikes again...)


>Oh, and you wonder why I say, “he”? I never have this problem with female programmers.

gedrap, please don't post sexist material on HN.



