
Only if age, birthdate, and zip codes are uniformly distributed among the population.

Which they aren’t. At all.



Here's what I found for birthdates (https://www.panix.com/~murphy/bday.html): the variation between dates doesn't seem all that bad. So, for all practical purposes, we can assume that births are uniformly distributed across dates.

Given this, the only real issue is ZIP codes. Even if we assume we know nothing about how the population is distributed across ZIP codes, gender and date of birth alone narrow the cohort down to roughly 5,650 US persons (330M x 1/2 x 1/(80x365), i.e. an even gender split and birthdates spread over about 80 years).
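For anyone who wants to check the arithmetic, here is a minimal sketch of that estimate. The constant and function names are mine, and the even gender split and uniform 80-year spread of birthdates are the same simplifying assumptions as in the formula, not real demographics.

```python
# Back-of-envelope cohort size for one gender and one exact date of birth.
# All inputs are the rough figures from the comment, not census data.
US_POPULATION = 330_000_000
GENDER_FRACTION = 1 / 2      # assumes an even gender split
LIFESPAN_YEARS = 80          # assumes birthdates spread evenly over 80 years
DAYS_PER_YEAR = 365          # ignores leap days


def cohort_size(population: float) -> float:
    """People expected to share one gender and one exact birthdate."""
    return population * GENDER_FRACTION / (LIFESPAN_YEARS * DAYS_PER_YEAR)


print(round(cohort_size(US_POPULATION)))  # about 5,650 people
```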

According to this link - https://www.johndcook.com/blog/2019/08/21/zip-code-populatio... - 80% of the US population lives in 27% of its ZIP codes.

Assuming your target individual is in that 80%, and taking roughly 42,000 ZIP codes in total, the gender, birthdate, and ZIP code together narrow things down to 0.8x330M x 1/2 x 1/(80x365) x 1/(0.27x42000) = about 0.4 US persons per combination of gender, birthdate, and ZIP code.
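A rough sketch of the same calculation, in case it helps to play with the inputs. The function and parameter names are hypothetical, and it additionally assumes the dense 27% of ZIP codes split their population evenly, which real ZIP codes certainly do not.

```python
def people_per_combination(
    population: float = 330_000_000,
    share_in_dense_zips: float = 0.8,   # 80% of people...
    dense_zip_fraction: float = 0.27,   # ...live in 27% of ZIP codes
    total_zips: int = 42_000,           # rough ZIP code count from the comment
    lifespan_years: int = 80,           # uniform birthdates over 80 years
) -> float:
    """Expected people per (gender, birthdate, ZIP code) combination."""
    dense_population = share_in_dense_zips * population
    # Same gender (1/2) and same exact birthdate (1 / (years * 365)).
    same_gender_and_birthdate = dense_population / 2 / (lifespan_years * 365)
    dense_zip_count = dense_zip_fraction * total_zips
    # Assumes the dense ZIP codes share that population evenly.
    return same_gender_and_birthdate / dense_zip_count


print(round(people_per_combination(), 2))  # roughly 0.4
```

Anything below 1 means the typical combination matches at most one person, which is the whole point.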

Basically, these three data points can almost certainly uniquely identify specific individuals - the only remaining thing is to connect a name/phone number to each individual.


Obviously the data is not really anonymized.

I'm just nitpicking the weirdly precise results of your Fermi math. It would be easier to grab this from the census data, right? 40-year-old males with a given birthdate, no ZIP code, narrows to ~7,154.



