
Don't be surprised if those predictions are heavily biased against minorities and poor people. Do you care if they are?

It's a similar problem to using ML to give people credit scores.

If the training data includes a lot of minorities and poor people breaking laws or missing payments, then your model will simply key on race/economic status as a predictor.

So you've built a system that simply targets those groups.
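
A minimal sketch of what "keying on" looks like, using synthetic data with made-up numbers: if the recorded labels partly depend on group membership (because one group is watched more closely), a perfectly ordinary classifier puts heavy weight on the group feature.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 10_000
    group = rng.integers(0, 2, n)       # 1 = over-policed group (synthetic)
    behavior = rng.normal(0, 1, n)      # the thing we'd actually like to measure
    # Biased labels: recorded outcomes depend on behavior AND on group,
    # because one group is observed more heavily.
    label = (behavior + 1.5 * group + rng.normal(0, 1, n)) > 1.0

    X = np.column_stack([behavior, group])
    model = LogisticRegression().fit(X, label)
    print(model.coef_)  # the coefficient on 'group' is large: the model keys on it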

But you might object that this race/economic-status targeting gives the highest accuracy! It was learned from the training data, after all. You can make a great classifier that is extremely unfair.

So you have to realize there is a conflict here between accuracy and fairness. This means there is a conflict between the observational data you train on and using that data to produce decisions/outcomes.
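
To make the conflict concrete, here's a toy comparison on the same kind of synthetic biased data. The model that uses the group feature wins on accuracy against the biased labels, but its positive-prediction rates differ sharply between groups (a demographic-parity gap, one common fairness metric among several).

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 10_000
    group = rng.integers(0, 2, n)
    behavior = rng.normal(0, 1, n)
    label = (behavior + 1.5 * group + rng.normal(0, 1, n)) > 1.0

    def parity_gap(pred):
        # difference in positive-prediction rate between the two groups
        return pred[group == 1].mean() - pred[group == 0].mean()

    X_full = np.column_stack([behavior, group])
    X_blind = behavior.reshape(-1, 1)   # same data with the group feature removed
    for name, X in [("with group   ", X_full), ("without group", X_blind)]:
        pred = LogisticRegression().fit(X, label).predict(X)
        print(name, "accuracy:", round(float((pred == label).mean()), 3),
              "parity gap:", round(float(parity_gap(pred)), 3))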

If you make decisions/outcomes that reinforce the training data, you never give minorities or people of low economic status a chance to improve their lives.

That is extremely inhumane, predatory, and unfair.
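
That reinforcement even has a name in the fairness literature: a runaway feedback loop. A tiny simulation (all numbers made up) shows the shape of it: two districts with identical true crime rates, but patrols allocated by past arrest counts, so the initially over-policed district keeps "confirming" the prediction.

    import numpy as np

    rng = np.random.default_rng(0)
    true_rate = np.array([0.1, 0.1])  # two districts, identical true crime rates
    arrests = np.array([5.0, 1.0])    # district 0 starts out over-policed

    for step in range(5):
        patrol = arrests / arrests.sum()                   # deploy where past arrests were
        arrests += rng.poisson(100 * true_rate * patrol)   # you only record crime where you look
        print(step, arrests, patrol.round(2))              # district 0 pulls further ahead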



All I want to predict is time periods/locations which are vulnerable. Nothing more than that.


Racism is morally wrong but not mathematically wrong. P(criminal|black) > P(criminal), but if you observe that someone has black skin and treat them poorly because of it, you've done a bad thing. It doesn't matter that you were just following Bayesian reasoning because you're still hurting someone on the basis of something they can't control.
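To spell out that Bayesian step with made-up numbers (and note that the conditional itself typically comes from biased arrest records, so even the "math" inherits the bias):

    # All three inputs are hypothetical placeholders, not real statistics.
    p_criminal = 0.01                # base rate of criminality
    p_group = 0.13                   # population share of the group
    p_group_given_criminal = 0.30    # measured from (biased) arrest records

    # Bayes' rule: P(criminal | group) = P(group | criminal) * P(criminal) / P(group)
    p_criminal_given_group = p_group_given_criminal * p_criminal / p_group
    print(p_criminal_given_group)    # ~0.023 > 0.01; the inference "works", the act is still wrong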

Lady Justice doesn't wear a blindfold as a fashion accessory. Discarding information is a key factor in nearly every established system of justice / morality. Refusing to do so (i.e. "just" running a ML algorithm) places you directly at odds with society's hard-earned best practices.
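
The literal code version of the blindfold is what the fairness literature calls "fairness through unawareness": drop the protected attribute (and obvious proxies) before training. The column names below are hypothetical, and it's a weak guarantee on its own, since correlated features still leak the dropped information.

    import pandas as pd

    # Hypothetical columns; real datasets carry many subtler proxies.
    df = pd.DataFrame({
        "income":   [42_000, 65_000, 31_000],
        "race":     ["a", "b", "a"],              # protected attribute
        "zip_code": ["10001", "94105", "10001"],  # common proxy for race
        "label":    [0, 1, 0],
    })
    X = df.drop(columns=["race", "zip_code", "label"])  # deliberately discard information
    y = df["label"]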


> Lady Justice doesn't wear a blindfold as a fashion accessory

I never noticed that before. Thanks for pointing this out!


> All I want to predict is time periods/locations which are vulnerable.

Ok, and to what end?

I assume someone else will be consuming these predictions, else you wouldn't bother at all.

What are your customers/users going to do with these predictions?

Or is that simply not your responsibility; someone else's problem?


Take a look at crimereports.com. You might get lucky and find a good source on a per-city or per-county basis, but overall the data is too fragmented to try this. Different countries may also have different documentation standards and publishing guidelines for this kind of data; it might be worth a shot to look.
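
If you do go per-city, expect to hand-write a mapping for each source. A rough sketch of the shape of that work, with every schema and field name below invented for illustration:

    # Invented schemas for two hypothetical sources, just to show the
    # fragmentation: each publisher names and formats its fields differently.
    CITY_SCHEMAS = {
        "city_a": {"when": "occurred_at", "where": "block_address", "what": "offense"},
        "city_b": {"when": "date_time",   "where": "lat_lon",       "what": "crime_type"},
    }

    def normalize(record: dict, city: str) -> dict:
        """Map a source-specific record into a common {when, where, what} shape."""
        mapping = CITY_SCHEMAS[city]
        return {field: record.get(src_key) for field, src_key in mapping.items()}

    print(normalize({"date_time": "2019-06-01T22:14", "crime_type": "burglary"}, "city_b"))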



