Don't be surprised if those predictions are heavily biased against minorities and poor people. Do you care if they are?
It's a similar problem to using ML to give people credit scores.
If the training data includes a lot of minorities and poor people breaking laws / delinquent payments, then your ML will simply key on race/economic status as a predictor.
So you've built a system that simply targets those groups.
But you might object that targeting by race/economic status gives the highest accuracy, since that is exactly what was learned from the training data. That is the point: you can build a highly accurate classifier that is extremely unfair.
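To make that concrete, here is a minimal sketch with synthetic data, hypothetical feature names, and invented base rates (nothing from a real dataset): a plain logistic-regression risk model is handed a group attribute that correlates with the historical labels, and it ends up scoring the two groups very differently.

```python
# Minimal sketch: synthetic data, invented base rates, hypothetical features.
# If group membership correlates with the recorded label, a standard model
# leans on it and the resulting risk scores differ sharply by group.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000
group = rng.integers(0, 2, n)               # stand-in for race / economic status
other = rng.normal(size=n)                  # a feature with no real signal
# recorded labels with different historical base rates per group
label = (rng.random(n) < np.where(group == 1, 0.30, 0.10)).astype(int)

X = np.column_stack([group, other])
clf = LogisticRegression().fit(X, label)
scores = clf.predict_proba(X)[:, 1]

print("mean risk score, group 0:", scores[group == 0].mean())  # ~0.10
print("mean risk score, group 1:", scores[group == 1].mean())  # ~0.30
print("weight on group feature:", clf.coef_[0][0])             # carries the model
```

By the usual metrics this model looks fine, precisely because the group attribute really is predictive of the recorded labels. That is the trap.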
So you have to realize there is a conflict here between accuracy and fairness. Really it is a conflict between the observational data you train on and the decisions/outcomes you produce from it.
If you make decisions/outcomes that reinforce the training data, you never give those racial groups / low-economic-status people a chance to improve their lives. That is extremely inhumane, predatory, and unfair.
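Here is a toy simulation of that reinforcement loop, with every number invented: both groups offend at the same underlying rate, but attention is allocated from skewed historical records, so the new records reproduce the skew and the gap never gets a chance to close.

```python
# Toy feedback-loop simulation; all numbers are made up.
true_rate = 0.1                                  # identical underlying behaviour
recorded = {"group_a": 100, "group_b": 300}      # skewed historical records

for year in range(5):
    total = sum(recorded.values())
    for g, count in list(recorded.items()):
        attention = 1000 * count / total         # attention follows past records
        recorded[g] = count + attention * true_rate
    print(year, {g: round(c) for g, c in recorded.items()})
```

The printed counts stay at a 3:1 ratio every year even though the underlying behaviour is identical, because the data the system collects is downstream of the decisions it already made.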
Racism is morally wrong but not mathematically wrong. P(criminal|black) > P(criminal), but if you observe that someone has black skin and treat them poorly because of it, you've done a bad thing. It doesn't matter that you were just following Bayesian reasoning because you're still hurting someone on the basis of something they can't control.
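For anyone who wants the arithmetic behind that conditional, here is a tiny worked example with every rate invented:

```python
# Bayes' rule with invented rates: the conditional can exceed the base rate
# while still being small, and acting on it means treating the ~96% who are
# not offenders the same as the ~4% who are.
p_criminal = 0.02                 # hypothetical overall base rate
p_group = 0.15                    # hypothetical share of population in the group
p_group_given_criminal = 0.30     # hypothetical share of offenders in the group

p_criminal_given_group = p_group_given_criminal * p_criminal / p_group
print(p_criminal_given_group)     # 0.04 > 0.02
```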
Lady Justice doesn't wear a blindfold as a fashion accessory. Discarding information is a key factor in nearly every established system of justice / morality. Refusing to do so (i.e. "just" running an ML algorithm) places you directly at odds with society's hard-earned best practices.
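In code, "discarding information" is often as blunt as dropping the protected column before training. A hedged sketch with made-up column names, plus the standard caveat that proxy features can leak the same information right back in:

```python
# Sketch of "fairness through unawareness": drop the protected attribute
# before training (column names are made up). Caveat: correlated proxies
# such as neighbourhood can still encode the same information, so this
# alone is not enough.
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.DataFrame({
    "race": ["a", "b", "a", "b"],     # protected attribute
    "district": [3, 7, 3, 7],         # potential proxy for the same thing
    "prior_count": [0, 2, 1, 3],
    "label": [0, 1, 0, 1],
})

features = df.drop(columns=["race", "label"])  # deliberately discard it
model = LogisticRegression().fit(features, df["label"])
print(dict(zip(features.columns, model.coef_[0])))
```

Even after the drop, the weight just migrates to the correlated column, which is why discarding the attribute on its own rarely settles the question.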
Take a look at crimereports.com. You might get lucky and find a good source on a per-city or per-county basis, but overall it's too fragmented to try this. Different countries might have different documentation standards and publishing guidelines for this kind of data, so it might be worth a shot to look.