Saturday, May 6, 2017

More on Propublica’s Machine Bias Article

I wrote previously about a really biased Propublica article here. The gist of the Propublica piece was that a statistical model predicting recidivism among criminal defendants was biased against black people, because it had a higher false positive rate for blacks than for whites and a lower false negative rate. Their framing was: the model is unfair because it disproportionately lets white repeat offenders off the hook (falsely identifying them as unlikely to reoffend) and is disproportionately harsh on blacks (falsely identifying non-recidivating blacks as likely to reoffend). In both types of error, the false positives and the false negatives, it is harsher on blacks than on whites.

I said in my previous post: “I think the ‘false positive/false negative’ result described in the above paragraph is just a statistical artifact of the fact that black defendants, for whatever reason, are more likely to recidivate (51.4% vs 39.4%, according to Propublica’s data).” I’ve since confirmed my suspicion: the false positive/false negative disparity arises from the different underlying rates of recidivism for the two races. I am not making any general claims about crime rates by race; these statements are specific to the sample of criminals used in Propublica’s analysis. You could just as well compare males to females, the young to the old, or defendants with multiple priors to those with none. Any comparison of a high-recidivism population to a low-recidivism population will show this false positive/false negative disparity, even if the model is completely unbiased.

Assume you can divide the world into two identifiable classes: Blues and Greens. Suppose we live in a world with 1000 Greens and 1000 Blues. There are 600 high-risk Greens and 400 low-risk Greens; the Blues are flipped, with 400 high-risk and 600 low-risk. A high-risk person has a 60% chance of recidivating and a low-risk person has a 30% chance, regardless of class. Here is the breakdown of high- and low-risk individuals, and who subsequently offended or didn’t, by class:

Greens: 600 high-risk (360 reoffend, 240 don’t) and 400 low-risk (120 reoffend, 280 don’t). False positive rate: 240/520 = 46.2%. False negative rate: 120/480 = 25.0%.

Blues: 400 high-risk (240 reoffend, 160 don’t) and 600 low-risk (180 reoffend, 420 don’t). False positive rate: 160/580 = 27.6%. False negative rate: 180/420 = 42.9%.

Notice that the Greens have a higher false positive rate and the Blues have a higher false negative rate, even though the model is fair: it accurately predicts the recidivism rate of each risk group, regardless of class. The “unfairness” of the false positive/false negative proportions is driven entirely by the relative propensity of Greens and Blues to recidivate. (The numbers and proportions chosen for this example match fairly closely those in the Propublica study.)
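The arithmetic of this example can be checked with a short script. This is just a sketch of the hypothetical above: the 60%/30% reoffense rates and the 600/400 class splits are the invented numbers from the example, not Propublica’s data.

```python
# Expected outcomes for the hypothetical Greens/Blues example: high-risk
# people reoffend with probability 0.6, low-risk with 0.3, regardless of
# class; the classes differ only in their mix of high- and low-risk members.

def error_rates(n_high, n_low, p_high=0.6, p_low=0.3):
    tp = n_high * p_high        # labeled high-risk and reoffended
    fp = n_high * (1 - p_high)  # labeled high-risk, did not reoffend
    fn = n_low * p_low          # labeled low-risk but reoffended
    tn = n_low * (1 - p_low)    # labeled low-risk, did not reoffend
    fpr = fp / (fp + tn)        # of non-reoffenders, share labeled high-risk
    fnr = fn / (fn + tp)        # of reoffenders, share labeled low-risk
    return fpr, fnr

greens = error_rates(n_high=600, n_low=400)
blues = error_rates(n_high=400, n_low=600)
print(f"Greens: FPR = {greens[0]:.1%}, FNR = {greens[1]:.1%}")  # 46.2%, 25.0%
print(f"Blues:  FPR = {blues[0]:.1%}, FNR = {blues[1]:.1%}")    # 27.6%, 42.9%
```

The model treats individuals identically regardless of class; the error-rate gap falls out of the different high-risk shares alone.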

Trivially, if we set the proportions of high- and low-risk individuals equal (500/500 for both classes), the false positive/false negative disparity disappears. If we exacerbate the difference (say 900 high- and 100 low-risk Greens, flipped for Blues), we also exacerbate the false positive/false negative disparity. You end up with 83.7% false positives and 5.3% false negatives for the Greens, and 6.0% false positives and 81.8% false negatives for the Blues. And yet you’re treating everyone fairly: 60% of people labeled high-risk reoffend, Green or Blue, and 30% of people labeled low-risk reoffend, Green or Blue. Your model is as accurate as it can be, and it’s not showing a racial bias in terms of recidivism rates. It’s just that there “really” are more high-risk Greens.
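The exaggerated 900/100 split can be checked the same way. Again this is a sketch using the hypothetical 60%/30% reoffense rates; by construction, 60% of those labeled high-risk reoffend in both classes, so the model stays equally calibrated for each.

```python
# Same unbiased model, but with a 900/100 high/low-risk split for Greens
# and the mirror-image 100/900 split for Blues.
def error_rates(n_high, n_low, p_high=0.6, p_low=0.3):
    tp, fp = n_high * p_high, n_high * (1 - p_high)
    fn, tn = n_low * p_low, n_low * (1 - p_low)
    return fp / (fp + tn), fn / (fn + tp)

g_fpr, g_fnr = error_rates(900, 100)  # Greens: mostly high-risk
b_fpr, b_fnr = error_rates(100, 900)  # Blues: mostly low-risk
print(f"Greens: FPR = {g_fpr:.1%}, FNR = {g_fnr:.1%}")  # 83.7%, 5.3%
print(f"Blues:  FPR = {b_fpr:.1%}, FNR = {b_fnr:.1%}")  # 6.0%, 81.8%
```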

I don't know why the original Propublica piece fixated on the false positive and false negative rates, other than that it gave them the answer they wanted. The false positive rate is the number of false positives divided by false positives plus true negatives. In other words, of those people who did not re-offend, the fraction that was wrongly labeled "high risk." The false negative rate is the number of false negatives over false negatives plus true positives. In other words, of those people who did re-offend, the fraction that was wrongly identified as low-risk. The false positive rate will be high for a high-risk group, even for an unbiased model. Ditto for the false negative rate for a low-risk group. These statistics simply don't tell you anything about whether the model is biased or not. 

At first blush this looks like a pointless statistical exercise: Propublica made a statistically naïve claim, and I’m pedantically debunking it. But I think a more general lesson can be pulled from this. Indulge me for a moment in a fairy tale. Suppose that in some community blacks really do commit more crimes than whites, but the difference can be fully attributed to things like age, prior record, the criminal records of associates, school delinquency, and so on. (If you had “race” as a variable in your regression model, it would show up as statistically insignificant, meaning not predictive of criminality, because the other factors fully explain the difference.) But since the races differ in average age, average number of priors, and average number of associates with priors (and whatever else might be predictive of criminality), they have different average crime rates. The difference isn’t driven by race per se; it’s driven by the average demographics of each race.

Now suppose that police officers realize that these demographic drivers are important. Not necessarily through the use of a computer model; they simply grasp the different crime rates intuitively, and use that knowledge to allocate their resources toward younger vs. older suspects, suspects with more vs. fewer priors, and so on. They would give equal treatment to two people of different races with otherwise identical demographics. People of both races might then start to intuitively grasp the false positive/false negative disparity, which arises even if the police are perfectly fair and color-blind. A black person in that society might fairly say, “Cops are always harassing us for no good reason. And they’re always letting guilty white people off the hook!” And his perception would be statistically accurate: the police in this world really would disproportionately let white criminals off the hook and harass innocent blacks, even though the cops aren’t responding to race at all.

In this world it probably quickly becomes impossible not to notice race. The police really should be targeting suspects based on demographic factors that are predictive of crime, but this leads to an apparent racial disparity because the races have different average demographics. It could easily become a habit to let “race” stand in as a lazy proxy for those other factors. At that point, blacks catch on to the fact that, yes, cops really are unfairly targeting people because of their race. Civil disorder ensues.

You will see this racial disparity arise whenever there is 1) some kind of system for targeting individuals and 2) some resolution as to whether the targeting was correct or not. You will see it so long as there are average demographic differences between the races, even if race itself isn’t a factor (as described in the previous paragraph). Suppose prosecutors use some criteria or decision-making process for deciding whom to prosecute (step 1), and the resolution is a guilty/not-guilty verdict (step 2). You’re going to see more black people prosecuted and then found not guilty, and more guilty white people let off the hook (although you won’t ultimately know how many of those let off were guilty). Or suppose that cops decide whom to stop and frisk based on demographic characteristics (step 1), and the resolution is an arrest for possession of contraband (step 2). Once again, you’re going to have a lot of unnecessary police stops of black people, and a lot of guilty white people let off the hook. Even if the police really are colorblind.
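The stop-and-frisk version of the two-step scenario can be simulated. This is a toy sketch, not a model of any real police department: the stop rule, the priors distribution, and the contraband probabilities are all invented for illustration, and the rule never consults group membership.

```python
import random

random.seed(0)

# A toy model of colorblind stop-and-frisk. The stop rule looks only at a
# "priors" count, never at group membership; the chance of carrying
# contraband also depends only on priors. The two groups differ solely in
# their average number of priors. All numbers are invented for illustration.

def simulate(mean_priors, n=100_000, stop_threshold=3):
    stopped_innocent = innocent = missed_guilty = guilty = 0
    for _ in range(n):
        priors = max(0, round(random.gauss(mean_priors, 1.5)))
        carrying = random.random() < min(0.9, 0.1 + 0.1 * priors)
        stopped = priors >= stop_threshold  # colorblind stop rule
        if carrying:
            guilty += 1
            missed_guilty += not stopped
        else:
            innocent += 1
            stopped_innocent += stopped
    # "False positive rate": innocent people who got stopped anyway.
    # "False negative rate": guilty people who were never stopped.
    return stopped_innocent / innocent, missed_guilty / guilty

fpr_hi, fnr_hi = simulate(mean_priors=3.0)  # group with more priors on average
fpr_lo, fnr_lo = simulate(mean_priors=1.0)  # group with fewer priors on average
print(f"High-priors group: stopped-but-innocent {fpr_hi:.1%}, guilty-but-missed {fnr_hi:.1%}")
print(f"Low-priors group:  stopped-but-innocent {fpr_lo:.1%}, guilty-but-missed {fnr_lo:.1%}")
```

The group with more priors on average ends up with more fruitless stops of innocent people, and the other group gets more guilty members overlooked, even though race never enters the stop rule.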

I’m not trying to argue that the apparent racial disparity in our justice system is all attributable to factors other than race. I’m sure that race itself is a factor in many decisions to stop, arrest, prosecute, convict, beat, or shoot a person. I’m just issuing a word of caution that these disparities will continue to exist even if we achieve a color-blind society. A process will wrongly be labeled as racist even when it isn’t, as the Propublica article demonstrates clearly.

As terrible as the original Propublica article was, I’m sort of glad they wrote it, because I never would have worked out this result otherwise. It’s a good thing to keep in mind: a higher underlying rate of something means a higher false positive rate and a lower false negative rate, and a lower underlying rate means the opposite. You will get this result even from a fair, unbiased statistical model.
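The rule of thumb can be sketched directly: holding the 60%/30% model from the earlier hypothetical fixed and varying only a group’s share of high-risk members, the false positive rate rises and the false negative rate falls, monotonically.

```python
# FPR and FNR as a function of a group's high-risk share, for the same fixed,
# unbiased model as before: high-risk people reoffend at 60%, low-risk at 30%.
def rates(high_share, p_high=0.6, p_low=0.3):
    tp = high_share * p_high
    fp = high_share * (1 - p_high)
    fn = (1 - high_share) * p_low
    tn = (1 - high_share) * (1 - p_low)
    return fp / (fp + tn), fn / (fn + tp)

for share in (0.1, 0.3, 0.5, 0.7, 0.9):
    fpr, fnr = rates(share)
    print(f"high-risk share {share:.0%}: FPR = {fpr:.1%}, FNR = {fnr:.1%}")
```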
