Lecture 27 – Fairness, Conclusion

DSC 80, Winter 2023

Announcements

This lecture will not be delivered live! Instead, a pre-recorded version of this lecture can be found here.

Agenda

Fairness

Fairness: why do we care?

Example: COMPAS and recidivism prediction

COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) is a "black-box" model that estimates the likelihood that someone who has committed a crime will recidivate (commit another crime).


ProPublica found that the model's false positive rate is higher for African-Americans than it is for White Americans, and that its false negative rate is lower for African-Americans than it is for White Americans.

Example: Facial recognition

Note:

$$PPV = \text{precision} = \frac{TP}{TP+FP},\:\:\:\:\:\: TPR = \text{recall} = \frac{TP}{TP + FN}, \:\:\:\:\:\: FPR = \frac{FP}{FP+TN}$$
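To make the three rates above concrete, here is a minimal sketch that computes them from binary labels. The helper names (`confusion_counts`, `classification_rates`) and the toy labels are illustrative assumptions, not part of the lecture's code.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, c in pairs if y == 1 and c == 1)
    fp = sum(1 for y, c in pairs if y == 0 and c == 1)
    fn = sum(1 for y, c in pairs if y == 1 and c == 0)
    tn = sum(1 for y, c in pairs if y == 0 and c == 0)
    return tp, fp, fn, tn


def classification_rates(y_true, y_pred):
    """Return PPV (precision), TPR (recall), and FPR as a dictionary."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "PPV": tp / (tp + fp),
        "TPR": tp / (tp + fn),
        "FPR": fp / (fp + tn),
    }


# Hypothetical true outcomes and predictions, made up for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0, 0, 1]

metrics = classification_rates(y_true, y_pred)
print(metrics)
```

Comparing these rates across demographic groups (for example, FPR in one group versus another) is the kind of analysis the COMPAS example describes.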

How does bias occur?

Remember, our models learn patterns from the training data. Various sources of bias may be present within training data:

Example: Gender associations

soldier, teacher, nurse, doctor, dog, cat, president, nanny

Example: Image searches

A 2015 study examined image search results for occupations and the gender makeup of those results. Google Images has changed its behavior since then.

In 2015, a Google Images search for "nurse" returned...

Search for "nurse" now. What do you see?

In 2015, a Google Images search for "doctor" returned...

Search for "doctor" now. What do you see?

Ethics: What gender ratio should we expect in the results?

Excerpts:

"male-dominated professions tend to have even more men in their results than would be expected if the proportions reflected real-world distributions."

"People’s existing perceptions of gender ratios in occupations are quite accurate, but that manipulated search results have an effect on perceptions."

How did this unequal representation occur?

Parity measures

Notation

Demographic parity

Accuracy parity

$$\mathbb{P}(C=Y|A=1) = \mathbb{P}(C=Y|A\neq 1)$$

True positive parity

$$\mathbb{P}(C=1|Y=1, A=1) = \mathbb{P}(C=1|Y=1, A\neq 1)$$
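The two conditions above can be checked empirically by estimating each conditional probability as a proportion. The following is a rough sketch on made-up data; `conditional_rate` is a hypothetical helper, and `Y`, `C`, and `A` simply mirror the symbols in the formulas.

```python
def conditional_rate(event, condition):
    """Estimate P(event | condition) from parallel lists of booleans."""
    selected = [e for e, keep in zip(event, condition) if keep]
    return sum(selected) / len(selected)


# Hypothetical data: Y = true outcome, C = prediction, A = group indicator.
Y = [1, 1, 0, 0, 1, 1, 0, 0, 1, 0]
C = [1, 0, 0, 1, 1, 1, 0, 0, 0, 0]
A = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

correct = [c == y for c, y in zip(C, Y)]
predicted_positive = [c == 1 for c in C]

# Accuracy parity: compare P(C = Y | A = 1) with P(C = Y | A != 1).
acc_in = conditional_rate(correct, [a == 1 for a in A])
acc_out = conditional_rate(correct, [a != 1 for a in A])

# True positive parity: compare P(C = 1 | Y = 1, A = 1) with P(C = 1 | Y = 1, A != 1).
tpr_in = conditional_rate(predicted_positive, [y == 1 and a == 1 for y, a in zip(Y, A)])
tpr_out = conditional_rate(predicted_positive, [y == 1 and a != 1 for y, a in zip(Y, A)])

print(acc_in, acc_out)
print(tpr_in, tpr_out)
```

On real data the two sides will rarely be exactly equal, so in practice one looks at the size of the gap rather than strict equality.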

Other measures of parity

Example: Loan approval

LendingClub

LendingClub is a "peer-to-peer lending company"; they used to publish a dataset describing the loans that they approved (fortunately, we downloaded it while it was available).

The total amount of money loaned was over 5 billion dollars!

Predicting 'tag'

Let's build a classifier that predicts whether or not a loan was paid in full. If we were a bank, we could use our trained classifier to determine whether to approve someone for a loan!

Remember that a prediction of 1 means that we predict that the loan will be paid in full.

Precision

$$\text{precision} = \frac{TP}{TP+FP}$$

Precision describes the proportion of loans that were approved that would have been paid back.

If we subtract the precision from 1, we get the proportion of loans that were approved that would not have been paid back. This is known as the false discovery rate.

$$\frac{FP}{TP + FP} = 1 - \text{precision}$$

Recall

$$\text{recall} = \frac{TP}{TP + FN}$$

Recall describes the proportion of loans that would have been paid back that were actually approved.

If we subtract the recall from 1, we get the proportion of loans that would have been paid back that were denied. This is known as the false negative rate.

$$\frac{FN}{TP + FN} = 1 - \text{recall}$$
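As a quick worked example with made-up counts (these numbers are assumptions for illustration, not results from the LendingClub data), suppose a classifier produces $TP = 80$, $FP = 20$, and $FN = 40$. Then:

$$\text{precision} = \frac{80}{80 + 20} = 0.8, \:\:\:\:\:\: \frac{FP}{TP + FP} = \frac{20}{80 + 20} = 0.2 = 1 - 0.8$$

$$\text{recall} = \frac{80}{80 + 40} = \frac{2}{3}, \:\:\:\:\:\: \frac{FN}{TP + FN} = \frac{40}{80 + 40} = \frac{1}{3} = 1 - \frac{2}{3}$$

In words: 20% of the approved loans would not have been paid back, and one third of the loans that would have been paid back were denied.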

From the perspective of both the bank and the borrower, a high false negative rate is bad!

False negative rate by age

Computing parity measures

First, let's compute the proportion of loans that were approved in each group. If these two numbers are the same, $C$ achieves demographic parity.
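A minimal sketch of this check, assuming predictions are stored as 0/1 values alongside a group label. The function name, the `"older"`/`"younger"` labels, and the data are hypothetical stand-ins rather than the lecture's actual code or dataset.

```python
from collections import defaultdict


def approval_rate_by_group(predictions, groups):
    """Return {group: proportion of predictions equal to 1}."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for c, g in zip(predictions, groups):
        totals[g] += 1
        approved[g] += int(c == 1)
    return {g: approved[g] / totals[g] for g in totals}


# Hypothetical predictions (1 = approve) and age brackets.
C = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
age_bracket = ["older"] * 5 + ["younger"] * 5

approval_rates = approval_rate_by_group(C, age_bracket)
gap = approval_rates["older"] - approval_rates["younger"]
print(approval_rates, gap)
```

With a pandas DataFrame, the same per-group proportions are typically obtained with a `groupby` followed by a `mean` on the prediction column.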

$C$ evidently does not achieve demographic parity – older people are approved for loans far more often! Note that this doesn't factor in whether they were correctly approved or incorrectly approved.

Now, let's compute the accuracy of $C$ in each group. If these two numbers are the same, $C$ achieves accuracy parity.

Hmm... These numbers look much more similar than before!

Is this difference in accuracy significant?

Let's run a permutation test to see if the difference in accuracy is significant.
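One way such a test could be set up is sketched below. It uses the absolute difference in group accuracies as the test statistic and shuffles the group labels to simulate the null hypothesis that accuracy does not depend on group. The function names, the number of repetitions, and the data are assumptions made for illustration.

```python
import random


def accuracy_gap(correct, groups):
    """Absolute difference in accuracy between the two groups."""
    accuracies = []
    for label in sorted(set(groups)):
        hits = [c for c, g in zip(correct, groups) if g == label]
        accuracies.append(sum(hits) / len(hits))
    return abs(accuracies[0] - accuracies[1])


def permutation_test(correct, groups, n_repetitions=1000, seed=0):
    """Return the observed gap and the proportion of shuffles at least as extreme."""
    rng = random.Random(seed)
    observed = accuracy_gap(correct, groups)
    shuffled = list(groups)
    at_least_as_extreme = 0
    for _ in range(n_repetitions):
        rng.shuffle(shuffled)  # break any link between group and correctness
        if accuracy_gap(correct, shuffled) >= observed:
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / n_repetitions


# Hypothetical data: whether each prediction was correct, and each person's group.
correct = [True] * 180 + [False] * 20 + [True] * 120 + [False] * 80
groups = ["older"] * 200 + ["younger"] * 200

observed, p_value = permutation_test(correct, groups)
print(observed, p_value)
```

A small p-value suggests the observed gap would be unlikely if group membership were unrelated to accuracy. The significance threshold is a choice left to the analyst.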

It seems like the difference in accuracy across the two groups is significant, despite being only ~6%. Thus, $C$ likely does not achieve accuracy parity.

Ethical questions of fairness

Not only should we not use 'age' to determine whether or not to approve a loan, but we also shouldn't use other features that are strongly correlated with 'age', like 'emp_length'.

Parting thoughts

Course goals ✅

In this course, you...

Course outcomes ✅

Now, you...

Topics covered ✅

We learnt a lot this quarter.

Thank you!

Good luck on the Final Exam, and enjoy your spring break! 🎉