from dsc80_utils import *
Announcements 📣¶
Guest lecture on Thursday Dec 5, 1:30pm-3pm in the HDSI MPR: Dr. Mohammad Ramezanali, an AI lead from Salesforce, will be talking about LLMs and how he uses them in industry.
- No regular lecture on Dec 5.
- If you attend the guest lecture, you will get lecture attendance credit and 1% extra credit on your final exam grade.
- If you can't make it, we'll record the talk and you can get attendance + extra credit by making a post on Ed with a few paragraphs about the talk (details to come).
The Final Project is due on Tue Dec 10.
- No slip days allowed!
The Final Exam is on Saturday, Dec 7 from 11:30am to 2:30pm in PODEM 1A18 and 1A19.
If at least 80% of the class fills out both SETs and the End-of-Quarter Survey by Friday, Dec 6 at 11:59PM, then everyone will earn an extra 1% on the Final Exam.
Model fairness¶
Fairness: why do we care?¶
- Sometimes, a model performs better for certain groups than others; in such cases we say the model is unfair.
- Since ML models are now used in processes that significantly affect human lives, it is important that they are fair!
- Job applications and college admissions.
- Criminal sentencing and parole grants.
- Predictive policing.
- Credit and loans.
Model fairness¶
- We'd like to build a model that is fair, meaning that it performs the same for individuals within a group and individuals outside of the group.
- What do we mean by "perform"? What do we mean by "the same"?
Parity measures for classifiers¶
Suppose $C$ is a classifier we've already trained, and $A$ is some binary attribute that denotes whether an individual is a member of a sensitive group – that is, a group we want to avoid discriminating against (e.g. $A = \text{age is less than 25}$).
- $C$ achieves accuracy parity if $C$ has the same accuracy for individuals in $A$ and individuals not in $A$.
- Example: $C$ is a binary classifier that determines whether someone receives a loan.
- If the classifier predicts correctly, then either $C$ approves the loan and it is paid off, or $C$ denies the loan and it would have defaulted.
- If $C$ achieves accuracy parity, then the proportion of correctly classified loans should be the same for those under 25 and those 25 and older.
- $C$ achieves precision (or recall) parity if $C$ has the same precision (or recall) for individuals in $A$ and individuals not in $A$.
- Recall parity is often called "true positive rate parity."
- $C$ achieves demographic parity if the proportion of predictions that are positive is equal for individuals in $A$ and individuals not in $A$.
- With the exception of demographic parity, the parity measures above all involve checking whether some evaluation metric from Lecture 17 is equal across two groups.
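To make these checks concrete, here's a minimal sketch of how one might compare accuracy and the proportion of positive predictions across the two groups. It assumes a fitted classifier clf, features X, labels y, and a Boolean array in_A marking group membership (all hypothetical names, not from the lecture):
import numpy as np
from sklearn import metrics

def parity_report(clf, X, y, in_A):
    # Compare accuracy and positive-prediction rate for group A vs. everyone else.
    preds = clf.predict(X)
    y, in_A = np.asarray(y), np.asarray(in_A)
    for name, mask in [('in A', in_A), ('not in A', ~in_A)]:
        acc = metrics.accuracy_score(y[mask], preds[mask])  # accuracy parity
        prop_pos = preds[mask].mean()                       # demographic parity
        print(f'{name}: accuracy = {acc:.3f}, proportion positive = {prop_pos:.3f}')
Accuracy parity holds if the two accuracies match; demographic parity holds if the two positive proportions match.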
More on parity measures¶
- Which parity metric should you care about? It depends on your specific dataset and what types of errors are important!
- Many of these parity measures are impossible to satisfy simultaneously!
- The classifier parity metrics mentioned on the previous slide are only a few of the many possible parity metrics. See these DSC 167 notes for more details, including more formal explanations.
- These don't apply to regression models; for those, we may care about RMSE parity or $R^2$ parity. There is also a notion of demographic parity for regression models, but it is outside the scope of DSC 80.
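For intuition, a hypothetical helper for RMSE parity (assuming arrays y_true, y_pred, and a Boolean group mask in_A; none of these names come from the lecture) might look like:
import numpy as np

def rmse_parity_gap(y_true, y_pred, in_A):
    # Difference in RMSE between group A and everyone else; 0 means exact parity.
    y_true, y_pred, in_A = map(np.asarray, (y_true, y_pred, in_A))
    rmse = lambda t, p: np.sqrt(np.mean((t - p) ** 2))
    return rmse(y_true[in_A], y_pred[in_A]) - rmse(y_true[~in_A], y_pred[~in_A])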
Example: Loan approval¶
As you know from Project 2, LendingClub was a "peer-to-peer lending company"; they used to publish a dataset describing the loans that they approved.
- 'tag': whether the loan was repaid in full (1.0) or defaulted (0.0).
- 'loan_amnt': amount of the loan in dollars.
- 'emp_length': number of years employed.
- 'home_ownership': whether the borrower owns (1.0) or rents (0.0).
- 'inq_last_6mths': number of credit inquiries in the last six months.
- 'revol_bal': revolving balance on the borrower's accounts.
- 'age': age in years of the borrower (protected attribute).
loans = pd.read_csv(Path('data') / 'loan_vars1.csv', index_col=0)
loans.head()
| | loan_amnt | emp_length | home_ownership | inq_last_6mths | revol_bal | age | tag |
|---|---|---|---|---|---|---|---|
| 268309 | 6400.0 | 0.0 | 1.0 | 1.0 | 899.0 | 22.0 | 0.0 |
| 301093 | 10700.0 | 10.0 | 1.0 | 0.0 | 29411.0 | 19.0 | 0.0 |
| 1379211 | 15000.0 | 10.0 | 1.0 | 2.0 | 9911.0 | 48.0 | 0.0 |
| 486795 | 15000.0 | 10.0 | 1.0 | 2.0 | 15883.0 | 35.0 | 0.0 |
| 1481134 | 22775.0 | 3.0 | 1.0 | 0.0 | 17008.0 | 39.0 | 0.0 |
The total amount of money loaned was over 5 billion dollars!
loans['loan_amnt'].sum()
np.float64(5706507225.0)
loans.shape[0]
386772
Predicting 'tag'¶
Let's build a classifier that predicts whether or not a loan was paid in full. If we were a bank, we could use our trained classifier to determine whether to approve someone for a loan!
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
X = loans.drop('tag', axis=1)
y = loans.tag
X_train, X_test, y_train, y_test = train_test_split(X, y)
clf = RandomForestClassifier(n_estimators=50)
clf.fit(X_train, y_train)
RandomForestClassifier(n_estimators=50)
Recall, a prediction of 1 means that we predict that the loan will be paid in full.
y_pred = clf.predict(X_test)
y_pred
array([0., 1., 1., ..., 1., 1., 0.])
clf.score(X_test, y_test)
0.7129471626694797
from sklearn.metrics import ConfusionMatrixDisplay
ConfusionMatrixDisplay.from_estimator(clf, X_test, y_test);
plt.grid(False)
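If you want the four counts behind the metrics below, you can pull them out of the confusion matrix directly. A small sketch: confusion_matrix orders labels 0 then 1 by default, so ravel() yields TN, FP, FN, TP.
from sklearn import metrics

# Rows are true labels, columns are predictions; ravel() flattens row by row.
tn, fp, fn, tp = metrics.confusion_matrix(y_test, y_pred).ravel()
tp / (tp + fp)  # precision, computed from the raw counts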
Precision¶
$$\text{precision} = \frac{TP}{TP+FP}$$
Precision describes the proportion of loans that were approved that would have been paid back.
from sklearn import metrics
metrics.precision_score(y_test, y_pred)
np.float64(0.7715641199421172)
If we subtract the precision from 1, we get the proportion of loans that were approved that would not have been paid back. This is known as the false discovery rate.
$$\frac{FP}{TP + FP} = 1 - \text{precision}$$
1 - metrics.precision_score(y_test, y_pred)
np.float64(0.22843588005788285)
Recall¶
$$\text{recall} = \frac{TP}{TP + FN}$$
Recall describes the proportion of loans that would have been paid back that were actually approved.
metrics.recall_score(y_test, y_pred)
np.float64(0.7337432717264445)
If we subtract the recall from 1, we get the proportion of loans that would have been paid back that were denied. This is known as the false negative rate.
$$\frac{FN}{TP + FN} = 1 - \text{recall}$$
1 - metrics.recall_score(y_test, y_pred)
np.float64(0.2662567282735555)
From both the perspective of the bank and the lendee, a high false negative rate is bad!
- The bank left money on the table – the lendee would have paid back the loan, but they weren't approved for one.
- The lendee deserved the loan, but wasn't given one.
False negative rate by age¶
results = X_test.copy()  # copy so we don't mutate the test set
results['age_bracket'] = results['age'].apply(lambda x: 5 * (x // 5 + 1))  # 5-year age bins
results['prediction'] = y_pred
results['tag'] = y_test
(
results
.groupby('age_bracket')
[['tag', 'prediction']]
.apply(lambda x: 1 - metrics.recall_score(x['tag'], x['prediction']))
.plot(kind='bar', title='False Negative Rate by Age Group')
)
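To tie this back to the parity definitions from earlier, here's a minimal sketch of a direct recall parity check for $A = \text{age is less than 25}$, reusing the results DataFrame built above (this check isn't part of the lecture code):
# Recall (true positive rate) parity check: under 25 vs. 25 and older.
under_25 = results['age'] < 25
recall_A = metrics.recall_score(results.loc[under_25, 'tag'],
                                results.loc[under_25, 'prediction'])
recall_not_A = metrics.recall_score(results.loc[~under_25, 'tag'],
                                    results.loc[~under_25, 'prediction'])
recall_A, recall_not_A  # recall parity holds if these are (roughly) equal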