from dsc80_utils import *
Announcements 📣¶
- Lab 9 due tomorrow.
- Last day for all redemptions is this Friday.
- Office hours on Thursday and Friday will probably be crowded! Start early.
- The Final Project is due on Wednesday, June 12th.
- No slip days allowed!
- The Final Exam is on Saturday, June 8th from 8AM-11AM in CENTER 216.
- Practice by working through old exams at practice.dsc80.com.
- You can bring two double-sided notes sheets (you can bring your midterm notes sheet, if you want).
- Check Ed for more details.
- If at least 80% of the class fills out both SETs and the End-of-Quarter Survey by Friday, June 7th at 11:59PM, then everyone will earn an extra 2% on the Final Exam.
- Thursday's class will start with career advice, then the rest of the time will be exam review!
Agenda 📆¶
- Classifier evaluation.
- Logistic regression.
- Model fairness.
Aside: MLU Explain is a great resource with visual explanations of many of our recent topics (cross-validation, random forests, precision and recall, etc.).
Classifier evaluation¶
Precision and recall¶
$$\text{precision} = \frac{TP}{TP + FP} \qquad \qquad \text{recall} = \frac{TP}{TP + FN}$$
🤔 Question: When might high precision be more important than high recall?
🙋 Answer: For instance, in deciding whether or not someone committed a crime. Here, false positives are really bad – they mean that an innocent person is charged!
🤔 Question: When might high recall be more important than high precision?
🙋 Answer: For instance, in medical tests. Here, false negatives are really bad – they mean that someone's disease goes undetected!
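To make these formulas concrete, here is a small sketch (not from the original lecture) that computes precision and recall both by hand and with sklearn.metrics on a made-up set of labels:
from sklearn import metrics

# Hypothetical labels, purely for illustration: 3 TPs, 1 FP, 2 FNs.
y_true = np.array([1, 1, 1, 1, 1, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 1, 0, 0])

tp = np.sum((y_true == 1) & (y_pred == 1))  # 3
fp = np.sum((y_true == 0) & (y_pred == 1))  # 1
fn = np.sum((y_true == 1) & (y_pred == 0))  # 2

# Both pairs agree: precision is 3/4 = 0.75, recall is 3/5 = 0.6.
(tp / (tp + fp), metrics.precision_score(y_true, y_pred))
(tp / (tp + fn), metrics.recall_score(y_true, y_pred))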
Question 🤔 (Answer at q.dsc80.com)
Taken from the Spring 2022 Final Exam.
After fitting a BillyClassifier, we use it to make predictions on an unseen test set. Our results are summarized in the following confusion matrix.
|  | Predicted Negative | Predicted Positive |
|---|---|---|
| Actually Negative | ??? | 30 |
| Actually Positive | 66 | 105 |
Part 1: What is the recall of our classifier? Give your answer as a fraction (it does not need to be simplified).
Part 2: The accuracy of our classifier is $\frac{69}{117}$. How many true negatives did our classifier have? Give your answer as an integer.
Part 3: True or False: In order for a binary classifier's precision and recall to be equal, the number of mistakes it makes must be an even number.
Part 4: Suppose we are building a classifier that listens to an audio source (say, from your phone's microphone) and predicts whether or not it is Soulja Boy's 2008 classic "Kiss Me thru the Phone". Our classifier is pretty good at detecting when the input stream is "Kiss Me thru the Phone", but it often incorrectly predicts that similar-sounding songs are also "Kiss Me thru the Phone".
Complete the sentence: Our classifier has...
- low precision and low recall.
- low precision and high recall.
- high precision and low recall.
- high precision and high recall.
Logistic regression¶
Wisconsin breast cancer dataset¶
The Wisconsin breast cancer dataset (WBCD) is a commonly used dataset for demonstrating binary classification. It is built into sklearn.datasets.
from sklearn.datasets import load_breast_cancer
loaded = load_breast_cancer() # explore the value of `loaded`!
data = loaded['data']
labels = 1 - loaded['target']  # flip the labels so that 1 = malignant, 0 = benign
cols = loaded['feature_names']
bc = pd.DataFrame(data, columns=cols)
bc.head()
|  | mean radius | mean texture | mean perimeter | mean area | ... | worst concavity | worst concave points | worst symmetry | worst fractal dimension |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 17.99 | 10.38 | 122.80 | 1001.0 | ... | 0.71 | 0.27 | 0.46 | 0.12 |
| 1 | 20.57 | 17.77 | 132.90 | 1326.0 | ... | 0.24 | 0.19 | 0.28 | 0.09 |
| 2 | 19.69 | 21.25 | 130.00 | 1203.0 | ... | 0.45 | 0.24 | 0.36 | 0.09 |
| 3 | 11.42 | 20.38 | 77.58 | 386.1 | ... | 0.69 | 0.26 | 0.66 | 0.17 |
| 4 | 20.29 | 14.34 | 135.10 | 1297.0 | ... | 0.40 | 0.16 | 0.24 | 0.08 |
5 rows × 30 columns
1 stands for "malignant", i.e. cancerous, and 0 stands for "benign", i.e. safe.
labels
array([1, 1, 1, ..., 1, 1, 0])
pd.Series(labels).value_counts(normalize=True)
0    0.63
1    0.37
dtype: float64
Our goal is to use the features in bc to predict labels.
Logistic regression¶
Logistic regression is a linear classification technique that builds upon linear regression. It models the probability of belonging to class 1, given a feature vector:
$$P(y = 1 | \vec{x}) = \sigma (\underbrace{w_0 + w_1 x^{(1)} + w_2 x^{(2)} + ... + w_d x^{(d)}}_{\text{linear regression model}})$$
Here, $\sigma(t) = \frac{1}{1 + e^{-t}}$ is the sigmoid function; its outputs are between 0 and 1 (which means they can be interpreted as probabilities).
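As a quick sanity check, here's a small sketch (not part of the original notebook) that implements $\sigma$ directly and confirms its outputs land strictly between 0 and 1:
def sigmoid(t):
    # The sigmoid (logistic) function maps any real number into (0, 1).
    return 1 / (1 + np.exp(-t))

sigmoid(np.array([-5, -1, 0, 1, 5]))  # approximately [0.007, 0.269, 0.5, 0.731, 0.993]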
🤔 Question: Suppose our logistic regression model predicts the probability that a tumor is malignant is 0.75. What class do we predict – malignant or benign? What if the predicted probability is 0.3?
🙋 Answer: We have to pick a threshold (e.g. 0.5)!
- If the predicted probability is above the threshold, we predict malignant (1).
- Otherwise, we predict benign (0).
- In practice, we use cross-validation to decide this threshold.
Fitting a logistic regression model¶
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
X_train, X_test, y_train, y_test = train_test_split(bc, labels)
clf = LogisticRegression(max_iter=10000)
clf.fit(X_train, y_train)
LogisticRegression(max_iter=10000)
How did clf come up with 1s and 0s?
clf.predict(X_test)
array([0, 0, 0, ..., 1, 0, 0])
It turns out that the predicted labels come from applying a threshold of 0.5 to the predicted probabilities. We can access the predicted probabilities using the predict_proba method:
# [:, 1] refers to the predicted probabilities for class 1.
clf.predict_proba(X_test)
array([[1.  , 0.  ],
       [1.  , 0.  ],
       [0.91, 0.09],
       ...,
       [0.  , 1.  ],
       [0.92, 0.08],
       [1.  , 0.  ]])
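We can double-check this claim with a quick sketch (not from the original notebook): thresholding the class-1 probabilities at 0.5 reproduces the output of clf.predict.
# Ties at exactly 0.5 are essentially impossible here, so >= vs. > doesn't matter.
manual_preds = (clf.predict_proba(X_test)[:, 1] >= 0.5).astype(int)
np.all(manual_preds == clf.predict(X_test))  # True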
Note that our model still has $w^*$s:
clf.intercept_
array([-37.04])
clf.coef_
array([[-0.61, -0.33, 0.2 , ..., 0.49, 0.68, 0.08]])
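To connect these numbers back to the formula above, here is a hedged sketch (not in the original notebook) that plugs the fitted intercept and coefficients into the sigmoid ourselves and compares the result to predict_proba:
# Linear part: w_0 + w_1 x^(1) + ... + w_d x^(d) for every test point.
lin = X_test.to_numpy() @ clf.coef_.flatten() + clf.intercept_[0]

# Passing it through the sigmoid should recover the class-1 probabilities.
# (May emit a harmless overflow warning for extremely negative values of lin.)
manual_probs = 1 / (1 + np.exp(-lin))
np.allclose(manual_probs, clf.predict_proba(X_test)[:, 1])  # True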
Evaluating our model¶
Let's see how well our model does on the test set.
from sklearn import metrics
y_pred = clf.predict(X_test)
Which metric is more important for this task – precision or recall?
metrics.confusion_matrix(y_test, y_pred)
array([[93,  1],
       [ 7, 42]])
from sklearn.metrics import ConfusionMatrixDisplay
ConfusionMatrixDisplay.from_estimator(clf, X_test, y_test);
plt.grid(False)
metrics.accuracy_score(y_test, y_pred)
0.9440559440559441
metrics.precision_score(y_test, y_pred)
0.9767441860465116
metrics.recall_score(y_test, y_pred)
0.8571428571428571
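As a cross-check (a sketch, not part of the original notebook), all three scores above can also be read directly off the confusion matrix:
# Rows of the confusion matrix are actual classes; columns are predicted classes.
(tn, fp), (fn, tp) = metrics.confusion_matrix(y_test, y_pred)
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy, precision, recall  # matches the three scores above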
What if we choose a different threshold?¶
🤔 Question: Suppose we choose a threshold higher than 0.5. What will happen to our model's precision and recall?
🙋 Answer: Precision will increase, while recall will decrease*.
- If the "bar" is higher to predict 1, then we will have fewer positives in general, and thus fewer false positives.
- The denominator in $\text{precision} = \frac{TP}{TP + FP}$ will get smaller, and so precision will increase.
- However, the number of false negatives will increase, as we are being more "strict" about what we classify as positive, and so $\text{recall} = \frac{TP}{TP + FN}$ will decrease.
- *It is possible for either or both to stay the same, if changing the threshold slightly (e.g. from 0.5 to 0.500001) doesn't change any predictions.
Similarly, if we decrease our threshold, our model's precision will decrease, while its recall will increase.
Trying several thresholds¶
The classification threshold is not actually a hyperparameter of LogisticRegression, because the threshold doesn't change the coefficients ($w^*$s) of the logistic regression model itself (see this article for more details).
- Still, the threshold affects our decision rule, so we can tune it using cross-validation (which is not what we're doing below).
- It's also useful to plot how our metrics change as we change the threshold.
thresholds = np.arange(0.01, 1.01, 0.01)
precisions = np.array([])
recalls = np.array([])
for t in thresholds:
    # For each candidate threshold t, classify as 1 when P(y = 1 | x) >= t.
    y_pred = clf.predict_proba(X_test)[:, 1] >= t
    precisions = np.append(precisions, metrics.precision_score(y_test, y_pred, zero_division=1))
    recalls = np.append(recalls, metrics.recall_score(y_test, y_pred))
Let's visualize the results.
px.line(x=thresholds, y=precisions,
        labels={'x': 'Threshold', 'y': 'Precision'},
        title='Precision vs. Threshold', width=1000, height=600)
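For completeness, here is the same kind of plot for the recalls computed above (a sketch, not in the original notebook):
px.line(x=thresholds, y=recalls,
        labels={'x': 'Threshold', 'y': 'Recall'},
        title='Recall vs. Threshold', width=1000, height=600)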