Lecture 18 – Permutation Testing, Bootstrapping

DSC 10, Fall 2022

Announcements

Agenda

Permutation testing

Purpose

Permutation tests help answer questions of the form:

I have two samples, but no information about any population distributions. Do these samples look like they were drawn from the same population?

Smoking and birth weight 👶

Setup for the hypothesis test

Strategy and implementation

Shuffling the labels

The 'Maternal Smoker' column defines the original groups. The 'Shuffled_Labels' column defines the random groups.
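As a rough illustration of the shuffling step, the sketch below uses pandas and NumPy on a tiny made-up table. The data values and the use of `pandas` are assumptions for illustration only; the real `babies` table is not reproduced here.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, NOT the real babies table.
toy = pd.DataFrame({
    'Maternal Smoker': [True, False, False, True, False, True],
    'Birth Weight': [113, 120, 128, 108, 136, 115],
})

rng = np.random.default_rng(0)  # seed chosen only so the sketch is reproducible

# rng.permutation returns a shuffled copy, so the original column is untouched.
shuffled = rng.permutation(toy['Maternal Smoker'].to_numpy())

# Attach the shuffled labels as a new column next to the original labels.
with_shuffled = toy.assign(Shuffled_Labels=shuffled)
```

Shuffling only rearranges the labels: the random groups have the same sizes as the original groups, but which weights land in which group is now random.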

Calculating the test statistic

For the original groups:

For the random groups:

Repeating the process

Comparing the empirical distribution to the observed statistic

Conclusion

Concept Check ✅ – Answer at cc.dsc10.com

Recall, babies has two columns.

To randomly assign weights to groups, we shuffled the 'Maternal Smoker' column. Could we have shuffled the 'Birth Weight' column instead?

Yes, we could have. It doesn’t matter which column we shuffle – we could shuffle one or the other, or even both, as long as we shuffle each separately.

Think about it like this – pretend you bring a gift 🎁 to a Christmas party 🎄 for a gift exchange, where everyone must leave the party with a random person’s gift. Pretend everyone stands around a circular table and puts the gift they bought in front of them. To randomly assign people to gifts, you could shuffle the gifts on the table and have all the people stay in the same spot, or you could have the people physically shuffle and keep the gifts in the same spots, or you could do both – either way, everyone will end up with a random gift!

Example: Did the New England Patriots cheat? 🏈

Background

The measurements

The question

Did the Patriots' footballs drop in pressure more than the Colts'?

The test statistic

Similar to the baby weights example, our test statistic will be the difference between the teams' average pressure drops. We'll calculate the mean drop for the 'Patriots' minus the mean drop for the 'Colts'.

The average pressure drop for the Patriots' footballs was about 0.74 psi more than the average drop for the Colts' footballs.
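The test statistic described above (mean drop for one group minus mean drop for the other) can be sketched as a small helper. The function name `difference_in_means`, the column names `'Team'` and `'Drop'`, and the numbers in the toy table are all assumptions made for illustration; they are not the real measurements.

```python
import pandas as pd

def difference_in_means(df, label_col, value_col, group_a, group_b):
    """Mean of value_col for group_a minus mean of value_col for group_b."""
    means = df.groupby(label_col)[value_col].mean()
    return means.loc[group_a] - means.loc[group_b]

# Made-up pressure drops in psi, for illustration only.
toy = pd.DataFrame({
    'Team': ['Patriots', 'Patriots', 'Patriots', 'Colts', 'Colts'],
    'Drop': [1.0, 1.5, 0.5, 0.25, 0.75],
})

toy_statistic = difference_in_means(toy, 'Team', 'Drop', 'Patriots', 'Colts')
```

A positive value means the first group's footballs dropped more on average, which is the direction the question asks about.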

Creating random groups and calculating one value of the test statistic

We'll run a permutation test to see if 0.74 psi is a significant difference.
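One way such a permutation test could be organized is sketched below with NumPy. The function name `permutation_test`, its parameters, and the toy data are hypothetical; this is a sketch under those assumptions, not the lecture's actual code.

```python
import numpy as np

def permutation_test(values, labels, group_a, group_b, n_repetitions=10_000, seed=None):
    """Return the observed difference in group means and an array of
    differences computed after randomly shuffling the labels."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)

    def statistic(current_labels):
        # Mean of group_a minus mean of group_b under the given labeling.
        return (values[current_labels == group_a].mean()
                - values[current_labels == group_b].mean())

    observed = statistic(labels)
    # Each repetition shuffles the labels and recomputes the statistic.
    simulated = np.array([statistic(rng.permutation(labels))
                          for _ in range(n_repetitions)])
    return observed, simulated

# Made-up pressure drops in psi, for illustration only.
toy_values = [1.0, 1.5, 0.5, 0.25, 0.75]
toy_labels = ['Patriots', 'Patriots', 'Patriots', 'Colts', 'Colts']
observed, simulated = permutation_test(
    toy_values, toy_labels, 'Patriots', 'Colts', n_repetitions=1_000, seed=0)
```

The array `simulated` approximates the distribution of the test statistic under the assumption that team labels are unrelated to pressure drop.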

The simulation

Conclusion

It doesn't look good for the Patriots. What is the p-value?

This p-value is low enough to consider this result to be highly statistically significant ($p<0.01$).
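For reference, an empirical p-value of this kind is the proportion of simulated statistics that are at least as extreme as the observed one. The helper below is a minimal sketch; the name `empirical_p_value` is hypothetical, and it assumes that larger statistics favor the alternative (Patriots minus Colts).

```python
import numpy as np

def empirical_p_value(simulated, observed):
    """Proportion of simulated statistics that are >= the observed statistic."""
    simulated = np.asarray(simulated)
    return np.count_nonzero(simulated >= observed) / len(simulated)

# Tiny made-up example: 2 of the 4 simulated values are at least 0.9.
example = empirical_p_value([0.1, 0.5, 0.9, 1.0], 0.9)
```

With a finite number of repetitions the result is only an estimate, so a very small p-value can come out as exactly 0 in a simulation.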

Caution! ⚠️

Aftermath

Quote from an investigative report commissioned by the NFL:

“[T]he average pressure drop of the Patriots game balls exceeded the average pressure drop of the Colts balls by 0.45 to 1.02 psi, depending on various possible assumptions regarding the gauges used, and assuming an initial pressure of 12.5 psi for the Patriots balls and 13.0 for the Colts balls.”

Aside: Establishing causation

To actually establish causation, we need the following two statements to be true:

  1. The data must come from a randomized controlled trial, to mitigate the effects of confounding factors.
  2. A permutation test must show a statistically significant difference in the outcome between the treatment and control group.

If both of these conditions are met, then we can conclude that the treatment causes the outcome.

Bootstrapping 🥾

City of San Diego employee salary data

All City of San Diego employee salary data is public. We are using the latest available data.

When you load in a dataset that has so many columns that you can't see them all, it's a good idea to look at the column names.

We only need the 'TotalWages' column, so let's get just that column.

Concept Check ✅ – Answer at cc.dsc10.com

Consider the question

What is the median salary of all San Diego city employees?

What is the right tool to answer this question?

The median salary

Let's be realistic...

In the language of statistics

The sample median

Let's survey 500 employees at random. To do so, we can use the .sample method.
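A sketch of this sampling step with pandas follows. The `population` table here is randomly generated stand-in data rather than the real salary data, and `random_state` is set only so the sketch is reproducible.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the salary table; the real data is not reproduced here.
rng = np.random.default_rng(0)
population = pd.DataFrame({'TotalWages': rng.integers(20_000, 150_000, size=5_000)})

# replace=False (the pandas default) means no employee is surveyed twice.
my_sample = population.sample(n=500, replace=False, random_state=1)

# The sample median is our estimate of the population median.
sample_median = my_sample['TotalWages'].median()
```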

We won't reassign my_sample at any point in this notebook, so it will always refer to this particular sample.

How confident are we that this is a good estimate?

The sample median is random

An impractical approach

The problem

Note that unlike the previous histogram we saw, this one depicts the distribution of the population and of one particular sample (my_sample), not the distribution of sample medians for 246 samples.

The bootstrap

Resampling with replacement

When bootstrapping, we resample with replacement. Why? 🤔
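One way to see the answer: a resample of the same size drawn without replacement is just the original sample in a different order, so its median never changes. The small NumPy sketch below, on made-up numbers, contrasts the two cases.

```python
import numpy as np

rng = np.random.default_rng(0)
original = np.array([3, 8, 1, 9, 4, 7, 5])  # made-up sample

# Without replacement: a reordering of the original, so the median is identical.
without_replacement = rng.choice(original, size=len(original), replace=False)
same_median = np.median(without_replacement) == np.median(original)  # always True

# With replacement: some values can repeat and others can be left out,
# so the median can differ from one resample to the next.
with_replacement = rng.choice(original, size=len(original), replace=True)
```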

Resampling with replacement

Running the bootstrap

We can simulate the act of collecting new samples by sampling with replacement from our original sample, my_sample.
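The resampling loop described above might look like the following NumPy sketch. The function name `bootstrap_medians` and the toy wages are hypothetical; in the notebook the input would be the 'TotalWages' values of my_sample.

```python
import numpy as np

def bootstrap_medians(sample, n_resamples=1_000, seed=None):
    """Resample `sample` with replacement n_resamples times and
    return the median of each resample."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample)
    medians = np.empty(n_resamples)
    for i in range(n_resamples):
        # Each resample has the same size as the original sample.
        resample = rng.choice(sample, size=len(sample), replace=True)
        medians[i] = np.median(resample)
    return medians

# Made-up wages, for illustration only.
toy_wages = [41_000, 55_500, 62_000, 68_250, 73_000, 80_100, 95_000, 120_000]
boot_medians = bootstrap_medians(toy_wages, n_resamples=200, seed=0)
```

A histogram of the returned medians gives the bootstrap distribution of the sample median.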

Bootstrap distribution of the sample median

What's the point of bootstrapping?

We have a sample median wage:

With it, we can say that the population median wage is approximately \$72,016, and not much else.

But by bootstrapping, we can generate an empirical distribution of the sample median:

which allows us to say things like

We think the population median wage is between \$67,000 and \$77,000.

Next time, we'll talk about how to set this range precisely.

Summary, next time

Summary

Next time