Lecture 10 – Permutation Testing

DSC 80, Spring 2022

Credits to Nicole Brye

Announcements

Agenda

Great reading: jwilber.me/permutationtest.

Overview

Review: Hypothesis testing

Examples so far

So far, our hypothesis tests have assessed a model given a single random sample.

Today's lecture

Often, we have two random samples that we wish to compare.

Permutation testing

Example: Birth weight and smoking 🚬

Birth weight and smoking

Let's start by loading in data.

Only the 'Birth Weight' and 'Maternal Smoker' columns are relevant.

Exploratory data analysis

How many babies are in each group?

What is the average birth weight within each group?
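A minimal sketch of both questions using groupby. The DataFrame name babies and its five rows are made up for illustration; they are not the lecture's data.

```python
import pandas as pd

# Hypothetical stand-in for the birth weight data (weights in ounces).
babies = pd.DataFrame({
    'Maternal Smoker': [False, False, False, True, True],
    'Birth Weight': [120, 130, 125, 110, 115],
})

# How many babies are in each group?
group_sizes = babies.groupby('Maternal Smoker').size()

# What is the average birth weight within each group?
group_means = babies.groupby('Maternal Smoker')['Birth Weight'].mean()

print(group_sizes)
print(group_means)
```

babies['Maternal Smoker'].value_counts() would answer the first question as well.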

There are 16 ounces in 1 pound, so the average weights above are roughly 7 to 8 pounds.

Visualizing birth weight distributions

The setup

Alternative hypothesis: birth weights come from different distributions...

Null hypothesis: birth weights come from the same distribution

Choosing a test statistic

We need a test statistic that can measure how different two numerical distributions are.

The easiest choice: the difference in group means.

Difference in group means

To compute the difference between the mean birth weight of babies born to smokers and the mean birth weight of babies born to non-smokers, we can use groupby.
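As a sketch, assuming a hypothetical babies DataFrame with made-up weights, the groupby-then-.loc approach looks like this:

```python
import pandas as pd

# Hypothetical data (weights in ounces), not the lecture's dataset.
babies = pd.DataFrame({
    'Maternal Smoker': [False, False, False, True, True],
    'Birth Weight': [120, 130, 125, 110, 115],
})

means = babies.groupby('Maternal Smoker')['Birth Weight'].mean()

# "Smoking" mean minus "non-smoking" mean.
observed_difference = means.loc[True] - means.loc[False]
print(observed_difference)  # -12.5
```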

Note that we arbitrarily chose to compute the "smoking" mean minus the "non-smoking" mean. We could have chosen the other direction, too.

Another approach

Instead of using .loc and subtracting manually, we can find the difference in group means with the diff Series/DataFrame method.
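A sketch of the diff approach on hypothetical data. Series.diff subtracts each element's predecessor, so the result depends on the order of the groupby index; here False sorts before True, which makes the last entry the "smoking" minus "non-smoking" difference.

```python
import pandas as pd

# Hypothetical data (weights in ounces).
babies = pd.DataFrame({
    'Maternal Smoker': [False, False, False, True, True],
    'Birth Weight': [120, 130, 125, 110, 115],
})

means = babies.groupby('Maternal Smoker')['Birth Weight'].mean()

# diff() computes means[i] - means[i - 1]; the first entry has no predecessor, so it is NaN.
differences = means.diff()
observed_difference = differences.iloc[-1]
print(observed_difference)  # -12.5
```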

Testing through simulation

Implications of the null hypothesis

Permutation tests

Shuffling

Shuffling just one column

A single shuffle

Remember, it doesn't matter which column we shuffle! Here, we'll shuffle birth weights.

For details on how ** (dictionary unpacking) works, see this article.
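A minimal sketch of a single shuffle, assuming a hypothetical babies DataFrame and NumPy's Generator.permutation (the lecture may shuffle with a different call). Passing a dictionary with ** to assign is one way to create a column whose name contains a space, since such a name can't be written as a plain keyword argument.

```python
import numpy as np
import pandas as pd

# Hypothetical data (weights in ounces).
babies = pd.DataFrame({
    'Maternal Smoker': [False, False, False, True, True],
    'Birth Weight': [120, 130, 125, 110, 115],
})

rng = np.random.default_rng(42)

# Shuffle only the weights; each row keeps its original 'Maternal Smoker' label.
shuffled_weights = rng.permutation(babies['Birth Weight'].to_numpy())

# ** unpacks the dictionary into keyword arguments for assign.
with_shuffled = babies.assign(**{'Shuffled Birth Weight': shuffled_weights})
print(with_shuffled)
```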

How close are the means of the shuffled groups?

One benefit of shuffling 'Birth Weight' (instead of 'Maternal Smoker') is that grouping by 'Maternal Smoker' allows us to see all of the following information with a single call to groupby.
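A sketch of that single groupby call, using hypothetical data and a seeded shuffle. The original and shuffled group means come out of one call; the shuffled means change with the seed.

```python
import numpy as np
import pandas as pd

# Hypothetical data (weights in ounces).
babies = pd.DataFrame({
    'Maternal Smoker': [False, False, False, True, True],
    'Birth Weight': [120, 130, 125, 110, 115],
})

rng = np.random.default_rng(1)
with_shuffled = babies.assign(
    **{'Shuffled Birth Weight': rng.permutation(babies['Birth Weight'].to_numpy())}
)

# One groupby call gives the original and shuffled means side by side.
group_means = (
    with_shuffled
    .groupby('Maternal Smoker')[['Birth Weight', 'Shuffled Birth Weight']]
    .mean()
)
print(group_means)
```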

Simulation

We already computed the observed statistic earlier, but we compute it again below to keep all of our calculations together.
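A minimal sketch of the full simulation on hypothetical data. The helper difference_in_means, the seed, and the 1,000 repetitions are illustrative choices, not the lecture's code. The p-value line assumes a one-sided alternative in which smokers' babies weigh less on average; for a two-sided alternative, compare absolute differences instead.

```python
import numpy as np
import pandas as pd

# Hypothetical data (weights in ounces).
babies = pd.DataFrame({
    'Maternal Smoker': [False, False, False, True, True],
    'Birth Weight': [120, 130, 125, 110, 115],
})

def difference_in_means(df, col):
    """Hypothetical helper: mean of col for smokers minus mean for non-smokers."""
    means = df.groupby('Maternal Smoker')[col].mean()
    return means.loc[True] - means.loc[False]

observed_difference = difference_in_means(babies, 'Birth Weight')

rng = np.random.default_rng(0)
n_repetitions = 1000
differences = np.empty(n_repetitions)
for i in range(n_repetitions):
    # Shuffle weights, breaking any association with smoking status.
    shuffled = babies.assign(
        **{'Shuffled Birth Weight': rng.permutation(babies['Birth Weight'].to_numpy())}
    )
    differences[i] = difference_in_means(shuffled, 'Shuffled Birth Weight')

# One-sided p-value (assumed alternative: smokers' babies are lighter).
p_value = np.mean(differences <= observed_difference)
print(observed_difference, p_value)
```

With only five made-up rows there are just 10 ways to split the weights into groups of 3 and 2, and the observed split is the most extreme, so this p-value lands near 0.1.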

Conclusion of the test

⚠️ Caution!

Differences between categorical distributions

Example: Married vs. unmarried couples

We won't use all of the columns in the DataFrame.

Cleaning the dataset

The numbers in the DataFrame correspond to the mappings below.

Understanding the couples dataset

Ages are numeric, so the previous summary was not that helpful. Let's draw a histogram.

Let's look at the distribution of age separately for married couples and unmarried couples.

What's the difference in the two distributions? Why do you think there is a difference?

Understanding employment status in households

To answer these questions, let's compute the distribution of employment status conditional on household type (married vs. unmarried).

Since the numbers of married and unmarried couples in the dataset differ, we can't compare the counts above directly. We need to convert counts to proportions, separately for married and unmarried couples.

Both of the columns above sum to 1.
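A sketch of the count-to-proportion step, assuming a hypothetical couples DataFrame whose mar_status and empl_status columns already hold readable labels (the empl_status column name, the labels, and the rows are invented). pd.crosstab counts each combination, and dividing by the column sums normalizes married and unmarried couples separately.

```python
import pandas as pd

# Hypothetical, already-cleaned data; not the lecture's dataset.
couples = pd.DataFrame({
    'mar_status': ['married', 'married', 'married', 'unmarried', 'unmarried'],
    'empl_status': ['Working', 'Working', 'Not working', 'Working', 'Not working'],
})

# Counts of each employment status within each household type.
counts = pd.crosstab(couples['empl_status'], couples['mar_status'])

# Divide each column by its total so that each column sums to 1.
distr = counts / counts.sum()
print(distr)
```

pd.crosstab(..., normalize='columns') produces the same proportions in one step.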

Differences in the distributions

Permutation test for household composition

Discussion Question

What is a good test statistic in this case?

Hint: What kind of distributions are we comparing?

Total variation distance
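For reference, if A = (a_1, ..., a_k) and B = (b_1, ..., b_k) are two categorical distributions over the same k categories (here, the employment-status proportions for married and unmarried couples), the TVD is half the sum of the absolute differences, the same quantity the tvd function in this lecture computes. It ranges from 0 (identical distributions) to 1 (no categories in common).

```latex
\text{TVD}(A, B) = \frac{1}{2} \sum_{i=1}^{k} \left| a_i - b_i \right|
```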

Let's first compute the observed TVD.

Since we'll need to calculate the TVD repeatedly, let's define a function that computes it.
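A minimal sketch, assuming a hypothetical couples DataFrame with invented rows. The tvd definition matches the one given in this lecture; using pd.crosstab with normalize='columns' to get per-household-type proportions is an assumed choice.

```python
import numpy as np
import pandas as pd

def tvd(a, b):
    """Total variation distance between two categorical distributions."""
    return np.sum(np.abs(a - b)) / 2

# Hypothetical data; not the lecture's dataset.
couples = pd.DataFrame({
    'mar_status': ['married', 'married', 'married', 'unmarried', 'unmarried'],
    'empl_status': ['Working', 'Working', 'Not working', 'Working', 'Not working'],
})

# Proportions of each employment status, separately for each household type.
distr = pd.crosstab(couples['empl_status'], couples['mar_status'], normalize='columns')

observed_tvd = tvd(distr['married'], distr['unmarried'])
print(observed_tvd)
```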

Simulation

Again, let's first figure out how to perform a single shuffle. Here, we'll shuffle marital statuses.

Let's do this repeatedly.

Notice that defining a function that computes our test statistic makes our simulation code much cleaner.
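A sketch of the repeated shuffle on hypothetical data. The helper tvd_of_groups, the seed, and the 500 repetitions are illustrative assumptions. Marital statuses are shuffled while employment statuses stay in place, and a larger TVD counts as more evidence against the null.

```python
import numpy as np
import pandas as pd

def tvd(a, b):
    return np.sum(np.abs(a - b)) / 2

def tvd_of_groups(df):
    """Hypothetical helper: TVD between married and unmarried employment distributions."""
    distr = pd.crosstab(df['empl_status'], df['mar_status'], normalize='columns')
    return tvd(distr['married'], distr['unmarried'])

# Hypothetical data; not the lecture's dataset.
couples = pd.DataFrame({
    'mar_status': ['married', 'married', 'married', 'unmarried', 'unmarried'],
    'empl_status': ['Working', 'Working', 'Not working', 'Working', 'Not working'],
})

observed_tvd = tvd_of_groups(couples)

rng = np.random.default_rng(0)
n_repetitions = 500
tvds = np.empty(n_repetitions)
for i in range(n_repetitions):
    # Shuffle marital statuses; employment statuses stay in place.
    shuffled = couples.assign(mar_status=rng.permutation(couples['mar_status'].to_numpy()))
    tvds[i] = tvd_of_groups(shuffled)

# Count simulated TVDs at least as large as the observed one.
p_value = np.mean(tvds >= observed_tvd)
print(observed_tvd, p_value)
```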

Results

Conclusion: household composition

Discussion Question

In the definition of the TVD, we divide the sum of the absolute differences in proportions between the two distributions by 2.

import numpy as np
def tvd(a, b):
    # Half the sum of absolute differences in proportions
    return np.sum(np.abs(a - b)) / 2

Question: If we divided by 200 instead of 2, would we still reject the null hypothesis?

An alternative investigation

The Series isin method will be helpful here.
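A minimal sketch of isin, assuming hypothetical employment labels. Which real categories count as "not working, not by choice" depends on the dataset's mapping, so the list statuses_not_by_choice below is invented.

```python
import pandas as pd

# Hypothetical rows and labels.
couples = pd.DataFrame({
    'mar_status': ['married', 'married', 'married', 'unmarried', 'unmarried'],
    'empl_status': ['Working', 'Laid off', 'Retired', 'Looking for work', 'Working'],
})

# Hypothetical set of statuses treated as "not working, not by choice".
statuses_not_by_choice = ['Laid off', 'Looking for work']

# isin returns a Boolean Series: True where the status is in the list.
couples = couples.assign(not_by_choice=couples['empl_status'].isin(statuses_not_by_choice))
print(couples)
```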

Let's group by mar_status once again.

Notice that this is not a categorical distribution, so we don't need the TVD. Instead, we can compute the difference in group means.
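A sketch, assuming a hypothetical couples DataFrame that already has a Boolean not_by_choice column (for instance, one built with isin). Because True counts as 1 and False as 0, each group mean is the proportion of people in that household type who are not working, not by choice.

```python
import pandas as pd

# Hypothetical data with a precomputed Boolean column.
couples = pd.DataFrame({
    'mar_status': ['married', 'married', 'married', 'unmarried', 'unmarried'],
    'not_by_choice': [False, True, False, True, False],
})

# The mean of a Boolean column is the proportion of True values.
proportions = couples.groupby('mar_status')['not_by_choice'].mean()

# Married proportion minus unmarried proportion (the direction is arbitrary).
observed_difference = proportions.loc['married'] - proportions.loc['unmarried']
print(observed_difference)
```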

Simulation

Results

Conclusion: Household composition; not working, not by choice

Again, we reject the null hypothesis that married and unmarried households have similar proportions of people who are not working (not by choice).

Summary, next time

Summary