Lecture 10 – Permutation Testing

DSC 80, Spring 2023

Agenda

Permutation testing

Hypothesis testing vs. permutation testing

"Standard" hypothesis testing helps us answer questions of the form:

I have a population distribution, and I have one sample. Does this sample look like it was drawn from the population?

It does not help us answer questions of the form:

I have two samples, but no information about any population distributions. Do these samples look like they were drawn from the same population?

That's where permutation testing comes in.

Example: Birth weight and smoking 🚬

Note: For familiarity, we'll start with an example from DSC 10. This means we'll move quickly!

Birth weight and smoking 🚬

Let's start by loading in the data.

We're only interested in the 'Birth Weight' and 'Maternal Smoker' columns.

Note that there are two samples:

Exploratory data analysis

How many babies are in each group? What is the average birth weight within each group?

Note that 16 ounces are in 1 pound, so the above weights are ~7-8 pounds.

Visualizing birth weight distributions

Below, we draw the distributions of both sets of birth weights.

There appears to be a difference, but can it be attributed to random chance?

Hypothesis test setup

Null hypothesis: birth weights come from the same distribution

Alternative hypothesis: birth weights come from different distributions

Choosing a test statistic

We need a test statistic that can measure how different two numerical distributions are.

Easiest solution: Difference in group means.

Difference in group means

We'll choose our test statistic to be:

$$\text{mean weight of smokers' babies} - \text{mean weight of non-smokers' babies}$$

We could also compute the non-smokers' mean minus the smokers' mean, too.

At first, you may think to use loc with group_means to compute the difference in group means.

However, you can also use the diff method.

Hypothesis test setup

$$\text{mean weight of smokers' babies} - \text{mean weight of non-smokers' babies}$$

Implications of the null hypothesis

Permutation tests

Shuffling

How close are the means of the shuffled groups?

One benefit of shuffling 'Birth Weight' (instead of 'Maternal Smoker') is that grouping by 'Maternal Smoker' allows us to see all of the following information with a single call to groupby.

Let's visualize both pairs of distributions – what do you notice?

Simulating the empirical distribution of the test statistic

We already computed the observed statistic earlier, but we compute it again below to keep all of our calculations together.

Conclusion of the test

⚠️ Caution!

Differences between categorical distributions

Example: Married vs. unmarried couples

We won't use all of the columns in the DataFrame.

Cleaning the dataset

The numbers in the DataFrame correspond to the mappings below.

Understanding the couples dataset

Let's look at the distribution of age separately for married couples and unmarried couples.

How are these two distributions different? Why do you think there is a difference?

Understanding employment status in households

To answer these questions, let's compute the distribution of employment status conditional on household type (married vs. unmarried).

Since there are a different number of married and unmarried couples in the dataset, we can't compare the numbers above directly. We need to convert counts to proportions, separately for married and unmarried couples.

Both of the columns above sum to 1.

Differences in the distributions

Are the distributions of employment status for married people and for unmarried people who live with their partners different?

Is this difference just due to noise?

Permutation test for household composition

Discussion Question

What is a good test statistic in this case?

Hint: What kind of distributions are we comparing?

Total variation distance

Let's first compute the observed TVD, using our new knowledge of the diff method.

Since we'll need to calculate the TVD repeatedly, let's define a function that computes it.

Simulation

Here, we'll shuffle marital statuses, though remember, we could shuffle employment statuses too.

Let's do this repeatedly.

Notice that by defining a function that computes our test statistic, our simulation code is much cleaner.

Conclusion of the test

We reject the null hypothesis that married/unmarried households have similar employment makeups.

We can't say anything about why the employment makeups are different, though!

Discussion Question

In the definition of the TVD, we divide the sum of the absolute differences in proportions between the two distributions by 2.

def tvd(a, b):
    return np.sum(np.abs(a - b)) / 2

Question: If we divided by 200 instead of 2, would we still reject the null hypothesis?

Summary

Summary

Next time

Wrap up hypothesis and permutation testing. Revisit missing values.