Lecture 6 – Hypothesis Testing

DSC 80, Spring 2022



Throughout this lecture, we will look at how to "speed up" our hypothesis tests by avoiding for-loops entirely.

Hypothesis testing

Null hypothesis

Alternative hypothesis

Test statistics

For the alternative hypothesis "the coin was biased towards heads", we could use:

If these test statistics are large, it means there were many heads. If they are small, it means there were few heads.

For the alternative hypothesis "the coin was biased", we could use:

If this test statistic is large, it means that there were many more heads than expected, or many fewer heads than expected. If this test statistic is small, it means that the number of heads was close to expected.
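As a rough sketch, the snippet below computes candidate statistics that match the descriptions above. The specific choices shown (number of heads, proportion of heads, and absolute difference from the expected number of heads) are assumptions, and the `flips` array is made up for illustration.

```python
import numpy as np

# Hypothetical flips, for illustration only: 1 = heads, 0 = tails.
flips = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1])

# Candidates for "biased towards heads": large values mean many heads.
num_heads = flips.sum()
prop_heads = flips.mean()

# Candidate for "biased": large values mean the number of heads was far
# from expected in either direction.
expected_heads = len(flips) / 2
abs_diff = abs(num_heads - expected_heads)
```

The first two statistics only grow when there are more heads, so they suit a one-sided alternative. The third grows in both directions, so it suits a two-sided alternative.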

Generating the null distribution

After choosing a test statistic, we need to compute the distribution of the test statistic, under the assumption that the null hypothesis is true ("under the null").

Example: coin flipping

Below, we create a DataFrame of coin flips.

There were 114 flips, 68 of which were heads.
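The original data is not reproduced here. A minimal sketch of such a DataFrame, assuming the flips live in a single column named `outcome` (a hypothetical name) and that the order of the flips does not matter:

```python
import pandas as pd

# Stand-in data: only the totals (114 flips, 68 heads) are known, so the
# order of outcomes is arbitrary and the column name is an assumption.
flips = pd.DataFrame({'outcome': ['H'] * 68 + ['T'] * (114 - 68)})

num_flips = flips.shape[0]
num_heads = (flips['outcome'] == 'H').sum()
```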



  1. Compute the observed value of the test statistic, i.e. the observed number of heads. (We already know this to be 68.)
  2. Simulate values of the test statistic under the null, i.e. under the assumption that the coin was fair.
  3. Use the resulting distribution to calculate the (approximate) probability of seeing 68 or more heads in 114 coin flips, under the assumption the coin was fair.
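The three steps above can be sketched with a plain for-loop as follows. The number of repetitions and the seed are assumptions chosen for illustration, not values from the lecture.

```python
import numpy as np

np.random.seed(0)  # Seeded only so that this sketch is reproducible.

# Step 1: the observed statistic.
observed_heads = 68
num_flips = 114
num_simulations = 10_000  # Assumed number of repetitions.

# Step 2: simulate the number of heads in 114 fair flips, many times.
results = []
for _ in range(num_simulations):
    simulated_flips = np.random.choice(['H', 'T'], size=num_flips)
    results.append((simulated_flips == 'H').sum())
results = np.array(results)

# Step 3: proportion of simulations with at least as many heads as observed.
p_value = (results >= observed_heads).mean()
```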


Each entry in results is the number of heads in 114 simulated coin flips.

Plotting the empirical distribution of the test statistic

Question: Do you think the coin was fair?

P-values and cutoffs

⚠️ We can't "prove" the null!

Fun fact


Speeding things up 🏃

The following function takes in a value of N and flips a fair coin N * 114 times.
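A minimal sketch of what such a function might look like. The name `flip_fair_coin_fast` and the 0/1 encoding of flips are assumptions, not necessarily the lecture's own implementation.

```python
import numpy as np

def flip_fair_coin_fast(N, num_flips=114):
    """Return N simulated head counts, each from num_flips fair flips."""
    # One call generates all N * num_flips flips as an (N, num_flips)
    # array of 0s (tails) and 1s (heads).
    flips = np.random.choice([0, 1], size=(N, num_flips))
    # Summing across each row counts the heads in that simulated experiment.
    return flips.sum(axis=1)

results_fast = flip_fair_coin_fast(10_000)
```

Generating every flip in a single call moves the repetition out of a Python for-loop and into NumPy, which is where the speedup comes from.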

Timing the faster simulation

Total variation distance

Ethnic distribution of California vs. UCSD

Is the difference between the two distributions significant?

Let's establish our hypotheses.

Total variation distance

The total variation distance (TVD) is a test statistic that describes the distance between two categorical distributions.

If $A = [a_1, a_2, \ldots, a_k]$ and $B = [b_1, b_2, \ldots, b_k]$ are both categorical distributions, then the TVD between $A$ and $B$ is

$$\text{TVD}(A, B) = \frac{1}{2} \sum_{i = 1}^k |a_i - b_i|$$
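The formula translates directly into NumPy. This is a sketch: the function body is an assumption about how `total_variation_distance` could be written, and the two distributions are made up for illustration.

```python
import numpy as np

def total_variation_distance(dist1, dist2):
    """TVD between two categorical distributions given as 1D sequences of proportions."""
    dist1 = np.asarray(dist1)
    dist2 = np.asarray(dist2)
    # Half the sum of the absolute differences, as in the formula.
    return np.abs(dist1 - dist2).sum() / 2

# Made-up distributions, for illustration only.
a = [0.5, 0.3, 0.2]
b = [0.4, 0.4, 0.2]
tvd = total_variation_distance(a, b)
```

Here the absolute differences are 0.1, 0.1, and 0, so the TVD is 0.1.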

Below, we can compute the TVD between California's ethnic distribution and UCSD's ethnic distribution.

The issue is that we don't know whether this value is large or small: we need a distribution of TVDs under the null to compare it against.

The plan

To conduct our hypothesis test:

Generating one random sample

To sample from a categorical distribution, we use np.random.multinomial.
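A minimal sketch of drawing one sample. The distribution and sample size below are hypothetical placeholders, not the California or UCSD figures.

```python
import numpy as np

# Hypothetical population distribution and sample size, for illustration only.
population_dist = np.array([0.5, 0.3, 0.2])
sample_size = 1000

# Counts per category in one random sample of the given size.
counts = np.random.multinomial(sample_size, population_dist)

# Dividing by the sample size turns the counts into a categorical distribution.
sample_dist = counts / sample_size
```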

Now we need to repeat the process of creating samples, many, many times.


Speeding things up 🏃

Again, we can get rid of the loop by using the size argument!

Our previous total_variation_distance function won't work with our 2D array eth_draws.
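One possible fix is to sum across the category axis only, so that each row yields its own TVD. The sketch below uses a hypothetical helper named `tvd_rows`, and a made-up distribution and sample size stand in for the data behind `eth_draws`.

```python
import numpy as np

def tvd_rows(dists, reference):
    """TVD between each row of a 2D array of distributions and one reference distribution."""
    # dists has shape (N, k) and reference has shape (k,), so broadcasting
    # subtracts the reference from every row. Summing over axis=1 adds up
    # the differences across categories, giving one TVD per row.
    return np.abs(dists - reference).sum(axis=1) / 2

# Hypothetical inputs, for illustration only.
population_dist = np.array([0.5, 0.3, 0.2])
sample_size = 1000
N = 10_000

# The size argument produces N samples at once, as an (N, k) array of counts.
draws = np.random.multinomial(sample_size, population_dist, size=N) / sample_size
tvds = tvd_rows(draws, population_dist)
```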

Summary of the method

To assess whether an "observed sample" was drawn randomly from a known categorical distribution:

Another example


Aside: loading data from seaborn

Average bill length by island

We will learn about the groupby method next week.

It appears that penguins on Torgersen Island have shorter bills on average than penguins on other islands. Could this have happened due to random chance?


The plan


It doesn't look like the average bill length of Torgersen Island penguins came from the null distribution of average bill lengths.

Speeding things up 🏃

Again (again), we can get rid of the loop by using the size argument!

We get the same result, but much more quickly!
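A sketch of the loop-free approach, under several loudly stated assumptions: the bill lengths are synthetic stand-ins rather than the real penguins data, the group size and observed mean are placeholders, and groups are drawn with replacement, which may differ from the sampling scheme used in lecture.

```python
import numpy as np

np.random.seed(0)  # Seeded only so that this sketch is reproducible.

# Synthetic stand-in for the column of all bill lengths (NOT the real data).
bill_lengths = np.random.normal(loc=45, scale=5, size=300)

group_size = 50       # Placeholder for the number of penguins in the group.
observed_mean = 43.0  # Placeholder for the group's observed average.
N = 10_000

# Each row is one simulated group, drawn with replacement from all bill lengths.
samples = np.random.choice(bill_lengths, size=(N, group_size))
simulated_means = samples.mean(axis=1)

# Proportion of simulated averages at or below the observed average.
p_value = (simulated_means <= observed_mean).mean()
```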

Summary, next time

The hypothesis testing "recipe"

Faced with a question about the data raised by an observation...

  1. Carefully pose the question as a testable "yes or no" hypothesis.
  2. Decide on a test statistic that helps differentiate between instances that would affirm or reject the hypothesis.
  3. Create a probability model for the data generating process that reflects the "known behavior" of the process.
  4. Simulate the data generating process using this probability model (the "null hypothesis").
  5. Assess if the observation is consistent with the simulations by computing a p-value.

We looked at three key examples:

Examples, next time