Lecture 16 – Hypothesis Testing

DSC 10, Fall 2022

Announcements

Agenda

Decisions and uncertainty

Incomplete information

Testing hypotheses

Null and alternative hypotheses

Concept Check ✅ – Answer at cc.dsc10.com

Consider the pair of hypotheses "this coin is fair" and "this coin is unfair."

Which is the null hypothesis?

Test statistics, revisited

Considerations when choosing a test statistic

Concept Check ✅ – Answer at cc.dsc10.com

Consider the pair of hypotheses "this coin is fair" and "this coin is unfair." Which test statistic(s) could we use to test these hypotheses?

Empirical distribution of the test statistic

Question for today: Is there a formal definition for what we mean by "consistent"?

Example: Is my coin fair?

Small values of the observed statistic should make you side with the null hypothesis, that the coin is fair. But how small?
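One way to make "how small?" concrete is to simulate the statistic many times under the null hypothesis and see where the observed value falls. The sketch below is only an illustration: the 400 flips, the observed count of 188 heads, and the choice of |number of heads − 200| as the test statistic are all assumptions, not values taken from the lecture.

```python
import numpy as np

rng = np.random.default_rng(10)  # seeded so the sketch is reproducible

n_flips = 400          # hypothetical number of flips
observed_heads = 188   # hypothetical observed count, for illustration only

def coin_statistic(num_heads, n=n_flips):
    """Distance between the number of heads and the count expected from a fair coin."""
    return abs(num_heads - n / 2)

# Simulate 10,000 experiments of 400 fair flips each (the null hypothesis).
simulated_heads = rng.binomial(n_flips, 0.5, size=10_000)
simulated_stats = np.abs(simulated_heads - n_flips / 2)

observed_stat = coin_statistic(observed_heads)

# Fraction of fair-coin experiments at least as far from 200 heads as the observation.
fraction_as_extreme = np.mean(simulated_stats >= observed_stat)
print(observed_stat, fraction_as_extreme)
```

With this statistic, values near 0 are what a fair coin typically produces, so the further the observed statistic sits in the right tail of the simulated values, the less consistent it is with the null.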

Example: Midterm exam scores

The problem

😱

Thought experiment 💭🧪

Suraj's defense

What are the observed characteristics of Section C?

Simulating under the null hypothesis

What's the verdict? 🤔

Statistical significance

Question: What is the probability that under the null hypothesis, a result at least as extreme as our observation occurs?

Definition of the p-value
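Following the question above, the p-value can be estimated as the proportion of simulated statistics that are at least as extreme as the observed one. Below is a minimal sketch; the function name `p_value`, its `larger_is_extreme` flag, and the toy array are hypothetical and exist only to show the counting.

```python
import numpy as np

def p_value(simulated_stats, observed_stat, larger_is_extreme=True):
    """Proportion of simulated statistics at least as extreme as the observed one.

    larger_is_extreme says which direction favors the alternative hypothesis.
    """
    simulated_stats = np.asarray(simulated_stats)
    if larger_is_extreme:
        return np.mean(simulated_stats >= observed_stat)
    return np.mean(simulated_stats <= observed_stat)

# Toy "simulated" statistics, chosen so the counts are easy to check by hand.
toy_stats = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

print(p_value(toy_stats, 9))                           # 2 of the 10 values are >= 9
print(p_value(toy_stats, 2, larger_is_extreme=False))  # 2 of the 10 values are <= 2
```

Which direction counts as "extreme" depends on the alternative hypothesis, which is why the direction is an explicit argument here.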

Conventions about inconsistency

What does the p-value mean?

The cutoff for the p-value is an error probability. If your p-value cutoff is 0.05 and the null hypothesis is true, then there is about a 0.05 chance that your test will (incorrectly) reject the null hypothesis.

In other words, if Suraj teaches 20 sections of DSC 10 and the null hypothesis is true for every one of them, then at a 0.05 cutoff he would still expect about one of those sections to have a "statistically significantly low" average purely by chance.
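The error-probability reading of the cutoff can be checked by simulation. The sketch below runs many tests on data for which the null hypothesis is true by construction and counts how often the p-value falls at or below the cutoff. It reuses a fair-coin setup purely as an assumption (400 flips, statistic |heads − 200|); none of these numbers come from the lecture.

```python
import numpy as np

rng = np.random.default_rng(16)  # seeded so the sketch is reproducible

n_flips = 400   # hypothetical experiment size
cutoff = 0.05   # conventional p-value cutoff

# Reference distribution of the statistic |heads - 200| under the null, sorted
# so that p-values can be looked up quickly.
null_stats = np.sort(np.abs(rng.binomial(n_flips, 0.5, size=20_000) - n_flips / 2))

# Run 5,000 separate tests, each on fresh data for which the null is TRUE.
new_stats = np.abs(rng.binomial(n_flips, 0.5, size=5_000) - n_flips / 2)

# p-value of each test: share of reference statistics >= that test's statistic.
first_at_least = np.searchsorted(null_stats, new_stats, side="left")
p_values = (len(null_stats) - first_at_least) / len(null_stats)

# Share of true-null tests that were (incorrectly) rejected.
false_rejection_rate = np.mean(p_values <= cutoff)
print(false_rejection_rate)

# Expected number of false alarms across 20 tests where the null is true.
print(20 * false_rejection_rate)
```

Because the number of heads is a whole number, the false rejection rate should land near the cutoff or somewhat below it rather than at exactly 0.05.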

Comparing distributions

Jury selection in Alameda County


Jury panels

Recall from Lecture 15:

$\substack{\text{eligible} \\ \text{population}} \xrightarrow{\substack{\text{representative} \\ \text{sample}}} \substack{\text{jury} \\ \text{panel}} \xrightarrow{\substack{\text{selection by} \\ \text{judge/attorneys}}} \substack{\text{actual} \\ \text{jury}}$

Section 197 of California's Code of Civil Procedure says,

"All persons selected for jury service shall be selected at random, from a source or sources inclusive of a representative cross section of the population of the area served by the court."

ACLU study

What do you notice? 👀

Are the differences in representation meaningful?

The distance between two distributions

Statistic: Total Variation Distance

The Total Variation Distance (TVD) of two categorical distributions is the sum of the absolute differences of their proportions, all divided by 2.

Concept Check ✅ – Answer at cc.dsc10.com

What is the TVD between the distributions of class standing in DSC 10 and DSC 40A?

| Class Standing | DSC 10 | DSC 40A |
| --- | --- | --- |
| Freshman | 0.45 | 0.15 |
| Sophomore | 0.35 | 0.35 |
| Junior | 0.15 | 0.35 |
| Senior+ | 0.05 | 0.15 |
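The TVD definition above translates directly into a few lines of NumPy. This is a minimal sketch: the function name `total_variation_distance` is an assumption, and the two arrays are the class-standing columns from the table above, listed in the same row order.

```python
import numpy as np

def total_variation_distance(dist1, dist2):
    """Sum of the absolute differences between two categorical distributions, divided by 2."""
    dist1 = np.asarray(dist1, dtype=float)
    dist2 = np.asarray(dist2, dtype=float)
    return np.abs(dist1 - dist2).sum() / 2

# Proportions in the order Freshman, Sophomore, Junior, Senior+.
dsc10 = np.array([0.45, 0.35, 0.15, 0.05])
dsc40a = np.array([0.15, 0.35, 0.35, 0.15])

tvd = total_variation_distance(dsc10, dsc40a)
print(tvd)
```

Two identical distributions give a TVD of 0, and two distributions with no categories in common give a TVD of 1, so the statistic always lies between 0 and 1.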

Statistic: Total Variation Distance

Simulate drawing jury panels

Note: np.random.multinomial creates samples drawn with replacement, even though real jury panels would be drawn without replacement. However, when the sample size (1453) is small relative to the population (number of people in Alameda County), the resulting distributions will be roughly the same whether we sample with or without replacement.
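A rough sketch of the simulation follows, using `np.random.multinomial` as named in the note above and the panel size of 1453. The group proportions in `eligible` and `observed_panels` are hypothetical placeholders, not the ACLU study's figures; substitute the study's distributions before drawing any conclusion.

```python
import numpy as np

np.random.seed(1453)  # seeded only to make the sketch reproducible

panel_size = 1453  # total number of panelists, as stated in the note above

# HYPOTHETICAL distributions over four made-up groups (placeholders only).
eligible = np.array([0.20, 0.20, 0.10, 0.50])
observed_panels = np.array([0.30, 0.10, 0.10, 0.50])

def total_variation_distance(dist1, dist2):
    """Sum of absolute differences in proportions, divided by 2."""
    return np.abs(dist1 - dist2).sum() / 2

def simulated_panel_tvd():
    """Draw one panel at random from the eligible population and return its TVD."""
    counts = np.random.multinomial(panel_size, eligible)  # counts per group
    return total_variation_distance(counts / panel_size, eligible)

# Repeat the experiment many times to get the empirical distribution of the TVD.
tvds = np.array([simulated_panel_tvd() for _ in range(5_000)])

observed_tvd = total_variation_distance(observed_panels, eligible)

# Proportion of simulated panels at least as far from the eligible population.
p_value = np.mean(tvds >= observed_tvd)
print(observed_tvd, tvds.max(), p_value)
```

Large TVD values point toward the alternative, so the p-value counts simulated TVDs that are greater than or equal to the observed one.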

The simulation

Repeating the experiment

Calculating the p-value

Are the jury panels representative?

Summary, next time

The hypothesis testing "recipe"

  1. State hypotheses: State the null and alternative hypotheses. We must be able to simulate data under the null hypothesis.
  2. Choose test statistic: Choose something that allows you to distinguish between the two hypotheses based on whether its value is high or low.
  3. Simulate: Draw samples under the null hypothesis, and calculate the test statistic on each one.
  4. Visualize: Plot the simulated values of the test statistic in a histogram, and compare this to the observed statistic (black line).
  5. Calculate p-value: Find the proportion of simulations for which the test statistic was at least as extreme as the one observed.
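The five steps above can be sketched end to end. The example below imagines testing whether one section's average score is too low to be a random sample of the class. All of the data is hypothetical: the class scores are randomly generated, and the section size of 30 and observed mean of 62 are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2022)  # seeded so the sketch is reproducible

# HYPOTHETICAL data: 300 generated class scores and one section's average.
class_scores = rng.integers(40, 101, size=300)
section_size = 30
observed_mean = 62.0

# Step 1. Null: the section's scores look like a random sample of the class.
#         Alternative: the section's average is too low for that.

# Step 2. Test statistic: the section mean. Low values favor the alternative.
def section_mean(scores):
    return scores.mean()

# Step 3. Simulate: draw random sections without replacement under the null.
simulated_means = np.array([
    section_mean(rng.choice(class_scores, size=section_size, replace=False))
    for _ in range(5_000)
])

# Step 4. Visualize: bin the simulated statistics. A plotting library would
#         draw these bins as a histogram with a line at observed_mean.
counts, bin_edges = np.histogram(simulated_means, bins=20)

# Step 5. p-value: proportion of simulated means at least as low as observed.
p_value = np.mean(simulated_means <= observed_mean)
print(p_value)
```

The direction of the comparison in step 5 comes from the alternative hypothesis in step 1: here "extreme" means low, so the comparison uses `<=`.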

Why does it matter?

Next time