Lecture 15 – Models and Viewpoints

DSC 10, Fall 2022

Announcements

Agenda

Statistical models

Models

Example

Galileo's Leaning Tower of Pisa Experiment

Example: Jury selection

Swain vs. Alabama, 1965

$\substack{\text{eligible} \\ \text{population}} \xrightarrow{\substack{\text{representative} \\ \text{sample}}} \substack{\text{jury} \\ \text{panel}} \xrightarrow{\substack{\text{selection by} \\ \text{judge/attorneys}}} \substack{\text{actual} \\ \text{jury}}$

Supreme Court ruling

"... the overall percentage disparity has been small..."

Our model for simulating Swain's jury panel

Our approach: simulation

Simulating statistics

Recall, a statistic is a number calculated from a sample.

  1. Run an experiment once to generate one value of a statistic.
    • In this case, sample 100 people randomly from a population that is 26% Black, and count the number of Black men (statistic).
  2. Run the experiment many times, generating many values of the statistic, and store these statistics in an array.
  3. Visualize the resulting empirical distribution of the statistic.

Step 1 – Running the experiment once

np.random.multinomial(sample_size, pop_distribution)

Aside: Example usage of np.random.multinomial

Halloween is on Monday, and you're getting ready to go trick-or-treating 👻. Suppose you'll visit 35 houses, and that each of the 35 houses you'll visit has the same candy box, containing:

At each house, you'll select one candy blindly from the candy box.

To simulate the act of going to 35 houses, we can use np.random.multinomial:
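
A minimal sketch of that call. The candy proportions below are hypothetical placeholders chosen for illustration (they are not the contents of the candy box above); the only requirement is that they sum to 1.

```python
import numpy as np

# Hypothetical proportions of four candy types in the box (assumed; must sum to 1).
candy_proportions = [0.3, 0.3, 0.2, 0.2]

# Simulate visiting 35 houses and picking one candy at each.
# The result has one count per candy type.
candy_counts = np.random.multinomial(35, candy_proportions)
print(candy_counts)
print(candy_counts.sum())  # always 35, since we pick exactly one candy per house
```

The first argument is the number of draws and the second is the probability of each category, matching the `np.random.multinomial(sample_size, pop_distribution)` pattern.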

Step 1 – Running the experiment once

In our case, a randomly selected member of our population is Black with probability 0.26 and not Black with probability 1 - 0.26 = 0.74.

Each time we run the following cell, we'll get a new random sample of 100 people from this population.
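
One way such a cell might look. The variable names `demographics` and `panel` are assumptions made for this sketch.

```python
import numpy as np

# Population model: [P(Black), P(not Black)].
demographics = [0.26, 0.74]

# One random sample of 100 people: [number Black, number not Black].
panel = np.random.multinomial(100, demographics)
print(panel)
```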

Step 1 – Running the experiment once

We also need to calculate the statistic, which in this case is the number of Black men in the random sample of 100.
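
A sketch of computing the statistic, assuming the sample is drawn with `np.random.multinomial` and that the first category is "Black". The function name `num_black_in_panel` is hypothetical.

```python
import numpy as np

def num_black_in_panel(panel_size=100, p_black=0.26):
    """Draw one random panel and return the number of Black men in it."""
    counts = np.random.multinomial(panel_size, [p_black, 1 - p_black])
    return counts[0]  # first entry corresponds to the first category (Black)

one_statistic = num_black_in_panel()
print(one_statistic)
```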

Step 2 – Repeat the experiment many times
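
A sketch of the repetition step, assuming 10,000 repetitions (the repetition count and the variable names are choices made for this example).

```python
import numpy as np

repetitions = 10_000  # assumed number of repetitions
counts = np.array([])

for i in np.arange(repetitions):
    # Run the experiment once: sample 100 people and count the Black men.
    new_count = np.random.multinomial(100, [0.26, 0.74])[0]
    # Store the statistic from this repetition.
    counts = np.append(counts, new_count)

print(len(counts))
```

After the loop, `counts` holds one simulated value of the statistic per repetition.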

Step 3 – Visualize the resulting distribution
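
A rough, text-only stand-in for a histogram, so the sketch needs nothing beyond NumPy. It uses the `size` argument of `np.random.multinomial` to simulate all panels in one call; the bar scaling (one `#` per 20 panels) is an arbitrary choice.

```python
import numpy as np

# Simulate 10,000 panels at once; column 0 holds the number of Black men.
counts = np.random.multinomial(100, [0.26, 0.74], size=10_000)[:, 0]

# Tally how often each value occurred, then print one bar per value.
values, frequencies = np.unique(counts, return_counts=True)
for value, frequency in zip(values, frequencies):
    print(f"{int(value):3d} | {'#' * (int(frequency) // 20)}")
```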

Was a jury panel with 8 Black men suspiciously unusual?
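
One way to make this question concrete is to ask what fraction of simulated panels had 8 or fewer Black men. This is a sketch under the same model as above (panels of 100 from a population that is 26% Black), with an assumed 10,000 repetitions.

```python
import numpy as np

observed_count = 8
repetitions = 10_000  # assumed number of repetitions

# Column 0 holds the number of Black men in each simulated panel.
simulated_counts = np.random.multinomial(100, [0.26, 0.74], size=repetitions)[:, 0]

# Fraction of simulated panels at least as extreme (as low) as the observed one.
fraction_as_low = np.count_nonzero(simulated_counts <= observed_count) / repetitions
print(fraction_as_low)
```

A fraction close to 0 would mean the model almost never produces a panel like the observed one.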

Conclusion

Example: Genetics of peas 🟢

Gregor Mendel, 1822-1884


Mendel's model

Choosing a statistic

$$| \text{sample proportion of plants with purple flowers} - 0.75 |$$
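
As a sketch, this statistic can be written as a small function. The name `distance_from_75` and the example proportion are hypothetical.

```python
def distance_from_75(proportion_purple):
    """Absolute difference between a sample proportion and the model's 0.75."""
    return abs(proportion_purple - 0.75)

# Example with a made-up sample proportion of 0.70.
example_distance = distance_from_75(0.70)
print(example_distance)
```

Large values of this statistic point away from the model in either direction.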

Simulating Mendel's experiment
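
A sketch of the simulation under the model's assumption that each plant has purple flowers with probability 0.75. The sample size `num_plants = 1000` is a hypothetical placeholder (swap in the actual number of plants), and the repetition count is also an assumption.

```python
import numpy as np

num_plants = 1000     # hypothetical sample size, for illustration only
repetitions = 10_000  # assumed number of repetitions

def distance_from_75(proportion_purple):
    """Absolute difference between a sample proportion and 0.75."""
    return abs(proportion_purple - 0.75)

# Column 0 holds the number of purple-flowering plants in each simulated sample.
purple_counts = np.random.multinomial(num_plants, [0.75, 0.25], size=repetitions)[:, 0]

# One simulated value of the statistic per repetition.
distances = distance_from_75(purple_counts / num_plants)
print(distances[:5])
```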

Without context, these numbers aren't helpful – we need to see where the value of the statistic in Mendel's original observation lies in this distribution!

Mendel's experiment

Was Mendel's model any good?

Mendelian inheritance

Viewpoints and test statistics

Choosing one of two viewpoints

Goal: choose between two views of the world, based on data in a sample.

Test statistics

Choosing one of two viewpoints

Is the observed value of the test statistic consistent with the empirical distribution of the test statistic (i.e., the simulated test statistics)?

Example: Is our coin fair?


Let's put these values in an array, since our simulations will also result in arrays.

Designing a test statistic for a pair of viewpoints

Let's consider the pair of viewpoints “This coin is fair.” OR “No, it’s not.”

Simulating a fair coin
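
A sketch for the "fair" versus "not fair" pair of viewpoints, where deviations in either direction count as evidence against fairness. The number of flips (`num_flips = 400`) is a hypothetical value, and `distance_from_half` is a name chosen for this example.

```python
import numpy as np

num_flips = 400       # hypothetical number of flips, for illustration only
repetitions = 10_000  # assumed number of repetitions

def distance_from_half(num_heads):
    """How far the number of heads is from what a fair coin gives on average."""
    return abs(num_heads - num_flips / 2)

# Column 0 holds the number of heads in each simulated set of flips.
heads_counts = np.random.multinomial(num_flips, [0.5, 0.5], size=repetitions)[:, 0]

# One simulated test statistic per repetition.
simulated_stats = distance_from_half(heads_counts)
print(simulated_stats[:5])
```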

Concept Check ✅ – Answer at cc.dsc10.com

Let's now consider the pair of viewpoints “This coin is fair.” OR “No, it's biased towards heads.” Which test statistic would be appropriate?

Another pair of viewpoints

Simulating a fair coin, again

All that will change from our previous simulation is the function we use to compute our test statistic.
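
A sketch of that idea: the simulation takes the statistic as a function, so only that function needs to change. Here one reasonable choice for "biased towards heads" is the number of heads itself, since only large values favor that viewpoint. The names `simulate_fair_coin` and `num_heads` and the value `num_flips = 400` are assumptions made for this example.

```python
import numpy as np

num_flips = 400  # hypothetical number of flips, for illustration only

def simulate_fair_coin(statistic, repetitions=10_000):
    """Simulate a fair coin many times and apply `statistic` to the head counts."""
    heads_counts = np.random.multinomial(num_flips, [0.5, 0.5], size=repetitions)[:, 0]
    return statistic(heads_counts)

def num_heads(heads_counts):
    """Test statistic for 'biased towards heads': just the number of heads."""
    return heads_counts

simulated_heads = simulate_fair_coin(num_heads)
print(simulated_heads[:5])
```

Passing a different function, such as one that measures the distance from `num_flips / 2`, would reuse the same simulation for the two-sided pair of viewpoints.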

Questions to consider before choosing a test statistic

Summary, next time

Summary

Next time