# Set up packages for lecture. Don't worry about understanding this code,
# but make sure to run it if you're following along.
import numpy as np
import babypandas as bpd
from matplotlib_inline.backend_inline import set_matplotlib_formats
import matplotlib.pyplot as plt
set_matplotlib_formats("svg")
plt.style.use('ggplot')
Simulations.
np.random.choice(options) returns a randomly selected element from options, which is a list or array to choose from. By default, all elements of options are equally likely to be chosen.
# Simulate a fair coin flip.
np.random.choice(['Heads', 'Tails'])
'Heads'
# Simulate a roll of a die.
np.random.choice(np.arange(1, 7))
3
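By default every option is equally likely, but np.random.choice also accepts a p argument that gives one probability per option. Here is a minimal sketch of a biased coin. The 70%/30% split is an arbitrary assumption chosen only for illustration:

```python
import numpy as np

options = ['Heads', 'Tails']

# Assumed for illustration: a biased coin that lands Heads 70% of the time.
# p gives one probability per option, in the same order, and must sum to 1.
flips = np.random.choice(options, 10000, p=[0.7, 0.3])

# The proportion of Heads should be close to 0.7, but it varies from run to run.
proportion_heads = np.count_nonzero(flips == 'Heads') / len(flips)
print(proportion_heads)
```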
np.random.choice(options, n) returns an array of n randomly selected elements from options.
# Simulate 10 fair coin flips.
np.random.choice(['Heads', 'Tails'], 10)
array(['Heads', 'Tails', 'Heads', 'Heads', 'Heads', 'Heads', 'Heads', 'Tails', 'Tails', 'Tails'], dtype='<U5')
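By default, np.random.choice selects with replacement, meaning the same element can be chosen more than once. To select without replacement, use replace=False.
# Choose three colleges to win free HDH swag.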
colleges = ['Revelle', 'John Muir', 'Thurgood Marshall',
            'Earl Warren', 'Eleanor Roosevelt', 'Sixth', 'Seventh', 'Eighth']
np.random.choice(colleges, 3, replace=False)
array(['John Muir', 'Thurgood Marshall', 'Seventh'], dtype='<U17')
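The difference between sampling with and without replacement is easy to check directly. This sketch reuses the colleges list. The sample sizes 20 and 9 are arbitrary choices for illustration: 20 draws from 8 colleges must repeat at least one college, and 9 distinct colleges cannot be drawn from only 8.

```python
import numpy as np

colleges = ['Revelle', 'John Muir', 'Thurgood Marshall',
            'Earl Warren', 'Eleanor Roosevelt', 'Sixth', 'Seventh', 'Eighth']

# Without replacement, the 3 winners are always 3 different colleges.
winners = np.random.choice(colleges, 3, replace=False)

# With replacement (the default), 20 draws from 8 colleges must repeat some college.
with_repeats = np.random.choice(colleges, 20)

# Drawing 9 of 8 colleges without replacement is impossible, so NumPy raises a ValueError.
raised_error = False
try:
    np.random.choice(colleges, 9, replace=False)
except ValueError:
    raised_error = True

print(len(set(winners)), len(set(with_repeats)) < 20, raised_error)  # 3 True True
```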
What's the probability of getting 60 or more heads if we flip 100 coins?
Plan:
1. Use np.random.choice to flip 100 coins.
2. Use np.count_nonzero to count the number of heads.
np.count_nonzero(array) returns the number of entries in array that are non-zero. For a Boolean array, that is the number of entries that are True.
coins = np.random.choice(['Heads', 'Tails'], 100)
coins
array(['Tails', 'Tails', 'Tails', 'Heads', 'Heads', 'Heads', 'Heads', 'Heads', 'Heads', 'Tails', 'Tails', 'Heads', 'Tails', 'Heads', 'Tails', 'Tails', 'Tails', 'Tails', 'Heads', 'Tails', 'Tails', 'Heads', 'Tails', 'Heads', 'Tails', 'Heads', 'Tails', 'Heads', 'Heads', 'Tails', 'Heads', 'Heads', 'Tails', 'Heads', 'Heads', 'Heads', 'Tails', 'Heads', 'Heads', 'Heads', 'Heads', 'Tails', 'Heads', 'Tails', 'Heads', 'Tails', 'Tails', 'Heads', 'Heads', 'Heads', 'Tails', 'Tails', 'Heads', 'Tails', 'Tails', 'Heads', 'Tails', 'Tails', 'Tails', 'Heads', 'Tails', 'Tails', 'Heads', 'Heads', 'Tails', 'Tails', 'Tails', 'Tails', 'Heads', 'Tails', 'Tails', 'Tails', 'Tails', 'Heads', 'Heads', 'Heads', 'Tails', 'Tails', 'Tails', 'Heads', 'Tails', 'Tails', 'Heads', 'Heads', 'Heads', 'Heads', 'Tails', 'Heads', 'Heads', 'Heads', 'Tails', 'Tails', 'Heads', 'Tails', 'Tails', 'Heads', 'Tails', 'Tails', 'Heads', 'Tails'], dtype='<U5')
coins == 'Heads'
array([False, False, False, True, True, True, True, True, True, False, False, True, False, True, False, False, False, False, True, False, False, True, False, True, False, True, False, True, True, False, True, True, False, True, True, True, False, True, True, True, True, False, True, False, True, False, False, True, True, True, False, False, True, False, False, True, False, False, False, True, False, False, True, True, False, False, False, False, True, False, False, False, False, True, True, True, False, False, False, True, False, False, True, True, True, True, False, True, True, True, False, False, True, False, False, True, False, False, True, False])
(coins == 'Heads').sum()
48
np.count_nonzero(coins == 'Heads') # Counts the number of Trues in the sequence.
48
np.count_nonzero([5, 6, 0, 2])
3
Why is it called count_nonzero? In Python, True == 1 and False == 0, so counting the non-zero elements counts the number of Trues.
Next, let's put the experiment into a function. This makes it easy to run the experiment repeatedly.
def coin_experiment():
    # Flip 100 fair coins.
    coins = np.random.choice(['Heads', 'Tails'], 100)
    # Return the number of heads.
    return np.count_nonzero(coins == 'Heads')
coin_experiment()
54
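To repeat the experiment many times, we can use a for-loop! To save each result, we can use np.append, which returns a new array with the given value added to the end.
head_counts = np.array([])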
head_counts
array([], dtype=float64)
head_counts = np.append(head_counts, 15)
head_counts
array([15.])
head_counts = np.append(head_counts, 25)
head_counts
array([15., 25.])
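A common mistake is to call np.append without saving its result. np.append never changes the original array in place, so the new value is lost unless you assign the result back to a name. This small sketch demonstrates that:

```python
import numpy as np

head_counts = np.array([])

# np.append does NOT change head_counts in place; it returns a new array.
np.append(head_counts, 15)
print(len(head_counts))  # 0: the new array above was discarded

# To keep the new value, assign the result back to the same name.
head_counts = np.append(head_counts, 15)
print(len(head_counts))  # 1
```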
# Specify the number of repetitions.
repetitions = 10000
# Create an empty array to store the results.
head_counts = np.array([])
for i in np.arange(repetitions):
    # For each repetition, run the experiment and add the result to head_counts.
    head_count = coin_experiment()
    head_counts = np.append(head_counts, head_count)
len(head_counts)
10000
head_counts
array([54., 53., 48., ..., 58., 59., 48.])
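Because np.append builds a brand-new array on every call, a common alternative (not the approach this lecture uses) is to create an array of the final size up front with np.zeros and fill it in by position. The following sketch repeats coin_experiment so that the block runs on its own. It produces the same kind of array of 10,000 head counts:

```python
import numpy as np

def coin_experiment():
    # Flip 100 fair coins and return the number of heads.
    coins = np.random.choice(['Heads', 'Tails'], 100)
    return np.count_nonzero(coins == 'Heads')

repetitions = 10000

# Create an array of the final size up front, filled with zeros.
head_counts = np.zeros(repetitions)

for i in np.arange(repetitions):
    # Store the i-th result in position i instead of appending it.
    head_counts[i] = coin_experiment()

print(head_counts)
```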
# In how many experiments was the number of heads >= 60?
at_least_60 = np.count_nonzero(head_counts >= 60)
at_least_60
319
# What is this as a proportion?
at_least_60 / repetitions
0.0319
# Can also use np.mean()! Why?
np.mean(head_counts >= 60)
0.0319
This is quite close to the true theoretical answer!
# The theoretical answer – don't worry about how or why this code works.
import math
sum([math.comb(100, i) * (1 / 2) ** 100 for i in np.arange(60, 101)])
0.028443966820490392
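For reference, the code above adds up one term for each possible number of heads from 60 to 100. Let X stand for the number of heads in 100 fair flips (this notation is introduced here only for illustration). Then the formula is:

$$P(X \geq 60) = \sum_{i=60}^{100} \binom{100}{i} \left(\frac{1}{2}\right)^{100} \approx 0.0284$$

Each term multiplies the number of ways to get exactly i heads, math.comb(100, i), by the probability $\left(\frac{1}{2}\right)^{100}$ of any one particular sequence of 100 flips.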
head_counts
array([54., 53., 48., ..., 58., 59., 48.])
bpd.DataFrame().assign(
    Number_of_Heads=head_counts
).plot(kind='hist', bins=np.arange(30, 70), density=True, ec='w', figsize=(10, 5));
plt.axvline(60, color='C1', linewidth=4);
Suppose you’re on a game show, and you’re given the choice of three doors. A car 🚗 is behind one of the doors, and goats 🐐🐐 are behind the other two.
You pick a door, say Door #2, and the host, who knows what’s behind the doors, opens another door, say Door #3, which has a goat.
The host then says to you, “Do you want to switch to Door #1 or stay with Door #2?”
Question: Should you stay or switch?
(The question was posed in Parade magazine’s "Ask Marilyn" column in 1990. It is called the "Monty Hall problem" because Monty Hall hosted a similar game show called "Let's Make a Deal.")
from IPython.display import IFrame
IFrame('https://montyhall.io/', width=600, height=400)
Suppose you originally selected Door #2. The host reveals Door #3 to have a goat behind it. What should you do?
A. Stay with Door #2; it has just as high a chance of winning as Door #1. It doesn't matter whether you switch or not.
B. Switch to Door #1; it has a higher chance of winning than Door #2.
Plan:
When you pick a door, there are three equally likely outcomes for what is behind the door you picked:
options = np.array(['Car', 'Goat #1', 'Goat #2'])
behind_picked_door = np.random.choice(options)
behind_picked_door
'Car'
When the host opens a different door, they always reveal a goat.
if behind_picked_door == 'Goat #1':
    revealed = 'Goat #2'
elif behind_picked_door == 'Goat #2':
    revealed = 'Goat #1'
else:
    # This is the case in which you originally picked a car!
    revealed = np.random.choice(['Goat #1', 'Goat #2'])
revealed
'Goat #2'
If you always switch, you'll end up winning the prize that is neither behind_picked_door nor revealed.
options
array(['Car', 'Goat #1', 'Goat #2'], dtype='<U7')
behind_picked_door
'Car'
revealed
'Goat #2'
your_prize = options[(options != behind_picked_door) & (options != revealed)][0]
your_prize
'Goat #1'
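The one-line expression above does several things at once. This sketch breaks it into steps, using the same values as the example game (the car behind the picked door, Goat #2 revealed):

```python
import numpy as np

options = np.array(['Car', 'Goat #1', 'Goat #2'])
behind_picked_door = 'Car'
revealed = 'Goat #2'

# Each comparison gives one Boolean per option.
not_picked = options != behind_picked_door    # [False, True, True]
not_revealed = options != revealed            # [True, True, False]

# & is element-wise "and": True only where BOTH conditions hold.
remaining = not_picked & not_revealed         # [False, True, False]

# Exactly one door remains; [0] pulls that single element out of the array.
print(options[remaining])     # ['Goat #1']
print(options[remaining][0])  # Goat #1
```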
Let's put all of our work into a single function to make it easier to repeat.
def simulate_switch_strategy():
    options = np.array(['Car', 'Goat #1', 'Goat #2'])
    behind_picked_door = np.random.choice(options)
    if behind_picked_door == 'Goat #1':
        revealed = 'Goat #2'
    elif behind_picked_door == 'Goat #2':
        revealed = 'Goat #1'
    else:
        revealed = np.random.choice(['Goat #1', 'Goat #2'])
    your_prize = options[(options != behind_picked_door) & (options != revealed)][0]
    #print(behind_picked_door, 'was behind the door.', revealed, 'was revealed by the host. Your prize was:', your_prize)
    return your_prize
Now, every time we call simulate_switch_strategy, the result is your prize.
simulate_switch_strategy()
'Car'
We should save your prize in each game; to do so, we'll use np.append.
repetitions = 10000
your_prizes = np.array([])
for i in np.arange(repetitions):
    your_prize = simulate_switch_strategy()
    your_prizes = np.append(your_prizes, your_prize)
your_prizes
array(['Car', 'Car', 'Goat #2', ..., 'Car', 'Goat #1', 'Car'], dtype='<U32')
np.count_nonzero(your_prizes == 'Car')
6629
np.count_nonzero(your_prizes == 'Car') / repetitions
0.6629
This is quite close to the true probability of winning if you switch, $\frac{2}{3}$.
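To see where $\frac{2}{3}$ comes from, notice that switching wins exactly when your original pick was a goat. In that case the host must reveal the other goat, which leaves the car behind the remaining door. Since two of the three equally likely outcomes for your original pick are goats, we get:

$$P(\text{win by switching}) = P(\text{original pick is a goat}) = \frac{2}{3}$$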
Alternatively, instead of saving each prize in an array, we can set car_count to 0 and add 1 to it each time your prize is a car.
car_count = 0
for i in np.arange(repetitions):
    your_prize = simulate_switch_strategy()
    if your_prize == 'Car':
        car_count = car_count + 1
car_count / repetitions
0.6642
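No arrays needed! This strategy won't always work, though; it depends on the goal of the simulation. For example, drawing a histogram of the results, as we did for the coin flips, requires saving every result in an array.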
What if you always stay with your original door? In this case, your prize is always the same as what was behind the picked door.
car_count = 0
for i in np.arange(repetitions):
    options = np.array(['Car', 'Goat #1', 'Goat #2'])
    behind_picked_door = np.random.choice(options)
    your_prize = behind_picked_door
    if your_prize == 'Car':
        car_count = car_count + 1
car_count / repetitions
0.3388
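The staying estimate (about 0.34) and the switching estimate (about 0.66) add up to roughly 1. This is no coincidence, because in every game exactly one of the two strategies wins the car. The following sketch plays each simulated game once and records the prize under both strategies. play_both_strategies is a hypothetical name, and its body reuses the logic of simulate_switch_strategy:

```python
import numpy as np

def play_both_strategies():
    # Hypothetical helper: play one game and return (stay prize, switch prize).
    options = np.array(['Car', 'Goat #1', 'Goat #2'])
    behind_picked_door = np.random.choice(options)
    if behind_picked_door == 'Goat #1':
        revealed = 'Goat #2'
    elif behind_picked_door == 'Goat #2':
        revealed = 'Goat #1'
    else:
        revealed = np.random.choice(['Goat #1', 'Goat #2'])
    stay_prize = behind_picked_door
    switch_prize = options[(options != behind_picked_door) & (options != revealed)][0]
    return stay_prize, switch_prize

repetitions = 10000
stay_wins = 0
switch_wins = 0
for i in np.arange(repetitions):
    stay_prize, switch_prize = play_both_strategies()
    # In every game, exactly one of these two counters goes up.
    if stay_prize == 'Car':
        stay_wins = stay_wins + 1
    if switch_prize == 'Car':
        switch_wins = switch_wins + 1

print(stay_wins / repetitions, switch_wins / repetitions)
```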
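To estimate the probability of an event through simulation:
1. Make a function that runs the experiment once.
2. Run that function many times (in this lecture, 10,000) with a for-loop, and save the results in an array with np.append.
3. Compute the proportion of times the event occurs using np.count_nonzero.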
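The recipe for estimating a probability is: write a function that runs the experiment once, repeat it with a for-loop while saving results with np.append, then compute a proportion with np.count_nonzero. This sketch packages that recipe into a single function. estimate_probability and at_least_60 are hypothetical names, not from the lecture. The sketch saves whether the event happened in each repetition, rather than the raw result, so the same helper works for both numeric and text outcomes:

```python
import numpy as np

def estimate_probability(run_experiment, event_happened, repetitions=10000):
    # Hypothetical helper (not from the lecture) that packages the recipe.
    # run_experiment: a function that runs the experiment once.
    # event_happened: a function taking one result and returning True or False.
    outcomes = np.array([])
    for i in np.arange(repetitions):
        # Run the experiment and save whether the event happened (1.0 or 0.0).
        result = run_experiment()
        outcomes = np.append(outcomes, event_happened(result))
    # The proportion of repetitions in which the event happened.
    return np.count_nonzero(outcomes) / repetitions

def coin_experiment():
    # Flip 100 fair coins and return the number of heads.
    coins = np.random.choice(['Heads', 'Tails'], 100)
    return np.count_nonzero(coins == 'Heads')

def at_least_60(head_count):
    return head_count >= 60

print(estimate_probability(coin_experiment, at_least_60))
```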
On Friday, we'll look at past exam problems from practice.dsc10.com to help you prepare for Monday's midterm!