'woof'
'woof'
type('woof')
str
"woof"
'woof'
# A string, not an int!
"1998"
'1998'
When using the + symbol between two strings, the operation is called "concatenation".
s1 = 'baby'
s2 = '🐼'
s1 + s2
'baby🐼'
s1 + ' ' + s2
'baby 🐼'
s2 * 3
'🐼🐼🐼'
String methods are called by writing a . after the string (dot notation). For example, to use the upper method on string s, we write s.upper().
Examples include upper, title, and replace.
my_cool_string = 'data science is super cool!'
my_cool_string.title()
'Data Science Is Super Cool!'
my_cool_string.upper()
'DATA SCIENCE IS SUPER COOL!'
my_cool_string.replace('super cool', '💯' * 3)
'data science is 💯💯💯!'
# len is not a method, since it doesn't use dot notation
len(my_cool_string)
27
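One detail worth making explicit: string methods return a new string and leave the original unchanged. Below is a minimal sketch that reuses my_cool_string from above; the name shouted is made up for illustration.

```python
my_cool_string = 'data science is super cool!'

# Calling a method produces a new string...
shouted = my_cool_string.upper()
print(shouted)

# ...but the original string is unchanged.
print(my_cool_string)

# To keep the result, assign it to a name.
my_cool_string = my_cool_string.title()
print(my_cool_string)
```

This is the same idea that comes up later with arrays: an expression creates a new value, and only reassignment changes what a name refers to.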
Single quotes and double quotes are usually interchangeable, except when the string itself contains a single or double quote.
'my string's full of apostrophes!'
File "/var/folders/pd/w73mdrsj2836_7gp0brr2q7r0000gn/T/ipykernel_6391/3472332101.py", line 1
    'my string's full of apostrophes!'
               ^
SyntaxError: invalid syntax
"my string's full of apostrophes!"
"my string's full of apostrophes!"
# escape the apostrophe with a backslash!
'my string\'s "full" of apostrophes!'
'my string\'s "full" of apostrophes!'
print('my string\'s "full" of apostrophes!')
my string's "full" of apostrophes!
print
The print function displays the value in human-readable text when it's evaluated.
12 # 12 won't be displayed, since Python only shows the value of the last expression
23
23
# Note, there is no Out[number] to the left! That only appears when displaying a non-printed value.
# But both 12 and 23 are displayed.
print(12)
print(23)
12
23
# '\n' inserts a new line
my_newline_str = 'here is a string with two lines.\nhere is the second line'
my_newline_str
'here is a string with two lines.\nhere is the second line'
# The quotes disappeared and the newline is rendered!
print(my_newline_str)
here is a string with two lines.
here is the second line
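The difference between the displayed value and the printed value can also be seen outside a notebook. As a rough sketch, the built-in repr function (not covered above) returns the quoted, escaped form that a notebook shows for an unprinted string, while print shows the human-readable text.

```python
my_newline_str = 'here is a string with two lines.\nhere is the second line'

# repr gives the quoted form, with the newline shown as the two characters \ and n
print(repr(my_newline_str))

# print renders the newline, so the text spans two lines
print(my_newline_str)

# Inside the string itself, the newline counts as a single character
print(len('a\nb'))  # 3
```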
Any value can be converted to a string using str.
Some strings can be converted to numbers using int and float.
str(3)
'3'
float('3')
3.0
int('4')
4
int('baby panda')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/var/folders/pd/w73mdrsj2836_7gp0brr2q7r0000gn/T/ipykernel_6391/455936715.py in <module>
----> 1 int('baby panda')

ValueError: invalid literal for int() with base 10: 'baby panda'
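A string has to look like the right kind of number for the conversion to work. The sketch below shows that int rejects a string containing a decimal point, while float accepts it. The try/except statement is only there so the example keeps running after the error; it goes beyond what has been covered so far.

```python
# int only accepts strings that look like whole numbers
print(int('4') + 1)        # 5

# A string with a decimal point can be converted with float
print(float('5.6'))        # 5.6

# int('5.6') raises a ValueError; catching it lets the rest of the code run
try:
    int('5.6')
except ValueError as err:
    print('ValueError:', err)

# Converting a float to an int drops the fractional part
print(int(float('5.6')))   # 5
```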
Assume you have run the following statements:
x = 3
y = '4'
z = '5.6'
Choose the expression that will be evaluated without an error.
A. x + y
B. x + int(y + z)
C. str(x) + int(y)
D. str(x) + z
E. All of them have errors
How would we store the temperatures for each of the first 6 days in the month of September?
Our best solution right now is to create a separate variable for each day.
temperature_on_sept_01 = 84
temperature_on_sept_02 = 78
temperature_on_sept_03 = 81
temperature_on_sept_04 = 75
temperature_on_sept_05 = 79
temperature_on_sept_06 = 75
This technically allows us to do things like compute the average temperature through the first 6 days:
avg_temperature = 1/6 * (
temperature_on_sept_01
+ temperature_on_sept_02
+ temperature_on_sept_03
+ ...)
Imagine a whole month's data, or a whole year's data. It seems like we need a better solution.
In Python, a list is used to store multiple values in a single value/variable. To create a new list from scratch, we use [square brackets].
temperature_list = [84, 78, 81, 75, 79, 75]
len(temperature_list)
6
Notice that the elements in a list don't need to be unique!
To find the average temperature, we just need to divide the sum of the temperatures by the number of temperatures recorded:
temperature_list
[84, 78, 81, 75, 79, 75]
sum(temperature_list) / len(temperature_list)
78.66666666666667
The type of a list is... list.
temperature_list
[84, 78, 81, 75, 79, 75]
type(temperature_list)
list
Within a list, you can store elements of different types.
mixed_list = [-2, 2.5, 'ucsd', [1, 3]]
mixed_list
[-2, 2.5, 'ucsd', [1, 3]]
NumPy (pronounced "num pie") is a Python library (module) that provides support for arrays and operations on them.
The babypandas library, which you will learn about next week, goes hand-in-hand with NumPy.
To use numpy, we need to import it. It's usually imported as np (but doesn't have to be!)
import numpy as np
Think of NumPy arrays (just "arrays" from now on) as fancy, faster lists.
To create an array, we pass a list as input to the np.array function.
np.array([4, 9, 1, 2])
array([4, 9, 1, 2])
temperature_array = np.array([84, 78, 81, 75, 79, 75])
temperature_array
array([84, 78, 81, 75, 79, 75])
temperature_list
[84, 78, 81, 75, 79, 75]
# No square brackets, because temperature_list is already a list!
np.array(temperature_list)
array([84, 78, 81, 75, 79, 75])
When people stand in a line, each person has a position.
Similarly, each element of an array (and list) has a position.
To access the element in array arr_name at position pos, we use the syntax arr_name[pos]. Positions start at 0, so the first element is at position 0.
temperature_array
array([84, 78, 81, 75, 79, 75])
temperature_array[0]
84
temperature_array[1]
78
temperature_array[3]
75
# Access last element
temperature_array[5]
75
temperature_array[6]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/var/folders/pd/w73mdrsj2836_7gp0brr2q7r0000gn/T/ipykernel_6391/3393100043.py in <module>
----> 1 temperature_array[6]

IndexError: index 6 is out of bounds for axis 0 with size 6
# If a position is negative, count from the end!
temperature_array[-1]
75
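To connect the two ways of counting: in an array with n elements, the valid positions run from 0 to n - 1, and position -k refers to the same element as position n - k. A small sketch using the same temperatures (the name n is just for illustration):

```python
import numpy as np

temperature_array = np.array([84, 78, 81, 75, 79, 75])
n = len(temperature_array)

# The last valid position is n - 1, not n
print(temperature_array[n - 1])   # 75

# Counting from the end lands on the same elements
print(temperature_array[-1] == temperature_array[n - 1])   # True
print(temperature_array[-2] == temperature_array[n - 2])   # True
```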
Earlier in the lecture, we saw that lists can store elements of multiple types.
nums_and_strings_lst = ['uc', 'sd', 1961, 3.14]
nums_and_strings_lst
['uc', 'sd', 1961, 3.14]
This is not true of arrays – all elements in an array must be of the same type.
# All elements are converted to strings!
np.array(nums_and_strings_lst)
array(['uc', 'sd', '1961', '3.14'], dtype='<U32')
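The same kind of conversion happens with numbers alone. As a quick sketch (the name mixed_numbers is made up for this example), mixing ints with a float produces an array in which every element is a float:

```python
import numpy as np

# Two ints and one float go in...
mixed_numbers = np.array([1, 2, 3.5])

# ...and every element comes out as a float
print(mixed_numbers)
print(mixed_numbers.dtype)   # float64
```

The dtype attribute reports the single type shared by all elements of the array.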
Arrays make it easy to perform the same operation on every element. This behavior is formally known as "broadcasting".
temperature_array
array([84, 78, 81, 75, 79, 75])
# Increase all temperatures by 3 degrees
temperature_array + 3
array([87, 81, 84, 78, 82, 78])
# Halve all temperatures
temperature_array / 2
array([42. , 39. , 40.5, 37.5, 39.5, 37.5])
# Convert all temperatures to Celsius
(5 / 9) * (temperature_array - 32)
array([28.88888889, 25.55555556, 27.22222222, 23.88888889, 26.11111111, 23.88888889])
Note: In none of the above cells did we actually modify temperature_array! Each of those expressions created a new array.
temperature_array
array([84, 78, 81, 75, 79, 75])
To actually change temperature_array, we need to reassign it to a new array.
temperature_array = (5 / 9) * (temperature_array - 32)
# Now in Celsius!
temperature_array
array([28.88888889, 25.55555556, 27.22222222, 23.88888889, 26.11111111, 23.88888889])
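For comparison, here is a sketch of what the same "add 3 to everything" task looks like with a plain list. It uses a list comprehension, which has not been introduced here, so treat it as an illustration of the extra work rather than something to memorize.

```python
import numpy as np

temperature_list = [84, 78, 81, 75, 79, 75]

# With a list, we have to visit each element ourselves
warmer_list = [t + 3 for t in temperature_list]

# With an array, the operation applies to every element at once
warmer_array = np.array(temperature_list) + 3

print(warmer_list)
print(warmer_array)

# Writing temperature_list + 3 instead would raise a TypeError
```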
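When a and b are arrays of the same length, a + b is an array whose first element is the sum of the first element of a and the first element of b, whose second element is the sum of their second elements, and so on.
a = np.array([1, 2, 3])
b = np.array([-4, 5, 9])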
b = np.array([-4, 5, 9])
a + b
array([-3, 7, 12])
a / b
array([-0.25 , 0.4 , 0.33333333])
a ** 2 + b ** 2
array([17, 29, 90])
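Pairing up elements by position only makes sense when the positions line up. The sketch below shows what happens with arrays of lengths 3 and 2; the array c is hypothetical, and the try/except is only there so the example keeps running after the error.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([-4, 5, 9])
c = np.array([10, 20])   # hypothetical array with only two elements

# Same length: elements are paired up by position
print(a * b)

# Lengths 3 and 2: NumPy cannot pair the elements and raises a ValueError
try:
    a + c
except ValueError as err:
    print('ValueError:', err)
```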
Baby Panda made a series of five TikTok videos called "A Day In the Life of a Data Science Mascot". The number of views each video has received is stored in the array views below.
views = np.array([158, 352, 195, 1423916, 46])
Some questions:
What was their average view count?
views
array([ 158, 352, 195, 1423916, 46])
sum(views) / len(views)
284933.4
# The mean method exists for arrays (but not for lists)
views.mean()
284933.4
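If the data happens to be in a list rather than an array, one option is the function np.mean, which accepts either. A minimal sketch, assuming a hypothetical list version of the same view counts named views_list:

```python
import numpy as np

views_list = [158, 352, 195, 1423916, 46]   # hypothetical list version of views

# views_list.mean() would raise an AttributeError, since lists have no mean method.
# np.mean is a function, so it works on the list directly.
print(np.mean(views_list))

# Same result as dividing the sum by the number of elements
print(sum(views_list) / len(views_list))
```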
How many views did their most and least popular videos receive?
views
array([ 158, 352, 195, 1423916, 46])
views.max()
1423916
views.min()
46
How many views above average did each of their videos receive? How many views above average did their most viewed video receive?
views
array([ 158, 352, 195, 1423916, 46])
views - views.mean()
array([-284775.4, -284581.4, -284738.4, 1138982.6, -284887.4])
(views - views.mean()).max()
1138982.6
It has been estimated that TikTok pays their creators $0.03 per 1000 views. If this is true, how many dollars did Baby Panda earn on their most viewed video?
views
array([ 158, 352, 195, 1423916, 46])
views.max() * 0.03 / 1000
42.717479999999995
We often find ourselves needing to make arrays like this:
days_in_september = np.array([
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
23, 24, 25, 26, 27, 28, 29, 30
])
There needs to be an easier way to do this!
np.arange
The general way to call np.arange is np.arange(start, end, step). This returns an array such that:
The first element is start. By default, start is 0.
All subsequent elements are spaced out by step, until (but excluding) end. By default, step is 1.
# Start at 0, end before 8, step by 1
# This will be our most common use-case!
np.arange(8)
array([0, 1, 2, 3, 4, 5, 6, 7])
# Start at 5, end before 10, step by 1
np.arange(5, 10)
array([5, 6, 7, 8, 9])
# Start at 3, end before 32, step by 5
np.arange(3, 32, 5)
array([ 3, 8, 13, 18, 23, 28])
# Steps can be fractional!
np.arange(-3, 2, 0.5)
array([-3. , -2.5, -2. , -1.5, -1. , -0.5, 0. , 0.5, 1. , 1.5])
# If step is negative, we count backwards.
np.arange(1, -10, -3)
array([ 1, -2, -5, -8])
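Returning to the array that motivated this, here is one way days_in_september could be built with np.arange. Since end is excluded, the call ends at 31 rather than 30.

```python
import numpy as np

# Start at 1, end before 31, step by 1
days_in_september = np.arange(1, 31)

print(len(days_in_september))   # 30
print(days_in_september[0])     # 1
print(days_in_september[-1])    # 30
```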
🎉 Congrats! 🎉 You won the lottery 💰. Here's how your payout works: on the first day of September, you are paid $0.01. Every day thereafter, your pay doubles, so on the second day you're paid $0.02, on the third day you're paid $0.04, on the fourth day you're paid $0.08, and so on.
September has 30 days.
Write a one-line expression that uses the numbers 2 and 30, along with the function np.arange and the method .sum(), that computes the total amount in dollars you will be paid in September.
...
Ellipsis
We'll learn about how to use Python to work with real-world tabular data.