int and float
Use the type function to check a value's type.
int: An integer of any size.
float: A number with a decimal point.
# int.
6 + 4
10
# float.
20 / 2
10.0
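This small sketch compares the results above with type directly. It also shows that 10 and 10.0 are equal as values even though their types differ. The variable names are only for illustration.

```python
# Adding two ints gives an int.
whole_sum = 6 + 4
# Division always produces a float, even when the result is a whole number.
quotient = 20 / 2

print(type(whole_sum))  # <class 'int'>
print(type(quotient))   # <class 'float'>

# The values are equal, but the types are not.
print(whole_sum == quotient)              # True
print(type(whole_sum) == type(quotient))  # False
```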
int
If you add (+), subtract (-), multiply (*), or exponentiate (**) ints, the result will be another int.
In Python, ints have arbitrary precision, meaning that your calculations will always be exact.
7 - 15
-8
type(7 - 15)
int
2 ** 300
2037035976334486086268445688409378161051468393665936250636140449354381299763336706183397376
2 ** 3000
1230231922161117176931558813276752514640713895736833715766118029160058800614672948775360067838593459582429649254051804908512884180898236823585082482065348331234959350355845017413023320111360666922624728239756880416434478315693675013413090757208690376793296658810662941824493488451726505303712916005346747908623702673480919353936813105736620402352744776903840477883651100322409301983488363802930540482487909763484098253940728685132044408863734754271212592471778643949486688511721051561970432780747454823776808464180697103083861812184348565522740195796682622205511845512080552010310050255801589349645928001133745474220715013683413907542779063759833876101354235184245096670042160720629411581502371248008430447184842098610320580417992206662247328722122088513643683907670360209162653670641130936997002170500675501374723998766005827579300723253474890612250135171889174899079911291512399773872178519018229989376
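To see what "always exact" means in practice, here is a short illustration that runs the same computation with an int and with a float, using 2 ** 100 as an arbitrary large number. The float version anticipates the limited precision of floats.

```python
big = 2 ** 100

# int arithmetic is exact: adding 1 and then subtracting big leaves exactly 1.
int_result = (big + 1) - big
print(int_result)    # 1

# With floats, 2.0 ** 100 + 1 cannot be stored exactly and is rounded
# back to 2.0 ** 100, so the +1 is lost.
float_result = (2.0 ** 100 + 1) - 2.0 ** 100
print(float_result)  # 0.0
```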
float
A float is specified using a decimal point.
A float might be printed using scientific notation.
3.2 + 2.5
5.7
type(3.2 + 2.5)
float
# The result is in scientific notation: e+90 means "times 10^90".
2.0 ** 300
2.037035976334486e+90
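Scientific notation works in both directions: Python uses it to display some floats, and you can also type a float that way. The sketch below shows a literal written with e3 and two large floats, one displayed in full and one in scientific notation.

```python
# A float literal can itself use scientific notation: 2.5e3 means 2.5 * 10^3.
print(2.5e3)  # 2500.0

# Python writes some large floats out in full and switches to
# scientific notation for larger ones.
print(1e15)   # 1000000000000000.0
print(1e16)   # 1e+16
```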
Limits of float
Floats have limited precision; after arithmetic, the final few decimal places can be wrong in unexpected ways.
Floats also have a limited size, though the limit is huge.
1 + 0.2
1.2
1 + 0.1 + 0.1
1.2000000000000002
2.0 ** 3000
OverflowError: (34, 'Result too large')
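Both limits can be inspected directly. This sketch uses the standard sys.float_info.max to show the largest finite float. The exact value depends on the platform's float format; on typical machines it is about 1.8 × 10^308, which is why 2.0 ** 1023 fits and 2.0 ** 1024 does not. The exact wording of the OverflowError message can also vary by operating system.

```python
import sys

# Precision: 0.1 and 0.2 are not stored exactly, so their sum is slightly off.
print(0.1 + 0.2)         # 0.30000000000000004
print(0.1 + 0.2 == 0.3)  # False

# Size: the largest finite float on this machine.
print(sys.float_info.max)

# 2.0 ** 1023 is still representable...
print(2.0 ** 1023)

# ...but 2.0 ** 1024 is past the limit and raises an OverflowError.
try:
    too_big = 2.0 ** 1024
except OverflowError as err:
    too_big = None
    print("OverflowError:", err)
```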
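Converting between int and float
If you mix ints and floats in an expression, the result will always be a float.
When you divide two ints, you get a float back.
A value can be converted to an int or a float using the int and float functions.
2.0 + 3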
5.0
12 / 2
6.0
# Want an integer back.
int(12 / 2)
6
# int chops off the decimal point!
int(-2.9)
-2
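The rules above fit in one short sketch. Note that int truncates toward zero: int(-2.9) becomes -2, not -3, because the decimal part is simply dropped rather than rounded.

```python
# Mixing an int and a float gives a float.
mixed = 2.0 + 3
print(mixed, type(mixed))        # 5.0 <class 'float'>

# Dividing two ints gives a float, even when it divides evenly.
quotient = 12 / 2
print(quotient, type(quotient))  # 6.0 <class 'float'>

# int() drops everything after the decimal point; it does not round.
truncated_pos = int(2.9)
truncated_neg = int(-2.9)
print(truncated_pos, truncated_neg)  # 2 -2

# float() turns an int into a float with the same value.
as_float = float(7)
print(as_float)                  # 7.0
```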
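Strings
A string is a snippet of text. In Python, strings are enclosed in either single quotes or double quotes.
'woof'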
'woof'
type('woof')
str
"woof"
'woof'
# A string, not an int!
"1998"
'1998'
When the + symbol is used between two strings, the operation is called "concatenation".
s1 = 'baby'
s2 = '🐼'
s1 + s2
'baby🐼'
s1 + ' ' + s2
'baby 🐼'
Multiplying a string by an int repeats it.
s2 * 3
'🐼🐼🐼'
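Because + means concatenation for strings and addition for numbers, the string "1998" seen earlier behaves very differently from the int 1998. The variable names here are illustrative.

```python
year_text = '1998'   # a string
year_number = 1998   # an int

# With strings, + glues the text together.
joined = year_text + '1'
print(joined)  # 19981

# With ints, + does arithmetic.
added = year_number + 1
print(added)   # 1999
```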
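Strings have methods: functions that are called on a string using dot notation, such as .title(), .upper(), and .replace().
my_cool_string = 'data science is super cool!'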
my_cool_string.title()
'Data Science Is Super Cool!'
my_cool_string.upper()
'DATA SCIENCE IS SUPER COOL!'
my_cool_string.replace('super cool', '💯' * 3)
'data science is 💯💯💯!'
# len is not a method, since it doesn't use dot notation.
len(my_cool_string)
27
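String methods return a new string and leave the original untouched. This sketch checks that after calling .upper(), my_cool_string still holds its lowercase text.

```python
my_cool_string = 'data science is super cool!'

# .upper() returns a new, upper-cased string...
shouted = my_cool_string.upper()
print(shouted)         # DATA SCIENCE IS SUPER COOL!

# ...while the original string is unchanged.
print(my_cool_string)  # data science is super cool!

# Upper-casing doesn't change the number of characters.
print(len(shouted))    # 27
```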
Any value can be converted to a string using str.
Some strings can also be converted to int and float.
str(3)
'3'
float('3')
3.0
int('4')
4
int('baby panda')
ValueError: invalid literal for int() with base 10: 'baby panda'
int('4.3')
ValueError: invalid literal for int() with base 10: '4.3'
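int('4.3') fails because int only accepts strings that look like whole numbers. One way to get an int from '4.3' is to convert it to a float first, then to an int. This is shown below; as before, int drops the decimal part.

```python
# Step 1: the string '4.3' is a valid float.
as_float = float('4.3')
print(as_float)  # 4.3

# Step 2: int() drops the decimal part of the float.
as_int = int(as_float)
print(as_int)    # 4
```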
Assume you have run the following statements:
x = 3
y = '4'
z = '5.6'
Choose the expression that will be evaluated without an error.
A. x + y
B. x + int(y + z)
C. str(x) + int(y)
D. str(x) + z
E. All of them have errors
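So far, we've worked with single data values: numbers (as ints or floats) and pieces of text (as strings). But we'll often work with sequences, or ordered collections, of several data values.
The mean is a one-number summary of a collection of numbers: add the numbers up, then divide by how many there are.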
For example, the mean of $1$, $4$, $7$, and $12$ is $\frac{1 + 4 + 7 + 12}{4} = 6$.
Observe that the mean:
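Like the mean, the median is a one-number summary of a collection of numbers. It is the middle number once the numbers are sorted; if there is an even number of numbers, it is the mean of the two middle ones.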
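A minimal sketch of both summaries, using the same numbers as the mean example above. The helper name median_by_hand is hypothetical. The standard-library statistics.median is used only as a cross-check. For 1, 4, 7, and 12 the median is the mean of the two middle values, 4 and 7.

```python
import statistics

def median_by_hand(values):
    """Middle value after sorting; mean of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2          # // divides and drops the remainder
    if n % 2 == 1:        # % gives the remainder, so this checks for an odd count
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

data = [1, 4, 7, 12]
print(sum(data) / len(data))    # 6.0 (the mean)
print(median_by_hand(data))     # 5.5
print(statistics.median(data))  # 5.5
```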
Find two different datasets that have the same mean and different medians.
Find two different datasets that have the same median and different means.
Find two different datasets that have the same median and the same mean.
Means and medians are just summaries; they don't tell the whole story about a dataset!
In a few weeks, we'll learn about how to visualize the distribution of a collection of numbers using a histogram.
These two distributions have different means but the same median!
How would we store the temperatures for a week to compute the average temperature?
Our best solution right now is to create a separate variable for each day of the week.
temp_sunday = 68
temp_monday = 73
temp_tuesday = 70
temp_wednesday = 74
temp_thursday = 76
temp_friday = 72
temp_saturday = 74
This technically allows us to do things like compute the average temperature:
avg_temperature = 1/7 * (
temp_sunday
+ temp_monday
+ temp_tuesday
+ ...)
Imagine a whole month's data, or a whole year's data. It seems like we need a better solution.
In Python, a list is used to store multiple values within a single value. To create a new list from scratch, we use [square brackets].
temperature_list = [68, 73, 70, 74, 76, 72, 74]
len(temperature_list)
7
Notice that the elements in a list don't need to be unique!
To find the average temperature, we just need to divide the sum of the temperatures by the number of temperatures recorded:
temperature_list
[68, 73, 70, 74, 76, 72, 74]
sum(temperature_list) / len(temperature_list)
72.42857142857143
The type of a list is... list.
temperature_list
[68, 73, 70, 74, 76, 72, 74]
type(temperature_list)
list
Within a list, you can store elements of different types.
mixed_list = [-2, 2.5, 'ucsd', [1, 3]]
mixed_list
[-2, 2.5, 'ucsd', [1, 3]]
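One detail worth checking: a list stored inside another list counts as a single element. This sketch uses the [position] syntax covered later for arrays, which lists share. It shows that mixed_list has 4 elements, and that its last element is itself a list of 2.

```python
mixed_list = [-2, 2.5, 'ucsd', [1, 3]]

# The inner list [1, 3] counts as one element of mixed_list.
print(len(mixed_list))      # 4

# The element at position 3 (positions start at 0) is itself a list.
print(type(mixed_list[3]))  # <class 'list'>
print(len(mixed_list[3]))   # 2
```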
NumPy (pronounced "num pie") is a Python library (module) that provides support for arrays and operations on them.
The babypandas library, which you will learn about soon, goes hand-in-hand with NumPy.
To use numpy, we need to import it. It's usually imported as np (but doesn't have to be!)
import numpy as np
Think of NumPy arrays (just "arrays" from now on) as fancy, faster lists.
To create an array, we pass a list as input to the np.array function.
np.array([4, 9, 1, 2])
array([4, 9, 1, 2])
temperature_array = np.array([68, 73, 70, 74, 76, 72, 74])
temperature_array
array([68, 73, 70, 74, 76, 72, 74])
temperature_list
[68, 73, 70, 74, 76, 72, 74]
# No square brackets, because temperature_list is already a list!
np.array(temperature_list)
array([68, 73, 70, 74, 76, 72, 74])
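Since arrays are "fancy lists", the average-temperature calculation from before works on them unchanged. The sketch assumes NumPy is installed. It checks the array's type, compares the two averages, and uses the array method .tolist() to get a plain list back.

```python
import numpy as np

temperature_list = [68, 73, 70, 74, 76, 72, 74]
temperature_array = np.array(temperature_list)

# An array's type is numpy.ndarray, not list.
print(type(temperature_array))  # <class 'numpy.ndarray'>

# sum and len work on arrays too, so the average comes out the same.
list_avg = sum(temperature_list) / len(temperature_list)
array_avg = sum(temperature_array) / len(temperature_array)
print(list_avg == array_avg)    # True

# .tolist() converts the array back into an ordinary list.
print(temperature_array.tolist() == temperature_list)  # True
```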
When people wait in line, each person has a position.
Similarly, each element of an array (and list) has a position.
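Positions start at 0: the first element is at position 0, the second at position 1, and so on.
To access the element in array arr_name at position pos, we use the syntax arr_name[pos].
temperature_array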
array([68, 73, 70, 74, 76, 72, 74])
temperature_array[0]
68
temperature_array[1]
73
temperature_array[3]
74
# Access the last element.
temperature_array[6]
74
# Doesn't work!
temperature_array[7]
IndexError: index 7 is out of bounds for axis 0 with size 7
# If a position is negative, count from the end!
temperature_array[-1]
74
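Negative positions follow a simple rule: for an array of length n, position -k refers to the same element as position n - k. So the valid positions for temperature_array run from 0 to 6, or from -7 to -1. A short check, assuming NumPy is installed:

```python
import numpy as np

temperature_array = np.array([68, 73, 70, 74, 76, 72, 74])
n = len(temperature_array)  # 7

# Position -1 is the last element (n - 1); -2 is the second-to-last (n - 2).
print(temperature_array[-1] == temperature_array[n - 1])  # True
print(temperature_array[-2])  # 72

# Position -n is the first element, the same as position 0.
print(temperature_array[-n])  # 68
```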
Earlier in the lecture, we saw that lists can store elements of multiple types.
nums_and_strings_lst = ['uc', 'sd', 1961, 3.14]
nums_and_strings_lst
['uc', 'sd', 1961, 3.14]
This is not true of arrays – all elements in an array must be of the same type.
# All elements are converted to strings!
np.array(nums_and_strings_lst)
array(['uc', 'sd', '1961', '3.14'], dtype='<U32')
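The same conversion happens with numbers. An array's shared type is recorded in its dtype attribute. In this sketch, an all-int array keeps an integer dtype; the exact name, such as int64, can depend on the platform. Mixing in a single float turns every element into a float.

```python
import numpy as np

# Only ints: the array stores integers.
ints = np.array([1, 2, 3])
print(ints.dtype)           # an integer dtype, e.g. int64

# One float in the mix: every element becomes a float.
mixed_numbers = np.array([1, 2.5])
print(mixed_numbers)
print(mixed_numbers.dtype)  # float64
```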
Arrays make it easy to perform the same operation on every element. This behavior is formally known as "broadcasting".
temperature_array
array([68, 73, 70, 74, 76, 72, 74])
# Increase all temperatures by 3 degrees.
temperature_array + 3
array([71, 76, 73, 77, 79, 75, 77])
# Halve all temperatures.
temperature_array / 2
array([34. , 36.5, 35. , 37. , 38. , 36. , 37. ])
# Convert all temperatures to Celsius.
(5 / 9) * (temperature_array - 32)
array([20. , 22.77777778, 21.11111111, 23.33333333, 24.44444444, 22.22222222, 23.33333333])
Note: In none of the above cells did we actually modify temperature_array! Each of those expressions created a new array.
temperature_array
array([68, 73, 70, 74, 76, 72, 74])
To actually change temperature_array, we need to reassign it to a new array.
temperature_array = (5 / 9) * (temperature_array - 32)
# Now in Celsius!
temperature_array
array([20. , 22.77777778, 21.11111111, 23.33333333, 24.44444444, 22.22222222, 23.33333333])
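Broadcasting is one place where arrays and lists really differ. Multiplying a list by 2 repeats the list, just like repeating a string, while multiplying an array by 2 doubles each element. The short temperature lists below are illustrative.

```python
import numpy as np

temps_list = [68, 73, 70]
temps_array = np.array(temps_list)

# List * int repeats the list.
print(temps_list * 2)   # [68, 73, 70, 68, 73, 70]

# Array * int multiplies every element (broadcasting).
print(temps_array * 2)  # [136 146 140]
```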
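When two arrays have the same length, arithmetic between them happens element-wise: the operation is applied to each pair of elements in the same position.
a = np.array([4, 5, -1])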
b = np.array([2, 3, 2])
a + b
array([6, 8, 1])
a / b
array([ 2. , 1.66666667, -0.5 ])
a ** 2 + b ** 2
array([20, 34, 5])
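Element-wise operations need a partner for every element. The sketch below multiplies a and b position by position. It then shows that adding arrays of different lengths (3 and 2 here) raises a ValueError, because NumPy cannot pair up their elements.

```python
import numpy as np

a = np.array([4, 5, -1])
b = np.array([2, 3, 2])

# Same length: 4 * 2, 5 * 3, and -1 * 2.
print((a * b).tolist())  # [8, 15, -2]

# Different lengths: NumPy raises a ValueError.
c = np.array([1, 2])
try:
    a + c
    mismatch_error = None
except ValueError as err:
    mismatch_error = err
    print("ValueError:", err)
```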
We'll learn more about arrays and we'll see how to use Python to work with real-world tabular data.