# Run this cell if you're following along – it just helps make the lectures appear prettier.
import pandas as pd
import numpy as np
np.set_printoptions(threshold=20, precision=2, suppress=True)
pd.set_option("display.max_rows", 7)
pd.set_option("display.max_columns", 8)
pd.set_option("display.precision", 2)
Today, we'll use a real dataset and lots of motivating questions to illustrate key DataFrame manipulation techniques.
pandas
¶pandas
is a Python package that allows us to work with tabular data – that is, data in the form of a table that we might otherwise work with as a spreadsheet (in Excel or Google Sheets).pandas
is the tool for doing data science in Python.pandas
is not so cute...¶babypandas
!¶pandas
called babypandas
.pandas
code. You are learning pandas
!babypandas
🐼¶babypandas
(and pandas
) are called "DataFrames."babypandas
. (We'll need numpy
as well.)import babypandas as bpd
import numpy as np
data/get-it-done-apr-08.csv
contains service requests made on April 8th, 2023 through the Get It Done program. We can read in a CSV using bpd.read_csv(...)
. Give it the path to a file relative to your notebook; if the file is in the same folder as your notebook, this is just the name of the file.
apr_08 = bpd.read_csv('data/get-it-done-apr-08.csv')
apr_08
service_request_id | date_requested | neighborhood | service | status | street_address | public_description | |
---|---|---|---|---|---|---|---|
0 | 4183116 | 2023-04-08T00:32:00 | Downtown | Traffic Signal Issue | open | 2ND AVE & G ST | Signal, 2nd & G |
1 | 4183117 | 2023-04-08T00:44:00 | Mid-City:Eastern Area | Missed Collection | open | 4791 Seminole Dr, San Diego, CA 92115, USA | My Blue Recycle Bin was not collected. I'm th... |
2 | 4183118 | 2023-04-08T00:49:00 | Navajo | Parking | open | 4728 Allied Rd, San Diego, CA 92120, USA | White van parked across my driveway. I will n... |
3 | 4183119 | 2023-04-08T01:04:00 | Encanto Neighborhoods | Missed Collection | open | 6066 Tempas Ct | Green container |
4 | 4183120 | 2023-04-08T01:10:00 | Navajo | Pothole | open | 7961?7979 Topaz Lake Ave | Potholes || LOCATION: Topaz Lake between Pear... |
... | ... | ... | ... | ... | ... | ... | ... |
1084 | 4184220 | 2023-04-08T19:53:00 | Barrio Logan | Other | closed | 3718 Dalbergia St | Prostitution |
1085 | 4184221 | 2023-04-08T19:53:00 | Barrio Logan | Other | closed | 3743 Dalbergia St | Prostitution |
1086 | 4184223 | 2023-04-08T19:54:00 | Barrio Logan | Other | closed | 2120 Woden St | Prostitution |
1087 | 4184225 | 2023-04-08T19:54:00 | Barrio Logan | Other | closed | 2005 Vesta St | Prostitution |
1088 | 4184294 | 2023-04-08T23:36:00 | Southeastern San Diego | Other | closed | 3762 Cottonwood St | Loud music |
1089 rows × 7 columns
'neighborhood'
and 'status'
.# This DataFrame has 1089 rows and 7 columns.
apr_08
service_request_id | date_requested | neighborhood | service | status | street_address | public_description | |
---|---|---|---|---|---|---|---|
0 | 4183116 | 2023-04-08T00:32:00 | Downtown | Traffic Signal Issue | open | 2ND AVE & G ST | Signal, 2nd & G |
1 | 4183117 | 2023-04-08T00:44:00 | Mid-City:Eastern Area | Missed Collection | open | 4791 Seminole Dr, San Diego, CA 92115, USA | My Blue Recycle Bin was not collected. I'm th... |
2 | 4183118 | 2023-04-08T00:49:00 | Navajo | Parking | open | 4728 Allied Rd, San Diego, CA 92120, USA | White van parked across my driveway. I will n... |
3 | 4183119 | 2023-04-08T01:04:00 | Encanto Neighborhoods | Missed Collection | open | 6066 Tempas Ct | Green container |
4 | 4183120 | 2023-04-08T01:10:00 | Navajo | Pothole | open | 7961?7979 Topaz Lake Ave | Potholes || LOCATION: Topaz Lake between Pear... |
... | ... | ... | ... | ... | ... | ... | ... |
1084 | 4184220 | 2023-04-08T19:53:00 | Barrio Logan | Other | closed | 3718 Dalbergia St | Prostitution |
1085 | 4184221 | 2023-04-08T19:53:00 | Barrio Logan | Other | closed | 3743 Dalbergia St | Prostitution |
1086 | 4184223 | 2023-04-08T19:54:00 | Barrio Logan | Other | closed | 2120 Woden St | Prostitution |
1087 | 4184225 | 2023-04-08T19:54:00 | Barrio Logan | Other | closed | 2005 Vesta St | Prostitution |
1088 | 4184294 | 2023-04-08T23:36:00 | Southeastern San Diego | Other | closed | 3762 Cottonwood St | Loud music |
1089 rows × 7 columns
.set_index(column_name)
.apr_08.set_index('service_request_id')
date_requested | neighborhood | service | status | street_address | public_description | |
---|---|---|---|---|---|---|
service_request_id | ||||||
4183116 | 2023-04-08T00:32:00 | Downtown | Traffic Signal Issue | open | 2ND AVE & G ST | Signal, 2nd & G |
4183117 | 2023-04-08T00:44:00 | Mid-City:Eastern Area | Missed Collection | open | 4791 Seminole Dr, San Diego, CA 92115, USA | My Blue Recycle Bin was not collected. I'm th... |
4183118 | 2023-04-08T00:49:00 | Navajo | Parking | open | 4728 Allied Rd, San Diego, CA 92120, USA | White van parked across my driveway. I will n... |
4183119 | 2023-04-08T01:04:00 | Encanto Neighborhoods | Missed Collection | open | 6066 Tempas Ct | Green container |
4183120 | 2023-04-08T01:10:00 | Navajo | Pothole | open | 7961?7979 Topaz Lake Ave | Potholes || LOCATION: Topaz Lake between Pear... |
... | ... | ... | ... | ... | ... | ... |
4184220 | 2023-04-08T19:53:00 | Barrio Logan | Other | closed | 3718 Dalbergia St | Prostitution |
4184221 | 2023-04-08T19:53:00 | Barrio Logan | Other | closed | 3743 Dalbergia St | Prostitution |
4184223 | 2023-04-08T19:54:00 | Barrio Logan | Other | closed | 2120 Woden St | Prostitution |
4184225 | 2023-04-08T19:54:00 | Barrio Logan | Other | closed | 2005 Vesta St | Prostitution |
4184294 | 2023-04-08T23:36:00 | Southeastern San Diego | Other | closed | 3762 Cottonwood St | Loud music |
1089 rows × 6 columns
🚨 Like most DataFrame methods, .set_index
returns a new DataFrame; it does not modify the original DataFrame.
apr_08
service_request_id | date_requested | neighborhood | service | status | street_address | public_description | |
---|---|---|---|---|---|---|---|
0 | 4183116 | 2023-04-08T00:32:00 | Downtown | Traffic Signal Issue | open | 2ND AVE & G ST | Signal, 2nd & G |
1 | 4183117 | 2023-04-08T00:44:00 | Mid-City:Eastern Area | Missed Collection | open | 4791 Seminole Dr, San Diego, CA 92115, USA | My Blue Recycle Bin was not collected. I'm th... |
2 | 4183118 | 2023-04-08T00:49:00 | Navajo | Parking | open | 4728 Allied Rd, San Diego, CA 92120, USA | White van parked across my driveway. I will n... |
3 | 4183119 | 2023-04-08T01:04:00 | Encanto Neighborhoods | Missed Collection | open | 6066 Tempas Ct | Green container |
4 | 4183120 | 2023-04-08T01:10:00 | Navajo | Pothole | open | 7961?7979 Topaz Lake Ave | Potholes || LOCATION: Topaz Lake between Pear... |
... | ... | ... | ... | ... | ... | ... | ... |
1084 | 4184220 | 2023-04-08T19:53:00 | Barrio Logan | Other | closed | 3718 Dalbergia St | Prostitution |
1085 | 4184221 | 2023-04-08T19:53:00 | Barrio Logan | Other | closed | 3743 Dalbergia St | Prostitution |
1086 | 4184223 | 2023-04-08T19:54:00 | Barrio Logan | Other | closed | 2120 Woden St | Prostitution |
1087 | 4184225 | 2023-04-08T19:54:00 | Barrio Logan | Other | closed | 2005 Vesta St | Prostitution |
1088 | 4184294 | 2023-04-08T23:36:00 | Southeastern San Diego | Other | closed | 3762 Cottonwood St | Loud music |
1089 rows × 7 columns
apr_08 = apr_08.set_index('service_request_id')
apr_08
date_requested | neighborhood | service | status | street_address | public_description | |
---|---|---|---|---|---|---|
service_request_id | ||||||
4183116 | 2023-04-08T00:32:00 | Downtown | Traffic Signal Issue | open | 2ND AVE & G ST | Signal, 2nd & G |
4183117 | 2023-04-08T00:44:00 | Mid-City:Eastern Area | Missed Collection | open | 4791 Seminole Dr, San Diego, CA 92115, USA | My Blue Recycle Bin was not collected. I'm th... |
4183118 | 2023-04-08T00:49:00 | Navajo | Parking | open | 4728 Allied Rd, San Diego, CA 92120, USA | White van parked across my driveway. I will n... |
4183119 | 2023-04-08T01:04:00 | Encanto Neighborhoods | Missed Collection | open | 6066 Tempas Ct | Green container |
4183120 | 2023-04-08T01:10:00 | Navajo | Pothole | open | 7961?7979 Topaz Lake Ave | Potholes || LOCATION: Topaz Lake between Pear... |
... | ... | ... | ... | ... | ... | ... |
4184220 | 2023-04-08T19:53:00 | Barrio Logan | Other | closed | 3718 Dalbergia St | Prostitution |
4184221 | 2023-04-08T19:53:00 | Barrio Logan | Other | closed | 3743 Dalbergia St | Prostitution |
4184223 | 2023-04-08T19:54:00 | Barrio Logan | Other | closed | 2120 Woden St | Prostitution |
4184225 | 2023-04-08T19:54:00 | Barrio Logan | Other | closed | 2005 Vesta St | Prostitution |
4184294 | 2023-04-08T23:36:00 | Southeastern San Diego | Other | closed | 3762 Cottonwood St | Loud music |
1089 rows × 6 columns
.shape
returns the number of rows and columns in a given DataFrame.[]
: .shape[0]
for rows..shape[1]
for columns.# There were 7 columns before, but one of them became the index, and the index is not a column!
apr_08.shape
(1089, 6)
# Number of rows.
apr_08.shape[0]
1089
# Number of columns.
apr_08.shape[1]
6
data/get-it-done-requests.csv
contains a summary of all Get It Done requests submitted so far in 2023.'closed'
) versus unresolved ('open'
).requests = bpd.read_csv('data/get-it-done-requests.csv')
requests
neighborhood | service | closed | open | |
---|---|---|---|---|
0 | Balboa Park | Dead Animal | 11 | 0 |
1 | Balboa Park | Development Services - Code Enforcement | 2 | 0 |
2 | Balboa Park | Encampment | 215 | 20 |
3 | Balboa Park | Environmental Services Code Compliance | 8 | 1 |
4 | Balboa Park | Graffiti - Code Enforcement | 2 | 1 |
... | ... | ... | ... | ... |
1416 | Via De La Valle | Encampment | 0 | 1 |
1417 | Via De La Valle | Pavement Maintenance | 0 | 3 |
1418 | Via De La Valle | Pothole | 11 | 7 |
1419 | Via De La Valle | Sidewalk Repair Issue | 0 | 1 |
1420 | Via De La Valle | Street Sweeping | 1 | 0 |
1421 rows × 4 columns
Key concepts: Accessing columns, performing operations with them, and adding new columns.
Question: How many total service requests of each type in each neighborhood have been made this year?
requests
neighborhood | service | closed | open | |
---|---|---|---|---|
0 | Balboa Park | Dead Animal | 11 | 0 |
1 | Balboa Park | Development Services - Code Enforcement | 2 | 0 |
2 | Balboa Park | Encampment | 215 | 20 |
3 | Balboa Park | Environmental Services Code Compliance | 8 | 1 |
4 | Balboa Park | Graffiti - Code Enforcement | 2 | 1 |
... | ... | ... | ... | ... |
1416 | Via De La Valle | Encampment | 0 | 1 |
1417 | Via De La Valle | Pavement Maintenance | 0 | 3 |
1418 | Via De La Valle | Pothole | 11 | 7 |
1419 | Via De La Valle | Sidewalk Repair Issue | 0 | 1 |
1420 | Via De La Valle | Street Sweeping | 1 | 0 |
1421 rows × 4 columns
.get(column_name)
.requests
neighborhood | service | closed | open | |
---|---|---|---|---|
0 | Balboa Park | Dead Animal | 11 | 0 |
1 | Balboa Park | Development Services - Code Enforcement | 2 | 0 |
2 | Balboa Park | Encampment | 215 | 20 |
3 | Balboa Park | Environmental Services Code Compliance | 8 | 1 |
4 | Balboa Park | Graffiti - Code Enforcement | 2 | 1 |
... | ... | ... | ... | ... |
1416 | Via De La Valle | Encampment | 0 | 1 |
1417 | Via De La Valle | Pavement Maintenance | 0 | 3 |
1418 | Via De La Valle | Pothole | 11 | 7 |
1419 | Via De La Valle | Sidewalk Repair Issue | 0 | 1 |
1420 | Via De La Valle | Street Sweeping | 1 | 0 |
1421 rows × 4 columns
requests.get('closed')
0 11 1 2 2 215 3 8 4 2 ... 1416 0 1417 0 1418 11 1419 0 1420 1 Name: closed, Length: 1421, dtype: int64
requests.get('closed')
0 11 1 2 2 215 3 8 4 2 ... 1416 0 1417 0 1418 11 1419 0 1420 1 Name: closed, Length: 1421, dtype: int64
type(requests.get('closed'))
babypandas.bpd.Series
requests.get('open')
0 0 1 0 2 20 3 1 4 1 .. 1416 1 1417 3 1418 7 1419 1 1420 0 Name: open, Length: 1421, dtype: int64
requests.get('closed') + requests.get('open')
0 11 1 2 2 235 3 9 4 3 ... 1416 1 1417 3 1418 18 1419 1 1420 1 Length: 1421, dtype: int64
.assign(name_of_column=data_in_series)
to assign a Series (or array, or list) to a DataFrame.name_of_column
.requests.assign(
total=requests.get('closed') + requests.get('open')
)
neighborhood | service | closed | open | total | |
---|---|---|---|---|---|
0 | Balboa Park | Dead Animal | 11 | 0 | 11 |
1 | Balboa Park | Development Services - Code Enforcement | 2 | 0 | 2 |
2 | Balboa Park | Encampment | 215 | 20 | 235 |
3 | Balboa Park | Environmental Services Code Compliance | 8 | 1 | 9 |
4 | Balboa Park | Graffiti - Code Enforcement | 2 | 1 | 3 |
... | ... | ... | ... | ... | ... |
1416 | Via De La Valle | Encampment | 0 | 1 | 1 |
1417 | Via De La Valle | Pavement Maintenance | 0 | 3 | 3 |
1418 | Via De La Valle | Pothole | 11 | 7 | 18 |
1419 | Via De La Valle | Sidewalk Repair Issue | 0 | 1 | 1 |
1420 | Via De La Valle | Street Sweeping | 1 | 0 | 1 |
1421 rows × 5 columns
requests
neighborhood | service | closed | open | |
---|---|---|---|---|
0 | Balboa Park | Dead Animal | 11 | 0 |
1 | Balboa Park | Development Services - Code Enforcement | 2 | 0 |
2 | Balboa Park | Encampment | 215 | 20 |
3 | Balboa Park | Environmental Services Code Compliance | 8 | 1 |
4 | Balboa Park | Graffiti - Code Enforcement | 2 | 1 |
... | ... | ... | ... | ... |
1416 | Via De La Valle | Encampment | 0 | 1 |
1417 | Via De La Valle | Pavement Maintenance | 0 | 3 |
1418 | Via De La Valle | Pothole | 11 | 7 |
1419 | Via De La Valle | Sidewalk Repair Issue | 0 | 1 |
1420 | Via De La Valle | Street Sweeping | 1 | 0 |
1421 rows × 4 columns
requests = requests.assign(
total=requests.get('closed') + requests.get('open')
)
requests
neighborhood | service | closed | open | total | |
---|---|---|---|---|---|
0 | Balboa Park | Dead Animal | 11 | 0 | 11 |
1 | Balboa Park | Development Services - Code Enforcement | 2 | 0 | 2 |
2 | Balboa Park | Encampment | 215 | 20 | 235 |
3 | Balboa Park | Environmental Services Code Compliance | 8 | 1 | 9 |
4 | Balboa Park | Graffiti - Code Enforcement | 2 | 1 | 3 |
... | ... | ... | ... | ... | ... |
1416 | Via De La Valle | Encampment | 0 | 1 | 1 |
1417 | Via De La Valle | Pavement Maintenance | 0 | 3 | 3 |
1418 | Via De La Valle | Pothole | 11 | 7 | 18 |
1419 | Via De La Valle | Sidewalk Repair Issue | 0 | 1 | 1 |
1420 | Via De La Valle | Street Sweeping | 1 | 0 | 1 |
1421 rows × 5 columns
Key concept: Computing statistics of columns using Series methods.
Series, like arrays, have helpful methods, including .min()
, .max()
, and .mean()
.
requests.get('total').max()
4970
What is it that people are reporting so frequently, and where? We'll see how to find out shortly!
Other statistics:
requests.get('total').mean()
82.88529204785362
requests.get('total').median()
18.0
requests.get('open').mean()
16.58972554539057
requests.get('open').median()
3.0
# Lots of information at once!
requests.get('total').describe()
count 1421.00 mean 82.89 std 258.37 min 1.00 25% 5.00 50% 18.00 75% 54.00 max 4970.00 Name: total, dtype: float64
Key concepts: Sorting. Accessing using integer positions.
.sort_values(by=column_name)
method to sort.by=
can be omitted, but helps with readability.requests.sort_values(by='total')
neighborhood | service | closed | open | total | |
---|---|---|---|---|---|
1420 | Via De La Valle | Street Sweeping | 1 | 0 | 1 |
1105 | San Ysidro | Pavement Maintenance | 0 | 1 | 1 |
1092 | San Pasqual | Traffic Signal Timing | 1 | 0 | 1 |
1091 | San Pasqual | Traffic Signal Issue | 1 | 0 | 1 |
1090 | San Pasqual | Traffic Sign Maintenance | 1 | 0 | 1 |
... | ... | ... | ... | ... | ... |
1389 | Uptown | Graffiti - Public | 1422 | 384 | 1806 |
147 | Clairemont Mesa | Pothole | 1847 | 164 | 2011 |
1213 | Southeastern San Diego | Illegal Dumping | 2965 | 15 | 2980 |
404 | Mid-City:City Heights | Illegal Dumping | 3997 | 51 | 4048 |
207 | Downtown | Encampment | 4321 | 649 | 4970 |
1421 rows × 5 columns
This sorts, but in ascending order (small to large). The opposite would be nice!
.sort_values(by=column_name, ascending=False)
to sort in descending order.ascending
is an optional argument. If omitted, it will be set to True
by default.ascending=
.ordered_requests = requests.sort_values(by='total', ascending=False)
ordered_requests
neighborhood | service | closed | open | total | |
---|---|---|---|---|---|
207 | Downtown | Encampment | 4321 | 649 | 4970 |
404 | Mid-City:City Heights | Illegal Dumping | 3997 | 51 | 4048 |
1213 | Southeastern San Diego | Illegal Dumping | 2965 | 15 | 2980 |
147 | Clairemont Mesa | Pothole | 1847 | 164 | 2011 |
1389 | Uptown | Graffiti - Public | 1422 | 384 | 1806 |
... | ... | ... | ... | ... | ... |
391 | Los Penasquitos Canyon | Environmental Services Code Compliance | 1 | 0 | 1 |
346 | La Jolla | Right-of-Way Code Enforcement | 0 | 1 | 1 |
274 | Fairbanks Ranch Country Club | Stormwater | 1 | 0 | 1 |
243 | East Elliott | Tree Maintenance | 1 | 0 | 1 |
1420 | Via De La Valle | Street Sweeping | 1 | 0 | 1 |
1421 rows × 5 columns
# We must specify the role of False by using ascending=,
# otherwise Python does not know how to interpret this.
requests.sort_values(by='total', False)
File "/var/folders/ch/hyjw6whx3g9gshnp58738jc80000gp/T/ipykernel_2092/3731718929.py", line 3 requests.sort_values(by='total', False) ^ SyntaxError: positional argument follows keyword argument
'Encampment'
in 'Downtown'
, but how do we extract that information using code?.iloc[integer_position]
.iloc
stands for "integer location."ordered_requests
neighborhood | service | closed | open | total | |
---|---|---|---|---|---|
207 | Downtown | Encampment | 4321 | 649 | 4970 |
404 | Mid-City:City Heights | Illegal Dumping | 3997 | 51 | 4048 |
1213 | Southeastern San Diego | Illegal Dumping | 2965 | 15 | 2980 |
147 | Clairemont Mesa | Pothole | 1847 | 164 | 2011 |
1389 | Uptown | Graffiti - Public | 1422 | 384 | 1806 |
... | ... | ... | ... | ... | ... |
391 | Los Penasquitos Canyon | Environmental Services Code Compliance | 1 | 0 | 1 |
346 | La Jolla | Right-of-Way Code Enforcement | 0 | 1 | 1 |
274 | Fairbanks Ranch Country Club | Stormwater | 1 | 0 | 1 |
243 | East Elliott | Tree Maintenance | 1 | 0 | 1 |
1420 | Via De La Valle | Street Sweeping | 1 | 0 | 1 |
1421 rows × 5 columns
ordered_requests.get('neighborhood')
207 Downtown 404 Mid-City:City Heights 1213 Southeastern San Diego 147 Clairemont Mesa 1389 Uptown ... 391 Los Penasquitos Canyon 346 La Jolla 274 Fairbanks Ranch Country Club 243 East Elliott 1420 Via De La Valle Name: neighborhood, Length: 1421, dtype: object
ordered_requests.get('neighborhood').iloc[0]
'Downtown'
ordered_requests.get('service').iloc[0]
'Encampment'
Key concept: Accessing using row labels.
On April 8th, you submitted service request 4183848. Has the issue been resolved?
This cannot be answered from the annual summary data, but must be answered from the detailed data about April 8th.
apr_08
date_requested | neighborhood | service | status | street_address | public_description | |
---|---|---|---|---|---|---|
service_request_id | ||||||
4183116 | 2023-04-08T00:32:00 | Downtown | Traffic Signal Issue | open | 2ND AVE & G ST | Signal, 2nd & G |
4183117 | 2023-04-08T00:44:00 | Mid-City:Eastern Area | Missed Collection | open | 4791 Seminole Dr, San Diego, CA 92115, USA | My Blue Recycle Bin was not collected. I'm th... |
4183118 | 2023-04-08T00:49:00 | Navajo | Parking | open | 4728 Allied Rd, San Diego, CA 92120, USA | White van parked across my driveway. I will n... |
4183119 | 2023-04-08T01:04:00 | Encanto Neighborhoods | Missed Collection | open | 6066 Tempas Ct | Green container |
4183120 | 2023-04-08T01:10:00 | Navajo | Pothole | open | 7961?7979 Topaz Lake Ave | Potholes || LOCATION: Topaz Lake between Pear... |
... | ... | ... | ... | ... | ... | ... |
4184220 | 2023-04-08T19:53:00 | Barrio Logan | Other | closed | 3718 Dalbergia St | Prostitution |
4184221 | 2023-04-08T19:53:00 | Barrio Logan | Other | closed | 3743 Dalbergia St | Prostitution |
4184223 | 2023-04-08T19:54:00 | Barrio Logan | Other | closed | 2120 Woden St | Prostitution |
4184225 | 2023-04-08T19:54:00 | Barrio Logan | Other | closed | 2005 Vesta St | Prostitution |
4184294 | 2023-04-08T23:36:00 | Southeastern San Diego | Other | closed | 3762 Cottonwood St | Loud music |
1089 rows × 6 columns
Your service request is buried in the middle of the DataFrame. Only the first few rows and last few rows are shown, so you can't tell just by looking at the DataFrame.
To pull out one particular entry of a DataFrame corresponding to a row and column with certain labels:
.get(column_name)
to extract the entire column as a Series..loc[]
to access the element of a Series with a particular row label.In this class, we'll always first access a column, then a row (but row, then column is also possible).
apr_08.get('status')
service_request_id 4183116 open 4183117 open 4183118 open 4183119 open 4183120 open ... 4184220 closed 4184221 closed 4184223 closed 4184225 closed 4184294 closed Name: status, Length: 1089, dtype: object
apr_08.get('status').loc[4183848]
'open'
Oh no, your service request 4183848 has still not been resolved! What was the problem again?
Write one line of code that evaluates to the full description of the problem, as you described it in your service request.
...
Ellipsis
.loc[]
uses the row label..iloc[]
uses the integer position..loc[]
is more convenient, but each is best for different scenarios.bpd.read_csv
.bpd.read_csv('data/get-it-done-apr-08.csv')
service_request_id | date_requested | neighborhood | service | status | street_address | public_description | |
---|---|---|---|---|---|---|---|
0 | 4183116 | 2023-04-08T00:32:00 | Downtown | Traffic Signal Issue | open | 2ND AVE & G ST | Signal, 2nd & G |
1 | 4183117 | 2023-04-08T00:44:00 | Mid-City:Eastern Area | Missed Collection | open | 4791 Seminole Dr, San Diego, CA 92115, USA | My Blue Recycle Bin was not collected. I'm th... |
2 | 4183118 | 2023-04-08T00:49:00 | Navajo | Parking | open | 4728 Allied Rd, San Diego, CA 92120, USA | White van parked across my driveway. I will n... |
3 | 4183119 | 2023-04-08T01:04:00 | Encanto Neighborhoods | Missed Collection | open | 6066 Tempas Ct | Green container |
4 | 4183120 | 2023-04-08T01:10:00 | Navajo | Pothole | open | 7961?7979 Topaz Lake Ave | Potholes || LOCATION: Topaz Lake between Pear... |
... | ... | ... | ... | ... | ... | ... | ... |
1084 | 4184220 | 2023-04-08T19:53:00 | Barrio Logan | Other | closed | 3718 Dalbergia St | Prostitution |
1085 | 4184221 | 2023-04-08T19:53:00 | Barrio Logan | Other | closed | 3743 Dalbergia St | Prostitution |
1086 | 4184223 | 2023-04-08T19:54:00 | Barrio Logan | Other | closed | 2120 Woden St | Prostitution |
1087 | 4184225 | 2023-04-08T19:54:00 | Barrio Logan | Other | closed | 2005 Vesta St | Prostitution |
1088 | 4184294 | 2023-04-08T23:36:00 | Southeastern San Diego | Other | closed | 3762 Cottonwood St | Loud music |
1089 rows × 7 columns
bpd.read_csv('data/get-it-done-apr-08.csv').get('public_description').loc[561]
'Abandoned van'
bpd.read_csv('data/get-it-done-apr-08.csv').get('public_description').iloc[561]
'Abandoned van'
requests.get('open').max()
.apr_08.shape[0]
.apr_08.sort_values(by='date_requested', ascending=False).get('public_description').iloc[0]
.Moving forward, let's just focus on the requests
DataFrame. As a reminder, here's what it looks like:
requests
neighborhood | service | closed | open | total | |
---|---|---|---|---|---|
0 | Balboa Park | Dead Animal | 11 | 0 | 11 |
1 | Balboa Park | Development Services - Code Enforcement | 2 | 0 | 2 |
2 | Balboa Park | Encampment | 215 | 20 | 235 |
3 | Balboa Park | Environmental Services Code Compliance | 8 | 1 | 9 |
4 | Balboa Park | Graffiti - Code Enforcement | 2 | 1 | 3 |
... | ... | ... | ... | ... | ... |
1416 | Via De La Valle | Encampment | 0 | 1 | 1 |
1417 | Via De La Valle | Pavement Maintenance | 0 | 3 | 3 |
1418 | Via De La Valle | Pothole | 11 | 7 | 18 |
1419 | Via De La Valle | Sidewalk Repair Issue | 0 | 1 | 1 |
1420 | Via De La Valle | Street Sweeping | 1 | 0 | 1 |
1421 rows × 5 columns
'Pothole'
requests?'University'
neighborhood (near UCSD)?'Downtown'
neighborhood, how many open service requests are there?The common thread between these questions is that they all involve only a subset of the rows in our DataFrame.
'Pothole'
requests? 🕳¶Key concept: Querying.
'Pothole'
requests if we had a DataFrame consisting of only these type of requests.'total'
column in descending order, then extract the neighborhood name in the first row.# This DataFrame only contains rows where the 'service' is 'Pothole'!
only_potholes = requests[requests.get('service') == 'Pothole']
only_potholes
neighborhood | service | closed | open | total | |
---|---|---|---|---|---|
12 | Balboa Park | Pothole | 70 | 8 | 78 |
40 | Barrio Logan | Pothole | 111 | 9 | 120 |
67 | Black Mountain Ranch | Pothole | 19 | 1 | 20 |
91 | Carmel Mountain Ranch | Pothole | 245 | 85 | 330 |
118 | Carmel Valley | Pothole | 111 | 98 | 209 |
... | ... | ... | ... | ... | ... |
1314 | Torrey Hills | Pothole | 11 | 1 | 12 |
1339 | Torrey Pines | Pothole | 280 | 161 | 441 |
1366 | University | Pothole | 712 | 249 | 961 |
1396 | Uptown | Pothole | 513 | 87 | 600 |
1418 | Via De La Valle | Pothole | 11 | 7 | 18 |
56 rows × 5 columns
Now that we have a DataFrame with only 'Pothole'
requests, we can sort by 'total'
in descending order and extract the 'neighborhood'
in the first row, just like we planned:
only_potholes.sort_values('total', ascending=False).get('neighborhood').iloc[0]
'Clairemont Mesa'
🤯 What just happened?
True
or False
.bool
is a data type in Python, just like int
, float
, and str
. True
or False
.5 == 6
False
type(5 == 6)
bool
9 + 10 < 21
True
'zebra' == 'zeb' + 'ra'
True
There are several types of comparisons we can make.
symbol | meaning |
---|---|
== |
equal to |
!= |
not equal to |
< |
less than |
<= |
less than or equal to |
> |
greater than |
>= |
greater than or equal to |
When comparing an entire Series to a value, the result is a Series of bool
s.
requests
neighborhood | service | closed | open | total | |
---|---|---|---|---|---|
0 | Balboa Park | Dead Animal | 11 | 0 | 11 |
1 | Balboa Park | Development Services - Code Enforcement | 2 | 0 | 2 |
2 | Balboa Park | Encampment | 215 | 20 | 235 |
3 | Balboa Park | Environmental Services Code Compliance | 8 | 1 | 9 |
4 | Balboa Park | Graffiti - Code Enforcement | 2 | 1 | 3 |
... | ... | ... | ... | ... | ... |
1416 | Via De La Valle | Encampment | 0 | 1 | 1 |
1417 | Via De La Valle | Pavement Maintenance | 0 | 3 | 3 |
1418 | Via De La Valle | Pothole | 11 | 7 | 18 |
1419 | Via De La Valle | Sidewalk Repair Issue | 0 | 1 | 1 |
1420 | Via De La Valle | Street Sweeping | 1 | 0 | 1 |
1421 rows × 5 columns
requests.get('service') == 'Pothole'
0 False 1 False 2 False 3 False 4 False ... 1416 False 1417 False 1418 True 1419 False 1420 False Name: service, Length: 1421, dtype: bool
To select only certain rows of requests
:
True
s (keep) and False
s (toss), usually by making a comparison.requests[sequence_goes_here]
.requests[requests.get('service') == 'Pothole']
neighborhood | service | closed | open | total | |
---|---|---|---|---|---|
12 | Balboa Park | Pothole | 70 | 8 | 78 |
40 | Barrio Logan | Pothole | 111 | 9 | 120 |
67 | Black Mountain Ranch | Pothole | 19 | 1 | 20 |
91 | Carmel Mountain Ranch | Pothole | 245 | 85 | 330 |
118 | Carmel Valley | Pothole | 111 | 98 | 209 |
... | ... | ... | ... | ... | ... |
1314 | Torrey Hills | Pothole | 11 | 1 | 12 |
1339 | Torrey Pines | Pothole | 280 | 161 | 441 |
1366 | University | Pothole | 712 | 249 | 961 |
1396 | Uptown | Pothole | 513 | 87 | 600 |
1418 | Via De La Valle | Pothole | 11 | 7 | 18 |
56 rows × 5 columns
This time, we'll find the neighborhoods and services with over 100 open requests.
requests
neighborhood | service | closed | open | total | |
---|---|---|---|---|---|
0 | Balboa Park | Dead Animal | 11 | 0 | 11 |
1 | Balboa Park | Development Services - Code Enforcement | 2 | 0 | 2 |
2 | Balboa Park | Encampment | 215 | 20 | 235 |
3 | Balboa Park | Environmental Services Code Compliance | 8 | 1 | 9 |
4 | Balboa Park | Graffiti - Code Enforcement | 2 | 1 | 3 |
... | ... | ... | ... | ... | ... |
1416 | Via De La Valle | Encampment | 0 | 1 | 1 |
1417 | Via De La Valle | Pavement Maintenance | 0 | 3 | 3 |
1418 | Via De La Valle | Pothole | 11 | 7 | 18 |
1419 | Via De La Valle | Sidewalk Repair Issue | 0 | 1 | 1 |
1420 | Via De La Valle | Street Sweeping | 1 | 0 | 1 |
1421 rows × 5 columns
requests.get('open') > 100
0 False 1 False 2 False 3 False 4 False ... 1416 False 1417 False 1418 False 1419 False 1420 False Name: open, Length: 1421, dtype: bool
requests[requests.get('open') > 100]
neighborhood | service | closed | open | total | |
---|---|---|---|---|---|
147 | Clairemont Mesa | Pothole | 1847 | 164 | 2011 |
207 | Downtown | Encampment | 4321 | 649 | 4970 |
210 | Downtown | Graffiti - Public | 1039 | 345 | 1384 |
218 | Downtown | ROW Maintenance | 28 | 120 | 148 |
220 | Downtown | Shared Mobility Device | 1466 | 160 | 1626 |
... | ... | ... | ... | ... | ... |
1366 | University | Pothole | 712 | 249 | 961 |
1386 | Uptown | Encampment | 763 | 121 | 884 |
1389 | Uptown | Graffiti - Public | 1422 | 384 | 1806 |
1391 | Uptown | Missed Collection | 656 | 103 | 759 |
1393 | Uptown | Parking | 608 | 258 | 866 |
38 rows × 5 columns
babypandas
notes, and the babypandas
documentation when working on assignments..groupby
, for organizing rows of a DataFrame with the same value in a particular column.