Today, we'll use a real dataset and lots of motivating questions to illustrate key DataFrame manipulation techniques.
pandas
¶pandas
.pandas
is the tool for doing data science in Python.pandas
is not so cute...¶babypandas
!¶pandas
called babypandas
.pandas
code.babypandas
🐼¶babypandas
(and pandas
) are called "DataFrames."babypandas
. (We'll need numpy
as well.)import babypandas as bpd
import numpy as np
data/get-it-done-oct-1.csv
contains service requests made on October 1, 2022 through the Get It Done program. We can read in a CSV using bpd.read_csv(...)
. Give it the path to a file relative to your notebook (if the file is in the same folder as your notebook, this is just the name of the file).
oct_1 = bpd.read_csv('data/get-it-done-oct-1.csv')
oct_1
service_request_id | date_requested | neighborhood | service | status | street_address | public_description | |
---|---|---|---|---|---|---|---|
0 | 3940112 | 2022-10-01T00:11:00 | La Jolla | Pothole | Open | 7556 VIA CAPRI, San Diego, CA 92037, USA | Pothole |
1 | 3940113 | 2022-10-01T00:12:00 | La Jolla | Pothole | Open | 7566 VIA CAPRI, San Diego, CA 92037, USA | Potholes / fix the damn road |
2 | 3940114 | 2022-10-01T00:13:00 | Pacific Beach | Street Light Maintenance | Open | 1698-1500 Monmouth Dr, San Diego, CA 92109, USA | Street light out on the corner of Monmouth Dr ... |
3 | 3940115 | 2022-10-01T00:13:00 | La Jolla | Pothole | Open | 7456 VIA CAPRI, San Diego, CA 92037, USA | Pothole |
4 | 3940116 | 2022-10-01T00:16:00 | Scripps Miramar Ranch | Traffic Signal Timing | Open | 10895 Hibert St, San Diego, CA 92131, USA | Is it possible to time this light sequentially... |
... | ... | ... | ... | ... | ... | ... | ... |
748 | 3940819 | 2022-10-01T17:39:00 | North Park | Other | Closed | 3935 32nd St | Bike Theft Chop Shop Behind Starbucks on 32nd ... |
749 | 3940876 | 2022-10-01T19:47:00 | Skyline-Paradise Hills | Parking | Closed | 7701-7899 Bloomfield Rd, San Diego, CA 92114, USA | Car has been parked there for many months. Exp... |
750 | 3940900 | 2022-10-01T21:06:00 | Downtown | Sidewalk Repair Issue | Closed | Petco Park | Safety hazard illegal vending |
751 | 3940909 | 2022-10-01T21:44:00 | Clairemont Mesa | Other | Closed | 2810 Denver Street | Underage drinking party |
752 | 3940924 | 2022-10-01T22:54:00 | Clairemont Mesa | Other | Closed | 5378 Jamestown Rd | People doing drugs in their car AGAIN. Please ... |
753 rows × 7 columns
type
.'neighborhood'
and 'status'
.# This DataFrame has 753 rows and 7 columns
oct_1
service_request_id | date_requested | neighborhood | service | status | street_address | public_description | |
---|---|---|---|---|---|---|---|
0 | 3940112 | 2022-10-01T00:11:00 | La Jolla | Pothole | Open | 7556 VIA CAPRI, San Diego, CA 92037, USA | Pothole |
1 | 3940113 | 2022-10-01T00:12:00 | La Jolla | Pothole | Open | 7566 VIA CAPRI, San Diego, CA 92037, USA | Potholes / fix the damn road |
2 | 3940114 | 2022-10-01T00:13:00 | Pacific Beach | Street Light Maintenance | Open | 1698-1500 Monmouth Dr, San Diego, CA 92109, USA | Street light out on the corner of Monmouth Dr ... |
3 | 3940115 | 2022-10-01T00:13:00 | La Jolla | Pothole | Open | 7456 VIA CAPRI, San Diego, CA 92037, USA | Pothole |
4 | 3940116 | 2022-10-01T00:16:00 | Scripps Miramar Ranch | Traffic Signal Timing | Open | 10895 Hibert St, San Diego, CA 92131, USA | Is it possible to time this light sequentially... |
... | ... | ... | ... | ... | ... | ... | ... |
748 | 3940819 | 2022-10-01T17:39:00 | North Park | Other | Closed | 3935 32nd St | Bike Theft Chop Shop Behind Starbucks on 32nd ... |
749 | 3940876 | 2022-10-01T19:47:00 | Skyline-Paradise Hills | Parking | Closed | 7701-7899 Bloomfield Rd, San Diego, CA 92114, USA | Car has been parked there for many months. Exp... |
750 | 3940900 | 2022-10-01T21:06:00 | Downtown | Sidewalk Repair Issue | Closed | Petco Park | Safety hazard illegal vending |
751 | 3940909 | 2022-10-01T21:44:00 | Clairemont Mesa | Other | Closed | 2810 Denver Street | Underage drinking party |
752 | 3940924 | 2022-10-01T22:54:00 | Clairemont Mesa | Other | Closed | 5378 Jamestown Rd | People doing drugs in their car AGAIN. Please ... |
753 rows × 7 columns
.set_index(column_name)
..set_index
returns a new DataFrame; it does not modify the original DataFrame.oct_1.set_index('service_request_id')
date_requested | neighborhood | service | status | street_address | public_description | |
---|---|---|---|---|---|---|
service_request_id | ||||||
3940112 | 2022-10-01T00:11:00 | La Jolla | Pothole | Open | 7556 VIA CAPRI, San Diego, CA 92037, USA | Pothole |
3940113 | 2022-10-01T00:12:00 | La Jolla | Pothole | Open | 7566 VIA CAPRI, San Diego, CA 92037, USA | Potholes / fix the damn road |
3940114 | 2022-10-01T00:13:00 | Pacific Beach | Street Light Maintenance | Open | 1698-1500 Monmouth Dr, San Diego, CA 92109, USA | Street light out on the corner of Monmouth Dr ... |
3940115 | 2022-10-01T00:13:00 | La Jolla | Pothole | Open | 7456 VIA CAPRI, San Diego, CA 92037, USA | Pothole |
3940116 | 2022-10-01T00:16:00 | Scripps Miramar Ranch | Traffic Signal Timing | Open | 10895 Hibert St, San Diego, CA 92131, USA | Is it possible to time this light sequentially... |
... | ... | ... | ... | ... | ... | ... |
3940819 | 2022-10-01T17:39:00 | North Park | Other | Closed | 3935 32nd St | Bike Theft Chop Shop Behind Starbucks on 32nd ... |
3940876 | 2022-10-01T19:47:00 | Skyline-Paradise Hills | Parking | Closed | 7701-7899 Bloomfield Rd, San Diego, CA 92114, USA | Car has been parked there for many months. Exp... |
3940900 | 2022-10-01T21:06:00 | Downtown | Sidewalk Repair Issue | Closed | Petco Park | Safety hazard illegal vending |
3940909 | 2022-10-01T21:44:00 | Clairemont Mesa | Other | Closed | 2810 Denver Street | Underage drinking party |
3940924 | 2022-10-01T22:54:00 | Clairemont Mesa | Other | Closed | 5378 Jamestown Rd | People doing drugs in their car AGAIN. Please ... |
753 rows × 6 columns
oct_1
service_request_id | date_requested | neighborhood | service | status | street_address | public_description | |
---|---|---|---|---|---|---|---|
0 | 3940112 | 2022-10-01T00:11:00 | La Jolla | Pothole | Open | 7556 VIA CAPRI, San Diego, CA 92037, USA | Pothole |
1 | 3940113 | 2022-10-01T00:12:00 | La Jolla | Pothole | Open | 7566 VIA CAPRI, San Diego, CA 92037, USA | Potholes / fix the damn road |
2 | 3940114 | 2022-10-01T00:13:00 | Pacific Beach | Street Light Maintenance | Open | 1698-1500 Monmouth Dr, San Diego, CA 92109, USA | Street light out on the corner of Monmouth Dr ... |
3 | 3940115 | 2022-10-01T00:13:00 | La Jolla | Pothole | Open | 7456 VIA CAPRI, San Diego, CA 92037, USA | Pothole |
4 | 3940116 | 2022-10-01T00:16:00 | Scripps Miramar Ranch | Traffic Signal Timing | Open | 10895 Hibert St, San Diego, CA 92131, USA | Is it possible to time this light sequentially... |
... | ... | ... | ... | ... | ... | ... | ... |
748 | 3940819 | 2022-10-01T17:39:00 | North Park | Other | Closed | 3935 32nd St | Bike Theft Chop Shop Behind Starbucks on 32nd ... |
749 | 3940876 | 2022-10-01T19:47:00 | Skyline-Paradise Hills | Parking | Closed | 7701-7899 Bloomfield Rd, San Diego, CA 92114, USA | Car has been parked there for many months. Exp... |
750 | 3940900 | 2022-10-01T21:06:00 | Downtown | Sidewalk Repair Issue | Closed | Petco Park | Safety hazard illegal vending |
751 | 3940909 | 2022-10-01T21:44:00 | Clairemont Mesa | Other | Closed | 2810 Denver Street | Underage drinking party |
752 | 3940924 | 2022-10-01T22:54:00 | Clairemont Mesa | Other | Closed | 5378 Jamestown Rd | People doing drugs in their car AGAIN. Please ... |
753 rows × 7 columns
oct_1 = oct_1.set_index('service_request_id')
oct_1
date_requested | neighborhood | service | status | street_address | public_description | |
---|---|---|---|---|---|---|
service_request_id | ||||||
3940112 | 2022-10-01T00:11:00 | La Jolla | Pothole | Open | 7556 VIA CAPRI, San Diego, CA 92037, USA | Pothole |
3940113 | 2022-10-01T00:12:00 | La Jolla | Pothole | Open | 7566 VIA CAPRI, San Diego, CA 92037, USA | Potholes / fix the damn road |
3940114 | 2022-10-01T00:13:00 | Pacific Beach | Street Light Maintenance | Open | 1698-1500 Monmouth Dr, San Diego, CA 92109, USA | Street light out on the corner of Monmouth Dr ... |
3940115 | 2022-10-01T00:13:00 | La Jolla | Pothole | Open | 7456 VIA CAPRI, San Diego, CA 92037, USA | Pothole |
3940116 | 2022-10-01T00:16:00 | Scripps Miramar Ranch | Traffic Signal Timing | Open | 10895 Hibert St, San Diego, CA 92131, USA | Is it possible to time this light sequentially... |
... | ... | ... | ... | ... | ... | ... |
3940819 | 2022-10-01T17:39:00 | North Park | Other | Closed | 3935 32nd St | Bike Theft Chop Shop Behind Starbucks on 32nd ... |
3940876 | 2022-10-01T19:47:00 | Skyline-Paradise Hills | Parking | Closed | 7701-7899 Bloomfield Rd, San Diego, CA 92114, USA | Car has been parked there for many months. Exp... |
3940900 | 2022-10-01T21:06:00 | Downtown | Sidewalk Repair Issue | Closed | Petco Park | Safety hazard illegal vending |
3940909 | 2022-10-01T21:44:00 | Clairemont Mesa | Other | Closed | 2810 Denver Street | Underage drinking party |
3940924 | 2022-10-01T22:54:00 | Clairemont Mesa | Other | Closed | 5378 Jamestown Rd | People doing drugs in their car AGAIN. Please ... |
753 rows × 6 columns
.shape
returns the number of rows and columns in a given DataFrame.[]
: .shape[0]
for rows..shape[1]
for columns.# There were 7 columns before, but one of them became the index, and the index is not a column!
oct_1.shape
(753, 6)
# Number of rows
oct_1.shape[0]
753
# Number of columns
oct_1.shape[1]
6
data/get-it-done-requests.csv
contains a summary of all Get It Done requests submitted this calendar year (2022) so far. 'closed'
) versus unresolved ('open'
).requests = bpd.read_csv('data/get-it-done-requests.csv')
requests
neighborhood | service | closed | open | |
---|---|---|---|---|
0 | Balboa Park | Dead Animal | 46 | 0 |
1 | Balboa Park | Development Services - Code Enforcement | 2 | 0 |
2 | Balboa Park | Encampment | 1484 | 219 |
3 | Balboa Park | Environmental Services Code Compliance | 25 | 1 |
4 | Balboa Park | Graffiti | 977 | 0 |
... | ... | ... | ... | ... |
1582 | Via De La Valle | Parking | 1 | 0 |
1583 | Via De La Valle | Pavement Maintenance | 0 | 1 |
1584 | Via De La Valle | Pothole | 9 | 1 |
1585 | Via De La Valle | Stormwater Code Enforcement | 3 | 0 |
1586 | Via De La Valle | Street Light Maintenance | 1 | 0 |
1587 rows × 4 columns
Key concepts: Accessing columns, performing operations with them, and adding new columns.
.get(column_name)
.requests
neighborhood | service | closed | open | |
---|---|---|---|---|
0 | Balboa Park | Dead Animal | 46 | 0 |
1 | Balboa Park | Development Services - Code Enforcement | 2 | 0 |
2 | Balboa Park | Encampment | 1484 | 219 |
3 | Balboa Park | Environmental Services Code Compliance | 25 | 1 |
4 | Balboa Park | Graffiti | 977 | 0 |
... | ... | ... | ... | ... |
1582 | Via De La Valle | Parking | 1 | 0 |
1583 | Via De La Valle | Pavement Maintenance | 0 | 1 |
1584 | Via De La Valle | Pothole | 9 | 1 |
1585 | Via De La Valle | Stormwater Code Enforcement | 3 | 0 |
1586 | Via De La Valle | Street Light Maintenance | 1 | 0 |
1587 rows × 4 columns
requests.get('closed')
0 46 1 2 2 1484 3 25 4 977 ... 1582 1 1583 0 1584 9 1585 3 1586 1 Name: closed, Length: 1587, dtype: int64
requests.get('closed')
0 46 1 2 2 1484 3 25 4 977 ... 1582 1 1583 0 1584 9 1585 3 1586 1 Name: closed, Length: 1587, dtype: int64
requests.get('open')
0 0 1 0 2 219 3 1 4 0 ... 1582 0 1583 1 1584 1 1585 0 1586 0 Name: open, Length: 1587, dtype: int64
requests.get('closed') + requests.get('open')
0 46 1 2 2 1703 3 26 4 977 ... 1582 1 1583 1 1584 10 1585 3 1586 1 Length: 1587, dtype: int64
.assign(name_of_column=data_in_series)
to assign a Series (or array, or list) to a DataFrame.name_of_column
.requests.assign(
total=requests.get('closed') + requests.get('open')
)
neighborhood | service | closed | open | total | |
---|---|---|---|---|---|
0 | Balboa Park | Dead Animal | 46 | 0 | 46 |
1 | Balboa Park | Development Services - Code Enforcement | 2 | 0 | 2 |
2 | Balboa Park | Encampment | 1484 | 219 | 1703 |
3 | Balboa Park | Environmental Services Code Compliance | 25 | 1 | 26 |
4 | Balboa Park | Graffiti | 977 | 0 | 977 |
... | ... | ... | ... | ... | ... |
1582 | Via De La Valle | Parking | 1 | 0 | 1 |
1583 | Via De La Valle | Pavement Maintenance | 0 | 1 | 1 |
1584 | Via De La Valle | Pothole | 9 | 1 | 10 |
1585 | Via De La Valle | Stormwater Code Enforcement | 3 | 0 | 3 |
1586 | Via De La Valle | Street Light Maintenance | 1 | 0 | 1 |
1587 rows × 5 columns
requests
neighborhood | service | closed | open | |
---|---|---|---|---|
0 | Balboa Park | Dead Animal | 46 | 0 |
1 | Balboa Park | Development Services - Code Enforcement | 2 | 0 |
2 | Balboa Park | Encampment | 1484 | 219 |
3 | Balboa Park | Environmental Services Code Compliance | 25 | 1 |
4 | Balboa Park | Graffiti | 977 | 0 |
... | ... | ... | ... | ... |
1582 | Via De La Valle | Parking | 1 | 0 |
1583 | Via De La Valle | Pavement Maintenance | 0 | 1 |
1584 | Via De La Valle | Pothole | 9 | 1 |
1585 | Via De La Valle | Stormwater Code Enforcement | 3 | 0 |
1586 | Via De La Valle | Street Light Maintenance | 1 | 0 |
1587 rows × 4 columns
requests = requests.assign(
total=requests.get('closed') + requests.get('open')
)
requests
neighborhood | service | closed | open | total | |
---|---|---|---|---|---|
0 | Balboa Park | Dead Animal | 46 | 0 | 46 |
1 | Balboa Park | Development Services - Code Enforcement | 2 | 0 | 2 |
2 | Balboa Park | Encampment | 1484 | 219 | 1703 |
3 | Balboa Park | Environmental Services Code Compliance | 25 | 1 | 26 |
4 | Balboa Park | Graffiti | 977 | 0 | 977 |
... | ... | ... | ... | ... | ... |
1582 | Via De La Valle | Parking | 1 | 0 | 1 |
1583 | Via De La Valle | Pavement Maintenance | 0 | 1 | 1 |
1584 | Via De La Valle | Pothole | 9 | 1 | 10 |
1585 | Via De La Valle | Stormwater Code Enforcement | 3 | 0 | 3 |
1586 | Via De La Valle | Street Light Maintenance | 1 | 0 | 1 |
1587 rows × 5 columns
Key concept: Computing statistics of columns using Series methods.
Series, like arrays, have helpful methods, including .min()
, .max()
, and .mean()
.
requests.get('total').max()
11342
What is it that people are reporting so frequently, and where? We'll see how to find out shortly!
Other statistics:
requests.get('total').mean()
171.33333333333334
requests.get('total').median()
41.0
requests.get('open').mean()
25.220541902961564
requests.get('open').median()
4.0
Key concepts: Sorting. Accessing using integer positions.
.sort_values(by=column_name)
method to sort.by=
is not necessary.requests.sort_values(by='total')
neighborhood | service | closed | open | total | |
---|---|---|---|---|---|
1586 | Via De La Valle | Street Light Maintenance | 1 | 0 | 1 |
229 | Del Mar Mesa | Weed Cleanup | 1 | 0 | 1 |
1429 | Tijuana River Valley | Street Sweeping | 1 | 0 | 1 |
1428 | Tijuana River Valley | Street Light Maintenance | 0 | 1 | 1 |
1426 | Tijuana River Valley | Sidewalk Repair Issue | 1 | 0 | 1 |
... | ... | ... | ... | ... | ... |
840 | North Park | Parking | 3059 | 224 | 3283 |
1363 | Southeastern San Diego | Parking | 2780 | 614 | 3394 |
1360 | Southeastern San Diego | Illegal Dumping | 5350 | 228 | 5578 |
452 | Mid-City:City Heights | Illegal Dumping | 9021 | 222 | 9243 |
232 | Downtown | Encampment | 9262 | 2080 | 11342 |
1587 rows × 5 columns
This sorts, but in ascending order (small to large). We want the opposite!
.sort_values(by=column_name, ascending=False)
to sort in descending order.ascending
is an optional argument. If omitted, it will be set to True
by default.ordered_requests = requests.sort_values(by='total', ascending=False)
ordered_requests
neighborhood | service | closed | open | total | |
---|---|---|---|---|---|
232 | Downtown | Encampment | 9262 | 2080 | 11342 |
452 | Mid-City:City Heights | Illegal Dumping | 9021 | 222 | 9243 |
1360 | Southeastern San Diego | Illegal Dumping | 5350 | 228 | 5578 |
1363 | Southeastern San Diego | Parking | 2780 | 614 | 3394 |
840 | North Park | Parking | 3059 | 224 | 3283 |
... | ... | ... | ... | ... | ... |
1428 | Tijuana River Valley | Street Light Maintenance | 0 | 1 | 1 |
1426 | Tijuana River Valley | Sidewalk Repair Issue | 1 | 0 | 1 |
1423 | Tijuana River Valley | Pavement Maintenance | 1 | 0 | 1 |
1422 | Tijuana River Valley | Parks Issue | 1 | 0 | 1 |
1586 | Via De La Valle | Street Light Maintenance | 1 | 0 | 1 |
1587 rows × 5 columns
'Encampment'
in 'Downtown'
, but how do we extract that information using code?.iloc[integer_position]
.ordered_requests
neighborhood | service | closed | open | total | |
---|---|---|---|---|---|
232 | Downtown | Encampment | 9262 | 2080 | 11342 |
452 | Mid-City:City Heights | Illegal Dumping | 9021 | 222 | 9243 |
1360 | Southeastern San Diego | Illegal Dumping | 5350 | 228 | 5578 |
1363 | Southeastern San Diego | Parking | 2780 | 614 | 3394 |
840 | North Park | Parking | 3059 | 224 | 3283 |
... | ... | ... | ... | ... | ... |
1428 | Tijuana River Valley | Street Light Maintenance | 0 | 1 | 1 |
1426 | Tijuana River Valley | Sidewalk Repair Issue | 1 | 0 | 1 |
1423 | Tijuana River Valley | Pavement Maintenance | 1 | 0 | 1 |
1422 | Tijuana River Valley | Parks Issue | 1 | 0 | 1 |
1586 | Via De La Valle | Street Light Maintenance | 1 | 0 | 1 |
1587 rows × 5 columns
ordered_requests.get('neighborhood')
232 Downtown 452 Mid-City:City Heights 1360 Southeastern San Diego 1363 Southeastern San Diego 840 North Park ... 1428 Tijuana River Valley 1426 Tijuana River Valley 1423 Tijuana River Valley 1422 Tijuana River Valley 1586 Via De La Valle Name: neighborhood, Length: 1587, dtype: object
ordered_requests.get('neighborhood').iloc[0]
'Downtown'
ordered_requests.get('service').iloc[0]
'Encampment'
Key concept: Accessing using row labels.
On October 1, you submitted service request 3940652. Has the issue been resolved?
This cannot be answered from the annual summary data, but must be answered from the detailed data about October 1.
oct_1
date_requested | neighborhood | service | status | street_address | public_description | |
---|---|---|---|---|---|---|
service_request_id | ||||||
3940112 | 2022-10-01T00:11:00 | La Jolla | Pothole | Open | 7556 VIA CAPRI, San Diego, CA 92037, USA | Pothole |
3940113 | 2022-10-01T00:12:00 | La Jolla | Pothole | Open | 7566 VIA CAPRI, San Diego, CA 92037, USA | Potholes / fix the damn road |
3940114 | 2022-10-01T00:13:00 | Pacific Beach | Street Light Maintenance | Open | 1698-1500 Monmouth Dr, San Diego, CA 92109, USA | Street light out on the corner of Monmouth Dr ... |
3940115 | 2022-10-01T00:13:00 | La Jolla | Pothole | Open | 7456 VIA CAPRI, San Diego, CA 92037, USA | Pothole |
3940116 | 2022-10-01T00:16:00 | Scripps Miramar Ranch | Traffic Signal Timing | Open | 10895 Hibert St, San Diego, CA 92131, USA | Is it possible to time this light sequentially... |
... | ... | ... | ... | ... | ... | ... |
3940819 | 2022-10-01T17:39:00 | North Park | Other | Closed | 3935 32nd St | Bike Theft Chop Shop Behind Starbucks on 32nd ... |
3940876 | 2022-10-01T19:47:00 | Skyline-Paradise Hills | Parking | Closed | 7701-7899 Bloomfield Rd, San Diego, CA 92114, USA | Car has been parked there for many months. Exp... |
3940900 | 2022-10-01T21:06:00 | Downtown | Sidewalk Repair Issue | Closed | Petco Park | Safety hazard illegal vending |
3940909 | 2022-10-01T21:44:00 | Clairemont Mesa | Other | Closed | 2810 Denver Street | Underage drinking party |
3940924 | 2022-10-01T22:54:00 | Clairemont Mesa | Other | Closed | 5378 Jamestown Rd | People doing drugs in their car AGAIN. Please ... |
753 rows × 6 columns
Your service request is buried in the middle of the DataFrame. Only the first few rows and last few rows are shown, so you can't tell just by looking at the DataFrame.
To pull out one particular entry of a DataFrame corresponding to a row and column with certain labels:
.get(column_name)
to extract the entire column as a Series..loc[]
to access the element of a Series with a particular row label.In this class, we'll always get a column, then a row (but row, then column is also possible).
oct_1.get('status')
service_request_id 3940112 Open 3940113 Open 3940114 Open 3940115 Open 3940116 Open ... 3940819 Closed 3940876 Closed 3940900 Closed 3940909 Closed 3940924 Closed Name: status, Length: 753, dtype: object
oct_1.get('status').loc[3940652]
'Open'
Oh no, your service request 3940652 has still not been resolved! What was the problem again?
Write one line of code that evaluates to the full description of the problem, as you described it in your service request.
...
Ellipsis
.loc[]
uses the row label..iloc[]
uses the integer position..loc[]
is more convenient, but each is best for different scenarios.bpd.read_csv
.bpd.read_csv('data/get-it-done-oct-1.csv')
service_request_id | date_requested | neighborhood | service | status | street_address | public_description | |
---|---|---|---|---|---|---|---|
0 | 3940112 | 2022-10-01T00:11:00 | La Jolla | Pothole | Open | 7556 VIA CAPRI, San Diego, CA 92037, USA | Pothole |
1 | 3940113 | 2022-10-01T00:12:00 | La Jolla | Pothole | Open | 7566 VIA CAPRI, San Diego, CA 92037, USA | Potholes / fix the damn road |
2 | 3940114 | 2022-10-01T00:13:00 | Pacific Beach | Street Light Maintenance | Open | 1698-1500 Monmouth Dr, San Diego, CA 92109, USA | Street light out on the corner of Monmouth Dr ... |
3 | 3940115 | 2022-10-01T00:13:00 | La Jolla | Pothole | Open | 7456 VIA CAPRI, San Diego, CA 92037, USA | Pothole |
4 | 3940116 | 2022-10-01T00:16:00 | Scripps Miramar Ranch | Traffic Signal Timing | Open | 10895 Hibert St, San Diego, CA 92131, USA | Is it possible to time this light sequentially... |
... | ... | ... | ... | ... | ... | ... | ... |
748 | 3940819 | 2022-10-01T17:39:00 | North Park | Other | Closed | 3935 32nd St | Bike Theft Chop Shop Behind Starbucks on 32nd ... |
749 | 3940876 | 2022-10-01T19:47:00 | Skyline-Paradise Hills | Parking | Closed | 7701-7899 Bloomfield Rd, San Diego, CA 92114, USA | Car has been parked there for many months. Exp... |
750 | 3940900 | 2022-10-01T21:06:00 | Downtown | Sidewalk Repair Issue | Closed | Petco Park | Safety hazard illegal vending |
751 | 3940909 | 2022-10-01T21:44:00 | Clairemont Mesa | Other | Closed | 2810 Denver Street | Underage drinking party |
752 | 3940924 | 2022-10-01T22:54:00 | Clairemont Mesa | Other | Closed | 5378 Jamestown Rd | People doing drugs in their car AGAIN. Please ... |
753 rows × 7 columns
bpd.read_csv('data/get-it-done-oct-1.csv').get('public_description').loc[31]
'Limes blocking the sidewalk'
bpd.read_csv('data/get-it-done-oct-1.csv').get('public_description').iloc[31]
'Limes blocking the sidewalk'
requests.get('open').max()
.oct_1.shape[0]
.oct_1.sort_values(by='date_requested', ascending=False).get('public_description').iloc[0]
.Moving forward, let's just focus on the requests
DataFrame. As a reminder, here's what it looks like:
requests
neighborhood | service | closed | open | total | |
---|---|---|---|---|---|
0 | Balboa Park | Dead Animal | 46 | 0 | 46 |
1 | Balboa Park | Development Services - Code Enforcement | 2 | 0 | 2 |
2 | Balboa Park | Encampment | 1484 | 219 | 1703 |
3 | Balboa Park | Environmental Services Code Compliance | 25 | 1 | 26 |
4 | Balboa Park | Graffiti | 977 | 0 | 977 |
... | ... | ... | ... | ... | ... |
1582 | Via De La Valle | Parking | 1 | 0 | 1 |
1583 | Via De La Valle | Pavement Maintenance | 0 | 1 | 1 |
1584 | Via De La Valle | Pothole | 9 | 1 | 10 |
1585 | Via De La Valle | Stormwater Code Enforcement | 3 | 0 | 3 |
1586 | Via De La Valle | Street Light Maintenance | 1 | 0 | 1 |
1587 rows × 5 columns
'Weed Cleanup'
requests?'University'
neighborhood (near UCSD)?'Downtown'
neighborhood, how many open service requests are there?The common thread between these questions is that they all involve only a subset of the rows in our DataFrame.
'Weed Cleanup'
requests?¶Key concept: Selecting rows (via Boolean indexing).
'Weed Cleanup'
requests if we had a DataFrame consisting of only these type of requests.'total'
column in descending order, then extract the neighborhood name in the first row.requests[requests.get('service') == 'Weed Cleanup']
neighborhood | service | closed | open | total | |
---|---|---|---|---|---|
30 | Balboa Park | Weed Cleanup | 23 | 0 | 23 |
61 | Barrio Logan | Weed Cleanup | 10 | 1 | 11 |
87 | Black Mountain Ranch | Weed Cleanup | 0 | 1 | 1 |
116 | Carmel Mountain Ranch | Weed Cleanup | 2 | 0 | 2 |
146 | Carmel Valley | Weed Cleanup | 6 | 1 | 7 |
... | ... | ... | ... | ... | ... |
1433 | Tijuana River Valley | Weed Cleanup | 2 | 0 | 2 |
1489 | Torrey Hills | Weed Cleanup | 1 | 0 | 1 |
1518 | Torrey Pines | Weed Cleanup | 10 | 7 | 17 |
1549 | University | Weed Cleanup | 53 | 10 | 63 |
1580 | Uptown | Weed Cleanup | 36 | 8 | 44 |
53 rows × 5 columns
Use ==
to check equality. Not =
, as that's for assignment of a value to a variable.
'Weed Cleanup' == 'Weed Clean-Up'
False
'Weed Cleanup' == 'Weed Cleanup'
True
We can broadcast the equality check to each element of a Series. The comparison happens element-wise.
requests.get('service') == 'Weed Cleanup'
0 False 1 False 2 False 3 False 4 False ... 1582 False 1583 False 1584 False 1585 False 1586 False Name: service, Length: 1587, dtype: bool
To select only some rows of requests
:
Make a sequence (list/array/Series) of True
s (keep) and False
s (toss).
True
and False
are of the Boolean data type.Then pass it into requests[sequence_goes_here]
.
Rather than making the sequence by hand, we usually generate it by making a comparison.
Comparisons can check inequality as well as equality.
requests
neighborhood | service | closed | open | total | |
---|---|---|---|---|---|
0 | Balboa Park | Dead Animal | 46 | 0 | 46 |
1 | Balboa Park | Development Services - Code Enforcement | 2 | 0 | 2 |
2 | Balboa Park | Encampment | 1484 | 219 | 1703 |
3 | Balboa Park | Environmental Services Code Compliance | 25 | 1 | 26 |
4 | Balboa Park | Graffiti | 977 | 0 | 977 |
... | ... | ... | ... | ... | ... |
1582 | Via De La Valle | Parking | 1 | 0 | 1 |
1583 | Via De La Valle | Pavement Maintenance | 0 | 1 | 1 |
1584 | Via De La Valle | Pothole | 9 | 1 | 10 |
1585 | Via De La Valle | Stormwater Code Enforcement | 3 | 0 | 3 |
1586 | Via De La Valle | Street Light Maintenance | 1 | 0 | 1 |
1587 rows × 5 columns
requests.get('open') > 1
0 False 1 False 2 True 3 False 4 False ... 1582 False 1583 False 1584 False 1585 False 1586 False Name: open, Length: 1587, dtype: bool
requests[requests.get('open') > 1]
neighborhood | service | closed | open | total | |
---|---|---|---|---|---|
2 | Balboa Park | Encampment | 1484 | 219 | 1703 |
5 | Balboa Park | Graffiti - Code Enforcement | 12 | 2 | 14 |
6 | Balboa Park | Graffiti - Public | 137 | 41 | 178 |
7 | Balboa Park | Illegal Dumping | 255 | 29 | 284 |
10 | Balboa Park | Parking | 87 | 4 | 91 |
... | ... | ... | ... | ... | ... |
1577 | Uptown | Trash/Recycling Collection | 61 | 3 | 64 |
1578 | Uptown | Tree Maintenance | 271 | 83 | 354 |
1579 | Uptown | Waste on Private Property | 102 | 12 | 114 |
1580 | Uptown | Weed Cleanup | 36 | 8 | 44 |
1581 | Via De La Valle | Encampment | 1 | 2 | 3 |
1001 rows × 5 columns
'Weed Cleanup'
requests?¶Strategy:
'Weed Cleanup'
requests.'total'
in descending order.'neighborhood'
column.weed_cleanup_only = requests[requests.get('service') == 'Weed Cleanup']
weed_cleanup_sorted = weed_cleanup_only.sort_values(by='total', ascending=False)
weed_cleanup_sorted
neighborhood | service | closed | open | total | |
---|---|---|---|---|---|
1383 | Southeastern San Diego | Weed Cleanup | 72 | 7 | 79 |
807 | Navajo | Weed Cleanup | 66 | 1 | 67 |
177 | Clairemont Mesa | Weed Cleanup | 55 | 11 | 66 |
1549 | University | Weed Cleanup | 53 | 10 | 63 |
1352 | Skyline-Paradise Hills | Weed Cleanup | 52 | 8 | 60 |
... | ... | ... | ... | ... | ... |
268 | East Elliott | Weed Cleanup | 1 | 0 | 1 |
309 | Fairbanks Ranch Country Club | Weed Cleanup | 1 | 0 | 1 |
1489 | Torrey Hills | Weed Cleanup | 1 | 0 | 1 |
87 | Black Mountain Ranch | Weed Cleanup | 0 | 1 | 1 |
746 | Mission Beach | Weed Cleanup | 1 | 0 | 1 |
53 rows × 5 columns
weed_cleanup_sorted.get('neighborhood').iloc[0]
'Southeastern San Diego'
requests[requests.get('service') == 'Lime Cleanup']
neighborhood | service | closed | open | total |
---|
Which expression below evaluates to the total number of service requests in the 'Downtown'
neighborhood?
A. requests[requests.get('neighborhood') == 'Downtown'].get('total').sum()
B. requests.get('total').sum()[requests.get('neighborhood') == 'Downtown']
C. requests['Downtown'].get('total').sum()
D. More than one of the above.
...
Ellipsis
Question: What is the most commonly requested service in the 'University'
neighborhood (near UCSD)?
Write one line of code that evaluates to the answer.
...
Ellipsis
babypandas
notes, and the babypandas
documentation when working on assignments.We'll answer more complicated questions, which will lead us to a new core DataFrame method, .groupby
.