import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib_inline.backend_inline import set_matplotlib_formats
from IPython.display import display, IFrame
set_matplotlib_formats("svg")
sns.set_context("poster")
sns.set_style("whitegrid")
plt.rcParams["figure.figsize"] = (10, 5)
pd.set_option("display.max_rows", 8)
pd.set_option("display.max_columns", 8)
pd.set_option("display.precision", 2)
📣 Announcements 📣
- Project 1 due Wed!
- Lab 3 out, due on Mon
📆 Agenda
- Introduce dataset
- Introduce plotly
- Statistical vs. computational data types
- Data cleaning
- Data quality checks
- Missing data
- Transformations and timestamps
- Modifying structure
- Investigating student-submitted questions
San Diego Food Safety
In the last three years, one third of San Diego County restaurants have had at least one major food safety violation.
https://inewsource.org/2023/02/09/san-diego-restaurants-food-safety-violations/
99% Of San Diego Restaurants Earn ‘A’ Grades, Bringing Usefulness of System Into Question
Food held at unsafe temperatures. Employees not washing their hands. Dirty countertops. Vermin in the kitchen. An expired restaurant permit.
Restaurant inspectors for San Diego County found these violations during a routine health inspection of a diner in La Mesa in November 2016. Despite the violations, the restaurant was awarded a score of 90 out of 100, the lowest possible score to achieve an ‘A’ grade.
The Data
https://www.sandiegocounty.gov/content/sdc/deh/fhd/ffis/intro.html.html
- We had to download the data as JSON, then process it into dataframes (we'll cover this in future weeks!)
- We downloaded the 1,000 restaurants closest to UCSD.
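Processing JSON is covered in future weeks, but as a preview, here is a minimal sketch of turning nested JSON records into dataframes with pd.json_normalize. The record structure below (keys like name, address, and inspections) is hypothetical and is not the actual format returned by the county's site.

```python
import json
import pandas as pd

# Hypothetical JSON shaped like a restaurant listing with nested inspections;
# the real county data may be structured differently.
raw = """
[
  {"name": "Cafe A", "address": {"city": "La Jolla", "zip": "92037"},
   "inspections": [{"date": "2023-01-05", "score": 96},
                   {"date": "2023-06-12", "score": 92}]},
  {"name": "Diner B", "address": {"city": "La Mesa", "zip": "91942"},
   "inspections": [{"date": "2023-03-20", "score": 90}]}
]
"""
records = json.loads(raw)

# One row per restaurant: nested dicts become dotted columns like "address.city".
# The list-valued "inspections" field would stay as a list column, so drop it here.
restaurants = pd.json_normalize(records).drop(columns="inspections")

# One row per inspection: record_path walks into the nested list,
# and meta copies the restaurant's name onto each inspection row.
inspections = pd.json_normalize(records, record_path="inspections", meta=["name"])
```

Splitting the nested data into two tables like this mirrors how the lecture's data ends up in separate restaurant and inspection dataframes.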
rest = pd.read_csv('data/restaurants.csv')
insp = pd.read_csv('data/inspections.csv')
viol = pd.read_csv('data/violations.csv')
Understanding the Data
Aside: Working with files
- So far, all of our data has come in CSV files that loaded without problems.
- But data comes in many different formats, and loading it can run into many issues!
- See Chapter 8 of Learning DS for more.
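As a small illustration of the kinds of loading issues mentioned above, here is a sketch using a made-up file (not from the lecture data) with a non-comma delimiter, a comment line, and a nonstandard missing-value marker. The sep, comment, and na_values parameters of pd.read_csv handle each quirk.

```python
import io
import pandas as pd

# A small made-up file with three common quirks:
# a comment line, semicolon delimiters, and "--" meaning "missing".
text = """# exported 2023-02-01
name;score;grade
Cafe A;96;A
Diner B;--;A
"""

df = pd.read_csv(
    io.StringIO(text),  # stands in for a file path
    sep=";",            # fields are separated by semicolons, not commas
    comment="#",        # ignore the line that starts with "#"
    na_values=["--"],   # treat "--" as a missing value (NaN)
)
```

Without these arguments, pandas would read everything into a single column and keep "--" as a string, so the score column would not be numeric.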
You Try: Looking at the Data
- The article said that one third of San Diego County restaurants had at least one major food safety violation in the last three years.
- Which dataframes and columns seem most useful to verify this?
# Fill me in
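One possible approach is to link violations to restaurants through their inspections, then check whether each restaurant has any major violation. The sketch below runs on tiny made-up tables, and the column names restaurant_id, inspection_id, and is_major are hypothetical placeholders, so check the real columns of rest, insp, and viol before adapting it. With the real data, counting restaurants from rest may be a better denominator.

```python
import pandas as pd

# Tiny made-up tables; all column names here are hypothetical.
toy_insp = pd.DataFrame({
    "restaurant_id": [1, 1, 2, 3, 4],
    "inspection_id": [10, 11, 12, 13, 14],
})
toy_viol = pd.DataFrame({
    "inspection_id": [10, 11, 11, 13],
    "is_major": [False, True, False, False],
})

# Attach each violation to the restaurant whose inspection produced it.
merged = toy_viol.merge(toy_insp, on="inspection_id", how="left")

# A restaurant counts if any of its violations is major.
has_major = merged.groupby("restaurant_id")["is_major"].any()

# Divide by all restaurants, including those with no violations at all.
n_restaurants = toy_insp["restaurant_id"].nunique()
frac = has_major.sum() / n_restaurants
```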
Using plotly for Data Visualization
I've used plotly in class before, but let's talk about it now.
- Library for interactive data visualizations
- Install with conda install plotly
- Discussion this week: why use conda install instead of pip install?
plotly.express Syntax
plotly is very flexible but can be verbose. We use plotly.express to make plots quickly.
# Will include this at the top of each notebook from now on.
import plotly.express as px
# DSC 80 preferred styles, but not necessary
import plotly.graph_objects as go
import plotly.io as pio
pio.templates["dsc80"] = go.layout.Template(
    layout=dict(
        margin=dict(l=30, r=30, t=30, b=30),
        autosize=True,
        width=600,
        height=400,
        xaxis=dict(showgrid=True),
        yaxis=dict(showgrid=True),
        title=dict(x=0.5, xanchor="center"),
    )
)
pio.templates.default = "simple_white+dsc80"
fig = px.histogram(insp['score'])
fig
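To connect the score histogram back to the article, one could compute the share of scores at or above 90, which the article describes as the lowest score that still earns an ‘A’. This sketch uses made-up scores standing in for insp['score']; note that it measures a share of inspections, which is not the same unit as the article's share of restaurants.

```python
import pandas as pd

# Made-up inspection scores standing in for insp["score"].
scores = pd.Series([96, 92, 90, 88, 97, 94, 100, 85])

# Per the article, 90 is the lowest score that still earns an 'A'.
is_a = scores >= 90

# The mean of a boolean Series is the fraction of True values.
frac_a = is_a.mean()
```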