Principles of Data Science
DSC 10, Spring 2024 at UC San Diego
Janine Tiefenbruck (she/her)
Lecture(s): MWF 9-9:50AM (A), 10-10:50AM (B), 11-11:50AM (C) Solis 104
The Final Exam is this Saturday, June 8th from 7-10PM in Solis 104 and Solis 107. You will be assigned a seat in one of these rooms.
If at least 75% of the class fills out both the SETs and the internal End-of-Quarter Survey, the entire class will receive 1% extra credit on their overall grade. The deadline is Saturday, June 8th at 8AM.
Week 1 – Python Basics
- Mon Apr 1
LEC 1 Introduction ✏️
Keywords: data science, course structure, policies, syllabus, Little Women demo
SUR Welcome Survey
- Wed Apr 3
LEC 2 Expressions and Data Types ✏️
Keywords: Jupyter notebooks, expressions, variables, assignment, functions, int, float
- Fri Apr 5
LEC 3 Strings, Lists, and Arrays ✏️
Keywords: string methods, mean, median, lists, arrays, array arithmetic
- Sat Apr 6
Week 2 – DataFrames
- Mon Apr 8
LEC 4 Arrays and DataFrames ✏️
Keywords: array methods, np.arange, .read_csv, .get, .assign, .sort_values, .iloc, .loc, index
- Tue Apr 9
LAB 1 Arrays and DataFrames
- Wed Apr 10
LEC 5 Querying and Grouping ✏️
Keywords: .set_index, Booleans, querying, .shape, &, |, .take, .groupby, aggregation
DISC 2 Arrays and DataFrames
- Fri Apr 12
LEC 6 Grouping and Data Visualization ✏️
Keywords: .groupby, numerical vs. categorical, scatter plot, line plot, bar chart
QUIZ 1 Quiz 1 covers Lectures 1-4 (including Example 4, covered in Lecture 5)
Week 3 – Data Visualization and Functions
- Mon Apr 15
LEC 7 Distributions and Histograms ✏️
Keywords: distributions, density histograms, binning, total area, overlaid plots
- Tue Apr 16
- Wed Apr 17
LEC 8 Functions and Applying ✏️
Keywords: functions, arguments, print vs. return, .apply, .reset_index
- Thu Apr 18
- Fri Apr 19
LEC 9 Grouping on Multiple Columns, Merging ✏️
Keywords: .groupby([col_1, col_2, …]), subgroups, MultiIndex, .merge, number of rows
Week 4 – Control Flow and Probability
- Mon Apr 22
LEC 10 Conditional Statements and Iteration ✏️
Keywords: in, not, and, or, if, else, elif, for-loops, np.append, accumulator pattern
- Tue Apr 23
- Wed Apr 24
LEC 11 Probability • Blank, 9AM, 10AM, 11AM
Keywords: event, conditional probability, multiplication and addition rules, independence
- Thu Apr 25
- Fri Apr 26
LEC 12 Simulation ✏️
Keywords: np.random.choice, replacement, np.count_nonzero, coin flipping, Monty Hall
QUIZ 2 Quiz 2 covers Lectures 5-9
Week 5 – Simulation, Sampling, and Confidence Intervals
- Mon Apr 29
LEC 13 Distributions and Sampling ✏️
Keywords: probability vs. empirical distribution, SRS, .sample, parameter, statistic
- Tue Apr 30
HW 3 DataFrames, Control Flow, and Probability
- Wed May 1
LEC 14 Midterm Review • 9AM, 10AM, 11AM
DISC 5 Probability and Simulation
- Fri May 3
EXAM Midterm Exam covers Lectures 1-12
Week 6 – Bootstrapping and the Normal Distribution
- Mon May 6
LEC 15 Bootstrapping and Confidence Intervals ✏️ Watch! 🎥
Keywords: inference, bootstrapping, resample, np.percentile, confidence interval
- Wed May 8
LEC 16 Confidence Intervals, Center, and Spread ✏️
Keywords: interpreting CIs, robust vs. sensitive, center, standard deviation, Chebyshev
DISC 6 Sampling, Bootstrapping, and Confidence Intervals
PROJ Midterm Project
- Thu May 9
- Fri May 10
LEC 17 Standardization and the Normal Distribution ✏️
Keywords: Chebyshev, standard units, normal distribution, CDF, inflection points
Week 7 – Central Limit Theorem
- Mon May 13
LEC 18 The Central Limit Theorem ✏️
Keywords: distribution of the sample mean, square root law, CLT-based CIs
- Tue May 14
- Wed May 15
LEC 19 Choosing Sample Sizes, Statistical Models ✏️
Keywords: standard deviation of 0s and 1s, np.random.multinomial, Robert Swain jury
- Thu May 16
- Fri May 17
LEC 20 Hypothesis Testing ✏️
Keywords: null and alternative hypotheses, test statistic, fair or unfair coin
QUIZ 3 Quiz 3 covers Lectures 13, 15, 16
Week 8 – Hypothesis and Permutation Testing
- Mon May 20
LEC 21 Hypothesis Testing and Total Variation Distance ✏️
Keywords: fair or unfair coin, p-value, midterm exam scores, Alameda County jury, TVD
- Tue May 21
- Wed May 22
LEC 22 TVD, Hypothesis Testing, and Permutation Testing ✏️
Keywords: confidence intervals for hypothesis testing, body temperature, smoking/babies
- Thu May 23
LAB 6 Hypothesis Testing
- Fri May 24
LEC 23 Permutation Testing ✏️
Keywords: smoking/babies, np.random.permutation, shuffling, Deflategate
QUIZ 4 Quiz 4 covers Lectures 17-19
Week 9 – Prediction
- Mon May 27
No Lecture (Memorial Day)
- Tue May 28
- Wed May 29
LEC 24 Correlation ✏️
Keywords: association, correlation coefficient (r), predicting heights, regression line in standard units
- Fri May 31
LEC 25 Regression and Least Squares ✏️
Keywords: regression line in original units, outliers, errors, RMSE, best fit, least squares
QUIZ 5 Quiz 5 covers Lectures 20-23
Week 10 – Review
- Mon Jun 3
LEC 26 Residuals and Inference ✏️
Keywords: residuals, residual plots, patterns, Datasaurus Dozen, prediction intervals
- Tue Jun 4
PROJ Final Project
- Wed Jun 5
LEC 27 Review • 9AM, 10AM, 11AM
DISC 10 Regression
- Thu Jun 6
LAB 7 Regression
- Fri Jun 7
LEC 28 Review, Conclusion ✏️ • Blank • Annotated: 9AM, 10AM, 11AM
- Sat Jun 8
EXAM Final Exam (7-10PM) in Solis 104/107
SUR SETs and End-of-Quarter Survey (due 8AM)