
Principles of Data Science

DSC 10, Summer 2024 at UC San Diego

Nishant Kheterpal
he/him/his

nkheterpal@ucsd.edu

Lecture: TuTh 11AM-12:50PM Mosaic 0204, W 11AM-12:50PM Mandeville Hall B-104

The Final Exam is this Saturday, August 3rd, from 11:30AM to 2:30PM in Mosaic 0204.

Our final review session will be held Friday, August 2nd, from 12PM to 2PM in HDSI 123 and on Zoom.

If at least 75% of the class fills out both SETs and the internal End-of-Quarter Survey, then the entire class will have 1% of extra credit added to their overall grade. The deadline is Saturday, August 3rd at 8AM.


Week 1 – Python Basics and DataFrames

Tue Jul 2

LEC 1 Introduction      

CIT 1.0-1.3

Keywords: data science, course structure, policies, syllabus, Little Women demo

LEC 2 Expressions and Data Types   

BPD 1-6

Keywords: Jupyter notebooks, expressions, variables, assignment, functions, int, float

DISC 1 Getting Started with Jupyter Notebooks  

SUR Welcome Survey

Wed Jul 3

LEC 3 Strings, Lists, and Arrays      

BPD 7-8, CIT 14.1

Keywords: string methods, mean, median, lists, arrays, array arithmetic

LEC 4 Arrays and DataFrames      

BPD 9

Keywords: array methods, np.arange, .read_csv, .get, .assign, .sort_values, .iloc, .loc, index

LAB 0 Expressions and Data Types

Thu Jul 4

No Lecture (Independence Day)

Week 2 – DataFrames and Data Visualization

Mon Jul 8

LAB 1 Arrays and DataFrames

Tue Jul 9

LEC 5 Querying and Grouping      

BPD 10-11

Keywords: .set_index, Booleans, querying, .shape, &, |, .take, .groupby, aggregation

LEC 6 Grouping and Data Visualization      

CIT 7.0-7.1

Keywords: .groupby, numerical vs. categorical, scatter plot, line plot, bar chart

DISC 2 Arrays and DataFrames  

Wed Jul 10

LEC 7 Distributions and Histograms      

CIT 7.2-7.3

Keywords: distributions, density histograms, binning, total area, overlaid plots

LEC 8 Functions and Applying      

BPD 6, BPD 12

Keywords: functions, arguments, print vs. return, .apply, .reset_index

QUIZ 1 Quiz 1 covers Lectures 1-4 (including Example 4 from Lecture 5)

HW 1 Basic Python, Arrays, and DataFrames

Thu Jul 11

LEC 9 Grouping on Multiple Columns, Merging      

BPD 11, BPD 13

Keywords: .groupby([col_1, col_2, …]), subgroups, MultiIndex, .merge, number of rows

LEC 10 Conditional Statements and Iteration      

CIT 9.0-9.2

Keywords: in, not, and, or, if, else, elif, for-loops, np.append, accumulator pattern

DISC 3 Querying, Grouping, and Plotting

LAB 2 Data Visualizations and Python Functions

Week 3 – Probability, Simulation, and Sampling

Mon Jul 15

HW 2 DataFrames, Data Visualization, and Functions

Tue Jul 16

LEC 11 Probability

CIT 9.5

Keywords: event, conditional prob., multiplication and addition rules, independence

LEC 12 Simulation      

CIT 9.3-9.4

Keywords: np.random.choice, replacement, np.count_nonzero, coin flipping, Monty Hall

DISC 4 Functions, DataFrames, Control Flow, Probability, and Simulation  

LAB 3 DataFrames, Control Flow, and Probability

Wed Jul 17

LEC 13 Distributions and Sampling      

CIT 10.0-10.4

Keywords: probability vs. empirical distribution, SRS, .sample, parameter, statistic

LEC 14 Bootstrapping and Confidence Intervals      

REV 1 Midterm Review

Thu Jul 18

EXAM Midterm Exam (in person, during lecture) covers Lectures 1-12

Fri Jul 19

HW 3 DataFrames, Control Flow, and Probability

SUR Mid-Quarter Survey

Week 4 – Confidence Intervals, Bootstrapping, and the Normal Distribution

Mon Jul 22

PROJ Midterm Project

Tue Jul 23

LEC 15 Confidence Intervals, Center, and Spread      

CIT 13.3-13.4

Keywords: interpreting CIs, robust vs. sensitive, center, standard deviation, Chebyshev

LEC 16 Standardization and the Normal Distribution      

CIT 14.2-14.3

Keywords: Chebyshev, standard units, normal distribution, CDF, inflection points

DISC 5 Sampling, Bootstrapping, and Confidence Intervals  

LAB 4 Simulation, Sampling, and Bootstrapping

Wed Jul 24

LEC 17 The Central Limit Theorem      

CIT 14.4-14.5

Keywords: distribution of the sample mean, square root law, CLT-based CIs

LEC 18 Choosing Sample Sizes, Statistical Models      

CIT 14.6, CIT 11.1

Keywords: standard deviation of 0s and 1s, np.random.multinomial, Robert Swain jury

HW 4 Simulation, Sampling, and Bootstrapping

Thu Jul 25

LEC 19 Hypothesis Testing      

CIT 11.3

Keywords: null and alternative hypotheses, test statistic, fair or unfair coin

LEC 20 Hypothesis Testing and Total Variation Distance      

CIT 11.2, CIT 11.4

Keywords: fair or unfair coin, p-value, midterm exam scores, Alameda County jury, TVD

DISC 6 Standardization, the Normal Distribution, and the Central Limit Theorem

LAB 5 Variability and the Normal Distribution

Fri Jul 26

HW 5 The Normal Distribution and the Central Limit Theorem

Week 5 – Hypothesis Testing, Prediction, and Review

Mon Jul 29

LAB 6 Hypothesis Testing

Tue Jul 30

LEC 21 TVD, Hypothesis Testing, and Permutation Testing      

CIT 12.0-12.1

Keywords: confidence intervals for hypothesis testing, body temperature, smoking/babies

LEC 22 Permutation Testing      

CIT 12.3

Keywords: smoking/babies, np.random.permutation, shuffling, Deflategate

DISC 7 Hypothesis Testing, Total Variation Distance, and Permutation Testing  

Wed Jul 31

LEC 23 Correlation      

CIT 15.0-15.2

Keywords: association, correlation coefficient (r), predicting heights, regression line in standard units

LEC 24 Regression and Least Squares      

CIT 15.2-15.4

Keywords: regression line in original units, outliers, errors, RMSE, best fit, least squares

HW 6 Hypothesis Testing and Permutation Testing

Thu Aug 1

LEC 25 Residuals and Inference   

CIT 15.5-16.3

Keywords: residuals, residual plots, patterns, datasaurus dozen, prediction intervals

LEC 26 Review, Conclusion

DISC 8 Regression

PROJ Final Project

Fri Aug 2

REV 2 Final Exam Review (in HDSI 123 and on Zoom)

LAB 7 Regression

Sat Aug 3

FINAL FINAL EXAM, 11:30AM-2:29PM, Mosaic 0204

SUR SETs and End of Quarter Survey