Principles of Data Science
DSC 10, Summer 2024 at UC San Diego
Nishant Kheterpal (he/him/his)
Lecture: TuTh 11AM-12:50PM Mosaic 0204, W 11AM-12:50PM Mandeville Hall B-104
The Final Exam is this Saturday, August 3rd, from 11:30AM to 2:30PM in Mosaic 0204.
Our final review session will be held Friday, Aug 2nd, 12-2PM in HDSI 123 and on Zoom.
If at least 75% of the class fills out both SETs and the internal End-of-Quarter Survey, then the entire class will have 1% of extra credit added to their overall grade. The deadline is Saturday, August 3rd at 8AM.
Week 1 – Python Basics and DataFrames
- Tue Jul 2
- Keywords: data science, course structure, policies, syllabus, Little Women demo
LEC 2 Expressions and Data Types
Keywords: Jupyter notebooks, expressions, variables, assignment, functions, int, float
DISC 1 Getting Started with Jupyter Notebooks
SUR Welcome Survey
- Wed Jul 3
LEC 3 Strings, Lists, and Arrays
Keywords: string methods, mean, median, lists, arrays, array arithmetic
- Keywords: array methods, np.arange, .read_csv, .get, .assign, .sort_values, .iloc, .loc, index
- Thu Jul 4
No Lecture (Independence Day)
Week 2 – DataFrames and Data Visualization
- Mon Jul 8
LAB 1 Arrays and DataFrames
- Tue Jul 9
- Keywords: .set_index, Booleans, querying, .shape, &, |, .take, .groupby, aggregation
LEC 6 Grouping and Data Visualization
Keywords: .groupby, numerical vs. categorical, scatter plot, line plot, bar chart
- Wed Jul 10
LEC 7 Distributions and Histograms
Keywords: distributions, density histograms, binning, total area, overlaid plots
- Keywords: functions, arguments, print vs. return, .apply, .reset_index
QUIZ 1 Quiz 1 covers Lectures 1-4 (including Example 4 from Lecture 5)
- Thu Jul 11
LEC 9 Grouping on Multiple Columns, Merging
Keywords: .groupby([col_1, col_2, …]), subgroups, MultiIndex, .merge, number of rows
LEC 10 Conditional Statements and Iteration
Keywords: in, not, and, or, if, else, elif, for-loops, np.append, accumulator pattern
DISC 3 Querying, Grouping, and Plotting
Week 3 – Probability, Simulation, and Sampling
- Mon Jul 15
- Tue Jul 16
- Keywords: event, conditional prob., multiplication and addition rules, independence
- Keywords: np.random.choice, replacement, np.count_nonzero, coin flipping, Monty Hall
DISC 4 Functions, DataFrames, Control Flow, Probability, and Simulation
- Wed Jul 17
LEC 13 Distributions and Sampling
Keywords: probability vs. empirical distribution, SRS, .sample, parameter, statistic
LEC 14 Bootstrapping and Confidence Intervals
REV 1 Midterm Review
- Thu Jul 18
EXAM Midterm Exam (in person, during lecture) covers Lectures 1-12
- Fri Jul 19
HW 3 DataFrames, Control Flow, and Probability
SUR Mid-Quarter Survey
Week 4 – Confidence Intervals, Bootstrapping, and the Normal Distribution
- Mon Jul 22
PROJ Midterm Project
- Tue Jul 23
LEC 15 Confidence Intervals, Center, and Spread
Keywords: interpreting CIs, robust vs. sensitive, center, standard deviation, Chebyshev
LEC 16 Standardization and the Normal Distribution
Keywords: Chebyshev, standard units, normal distribution, CDF, inflection points
DISC 5 Sampling, Bootstrapping, and Confidence Intervals
- Wed Jul 24
LEC 17 The Central Limit Theorem
Keywords: distribution of the sample mean, square root law, CLT-based CIs
LEC 18 Choosing Sample Sizes, Statistical Models
Keywords: standard deviation of 0s and 1s, np.random.multinomial, Robert Swain jury
- Thu Jul 25
- Keywords: null and alternative hypotheses, test statistic, fair or unfair coin
LEC 20 Hypothesis Testing and Total Variation Distance
Keywords: fair or unfair coin, p-value, midterm exam scores, Alameda County jury, TVD
DISC 6 Standardization, the Normal Distribution, and the Central Limit Theorem
- Fri Jul 26
Week 5 – Hypothesis Testing, Prediction, and Review
- Mon Jul 29
LAB 6 Hypothesis Testing
- Tue Jul 30
LEC 21 TVD, Hypothesis Testing, and Permutation Testing
Keywords: confidence intervals for hypothesis testing, body temperature, smoking/babies
- Keywords: smoking/babies, np.random.permutation, shuffling, Deflategate
DISC 7 Hypothesis Testing, Total Variation Distance, and Permutation Testing
- Wed Jul 31
- Keywords: association, correlation coefficient (r), predicting heights, regression line (standard units)
LEC 24 Regression and Least Squares
Keywords: regression line in original units, outliers, errors, RMSE, best fit, least squares
- Thu Aug 1
LEC 25 Residuals and Inference
Keywords: residuals, residual plots, patterns, datasaurus dozen, prediction intervals
LEC 26 Review, Conclusion
DISC 8 Regression
PROJ Final Project
- Fri Aug 2
REV 2 Final Exam Review (in HDSI 123 and on Zoom)
LAB 7 Regression
- Sat Aug 3
FINAL FINAL EXAM, 11:30AM-2:29PM, Mosaic 0204