Lecture 28 – Review, Conclusion

DSC 10, Spring 2023

Announcements

Agenda

More review

From the Winter 2023 Final:

From the Fall 2022 Final:

Personal projects

Using Jupyter Notebooks after DSC 10

Finding data

These sites allow you to search for datasets (in CSV format) from a variety of domains. Some may require you to sign up for an account, but all of them are generally reputable sources.

Note that all of these links are also available at rampure.org/find-datasets.

Domain-specific sources of data

Tip: if a site only lets you download a file as an Excel file, not a CSV file, you can download it, open it in a spreadsheet program (Excel, Numbers, Google Sheets), and export it as a CSV.
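Once a dataset is in CSV form, loading it into a notebook takes a single call. The following is a minimal sketch using pandas; the babypandas library used in this course offers a `read_csv` function as well. The CSV contents and column names here are made up for illustration, and an in-memory string stands in for a downloaded file.

```python
import io

import pandas as pd

# Stand-in for a CSV exported from a spreadsheet; the values are made up.
exported_csv = io.StringIO(
    "city,population\n"
    "Springfield,1000\n"
    "Shelbyville,2500\n"
)

# With a real file, pass its path instead, e.g. pd.read_csv("my_data.csv")
# (a hypothetical file name).
cities = pd.read_csv(exported_csv)

# A quick check of the shape and columns helps confirm the export worked.
print(cities.shape)
print(list(cities.columns))
```

pandas also provides `read_excel` for reading Excel files directly, though it typically needs an extra package installed, so exporting to CSV first is often the simpler route.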

Join a DS3 Project Group 🤝

The Data Science Student Society organizes project groups, which are a great way to get experience and build your resume. Keep your eye out for applications!

Demo: Gapminder 🌎

plotly

Gapminder dataset

Gapminder Foundation is a non-profit venture registered in Stockholm, Sweden, that promotes sustainable global development and achievement of the United Nations Millennium Development Goals by increased use and understanding of statistics and other information about social, economic and environmental development at local, national and global levels. - Gapminder Wikipedia

The dataset contains information for each country for several different years.

Let's start by just looking at 2007 data (the most recent year in the dataset).

Scatter plot

We can plot life expectancy vs. GDP per capita. If you hover over a point, you will see the name of the country.

In future courses, you'll learn about transformations. Here, we'll apply a log transformation to the x-axis to make the plot look a little more linear.

Animated scatter plot

We can take things one step further.

Watch this video if you want to see an even-more-animated version of this plot.

Animated histogram

Choropleth

Parting thoughts

From Lecture 1: What is "data science"?

Data science is about drawing useful conclusions from data using computation. Throughout the quarter, we touched on several aspects of data science:

Note on grades

Suraj's freshman year transcript.

Don't let your grades define you; they don't tell the full story.

Thank you!

This course would not have been possible without...

Good luck on your finals...

...and see you tomorrow at 7PM 😊.