# Imports
import babypandas as bpd
import numpy as np
import plotly.express as px
import matplotlib.pyplot as plt
plt.style.use('ggplot')
In addition, we have several other course staff members who are here to support you in discussion, office hours, and online.
Learn more about them at dsc10.com/staff.
Data science is about drawing useful conclusions from data using computation. Throughout the quarter, we'll touch on several aspects of data science:
We've spent the last three years looking at graphs like this:
As of March 2023, both the New York Times and Johns Hopkins have stopped updating their COVID dashboards.
The site The Pudding is home to several interactive data-rich articles.
The above map is called a choropleth. You will create a choropleth of your own this quarter on the Midterm Project!
In addition, you must also fill out the Welcome Survey.
What song was just played?
A. "Don't Look Down (feat. Usher)" by Martin Garrix
B. "Down (feat. Lil Wayne)" by Jay Sean
C. "Sky Is Falling" by Miguel
D. "Down (feat. Kanye West)" by Chris Brown
E. "Coming Down" by The Weeknd
(We are always going to use the same link for Concept Checks, so you should bookmark it.)
There are only two discussion sections:
Discussion starts this Wednesday. Discussion 1 will be focused on how to use DataHub, rather than on problem solving.
We will have two exams this quarter.
babypandas notes, written specifically for the first part of DSC 10.
This is a tough, fast-paced course, but we're here to help you – here's how:
At the end of each quarter, we ask DSC 10 students to give advice to future students in the course. Here are some responses from Winter 2023:
Start the assignments early, every time that I started an assignment the day or even night of, I always struggled and the added pressure of not getting it in on time didn't help me one bit. The times that I started a day or two in advance, even if it was just completing a couple problems in advance, I felt way more relaxed and in turn I learned and retained a lot more.
Pay attention in lectures and begin both labs and homework early because they will pile up. The lectures are very helpful references to use if you’re stuck during labs and homework, and office hours are incredibly useful so go!!!
Use TAs and office hours as much as possible; also, the reference sheet was crucial.
Office hours are really helpful; all the tutors knew what they were doing and were able to help me work through any of the problems I got stuck on.
Regardless of your background, you can succeed in this course. No prior programming or statistics experience will be assumed!
Watch on YouTube: We’re All Data Scientists | Rebecca Nugent | TEDxCMU.
Counseling and Psychological Services (CAPS) is a campus unit that offers “short term counseling for academic, career, and personal issues and also offers psychiatry services for circumstances when medication can help with counseling.” If you or anyone you know is ever in need of mental health care, you should contact CAPS.
# Read in 'lw.txt' to a variable called little_women_text.
little_women_text = open('data/lw.txt').read()
# See the first three thousand characters.
little_women_text[:3000]
'The Project Gutenberg EBook of Little Women, by Louisa May Alcott\n\nThis eBook is for the use of anyone anywhere at no cost and with\nalmost no restrictions whatsoever. You may copy it, give it away or\nre-use it under the terms of the Project Gutenberg License included\nwith this eBook or online at www.gutenberg.net\n\n\nTitle: Little Women\n\nAuthor: Louisa May Alcott\n\nPosting Date: September 13, 2008 [EBook #514]\nRelease Date: May, 1996\n[This file last updated on August 19, 2010]\n\nLanguage: English\n\n\n*** START OF THIS PROJECT GUTENBERG EBOOK LITTLE WOMEN ***\n\n\n\n\nLITTLE WOMEN\n\n\nby\n\nLouisa May Alcott\n\n\n\n\nCONTENTS\n\n\nPART 1\n\n ONE PLAYING PILGRIMS\n TWO A MERRY CHRISTMAS\n THREE THE LAURENCE BOY\n FOUR BURDENS\n FIVE BEING NEIGHBORLY\n SIX BETH FINDS THE PALACE BEAUTIFUL\n SEVEN AMY\'S VALLEY OF HUMILIATION\n EIGHT JO MEETS APOLLYON\n NINE MEG GOES TO VANITY FAIR\n TEN THE P.C. AND P.O.\n ELEVEN EXPERIMENTS\n TWELVE CAMP LAURENCE\n THIRTEEN CASTLES IN THE AIR\n FOURTEEN SECRETS\n FIFTEEN A TELEGRAM\n SIXTEEN LETTERS\n SEVENTEEN LITTLE FAITHFUL\n EIGHTEEN DARK DAYS\n NINETEEN AMY\'S WILL\n TWENTY CONFIDENTIAL\n TWENTY-ONE LAURIE MAKES MISCHIEF, AND JO MAKES PEACE\n TWENTY-TWO PLEASANT MEADOWS\n TWENTY-THREE AUNT MARCH SETTLES THE QUESTION\n\n\nPART 2\n\n TWENTY-FOUR GOSSIP\n TWENTY-FIVE THE FIRST WEDDING\n TWENTY-SIX ARTISTIC ATTEMPTS\n TWENTY-SEVEN LITERARY LESSONS\n TWENTY-EIGHT DOMESTIC EXPERIENCES\n TWENTY-NINE CALLS\n THIRTY CONSEQUENCES\n THIRTY-ONE OUR FOREIGN CORRESPONDENT\n THIRTY-TWO TENDER TROUBLES\n THIRTY-THREE JO\'S JOURNAL\n THIRTY-FOUR FRIEND\n THIRTY-FIVE HEARTACHE\n THIRTY-SIX BETH\'S SECRET\n THIRTY-SEVEN NEW IMPRESSIONS\n THIRTY-EIGHT ON THE SHELF\n THIRTY-NINE LAZY LAURENCE\n FORTY THE VALLEY OF THE SHADOW\n FORTY-ONE LEARNING TO FORGET\n FORTY-TWO ALL ALONE\n FORTY-THREE SURPRISES\n FORTY-FOUR MY LORD AND LADY\n FORTY-FIVE DAISY AND DEMI\n FORTY-SIX UNDER THE UMBRELLA\n FORTY-SEVEN HARVEST TIME\n\n\n\nCHAPTER 
ONE\n\nPLAYING PILGRIMS\n\n"Christmas won\'t be Christmas without any presents," grumbled Jo, lying\non the rug.\n\n"It\'s so dreadful to be poor!" sighed Meg, looking down at her old\ndress.\n\n"I don\'t think it\'s fair for some girls to have plenty of pretty\nthings, and other girls nothing at all," added little Amy, with an\ninjured sniff.\n\n"We\'ve got Father and Mother, and each other," said Beth contentedly\nfrom her corner.\n\nThe four young faces on which the firelight shone brightened at the\ncheerful words, but darkened again as Jo said sadly, "We haven\'t got\nFather, and shall not have him for a long time." She didn\'t say\n"perhaps never," but each silently added it, thinking of Father far\naway, where the fighting was.\n\nNobody spoke for a minute; then Meg said in an altered tone, "You know\nthe reason Mother proposed not having any presents this Christmas was\nbecause it is going to b'
# Print the first three thousand characters.
print(little_women_text[:3000])
The Project Gutenberg EBook of Little Women, by Louisa May Alcott

This eBook is for the use of anyone anywhere at no cost and with
almost no restrictions whatsoever. You may copy it, give it away or
re-use it under the terms of the Project Gutenberg License included
with this eBook or online at www.gutenberg.net


Title: Little Women

Author: Louisa May Alcott

Posting Date: September 13, 2008 [EBook #514]
Release Date: May, 1996
[This file last updated on August 19, 2010]

Language: English


*** START OF THIS PROJECT GUTENBERG EBOOK LITTLE WOMEN ***




LITTLE WOMEN


by

Louisa May Alcott




CONTENTS


PART 1

 ONE PLAYING PILGRIMS
 TWO A MERRY CHRISTMAS
 THREE THE LAURENCE BOY
 FOUR BURDENS
 FIVE BEING NEIGHBORLY
 SIX BETH FINDS THE PALACE BEAUTIFUL
 SEVEN AMY'S VALLEY OF HUMILIATION
 EIGHT JO MEETS APOLLYON
 NINE MEG GOES TO VANITY FAIR
 TEN THE P.C. AND P.O.
 ELEVEN EXPERIMENTS
 TWELVE CAMP LAURENCE
 THIRTEEN CASTLES IN THE AIR
 FOURTEEN SECRETS
 FIFTEEN A TELEGRAM
 SIXTEEN LETTERS
 SEVENTEEN LITTLE FAITHFUL
 EIGHTEEN DARK DAYS
 NINETEEN AMY'S WILL
 TWENTY CONFIDENTIAL
 TWENTY-ONE LAURIE MAKES MISCHIEF, AND JO MAKES PEACE
 TWENTY-TWO PLEASANT MEADOWS
 TWENTY-THREE AUNT MARCH SETTLES THE QUESTION


PART 2

 TWENTY-FOUR GOSSIP
 TWENTY-FIVE THE FIRST WEDDING
 TWENTY-SIX ARTISTIC ATTEMPTS
 TWENTY-SEVEN LITERARY LESSONS
 TWENTY-EIGHT DOMESTIC EXPERIENCES
 TWENTY-NINE CALLS
 THIRTY CONSEQUENCES
 THIRTY-ONE OUR FOREIGN CORRESPONDENT
 THIRTY-TWO TENDER TROUBLES
 THIRTY-THREE JO'S JOURNAL
 THIRTY-FOUR FRIEND
 THIRTY-FIVE HEARTACHE
 THIRTY-SIX BETH'S SECRET
 THIRTY-SEVEN NEW IMPRESSIONS
 THIRTY-EIGHT ON THE SHELF
 THIRTY-NINE LAZY LAURENCE
 FORTY THE VALLEY OF THE SHADOW
 FORTY-ONE LEARNING TO FORGET
 FORTY-TWO ALL ALONE
 FORTY-THREE SURPRISES
 FORTY-FOUR MY LORD AND LADY
 FORTY-FIVE DAISY AND DEMI
 FORTY-SIX UNDER THE UMBRELLA
 FORTY-SEVEN HARVEST TIME



CHAPTER ONE

PLAYING PILGRIMS

"Christmas won't be Christmas without any presents," grumbled Jo, lying
on the rug.

"It's so dreadful to be poor!" sighed Meg, looking down at her old
dress.

"I don't think it's fair for some girls to have plenty of pretty
things, and other girls nothing at all," added little Amy, with an
injured sniff.

"We've got Father and Mother, and each other," said Beth contentedly
from her corner.

The four young faces on which the firelight shone brightened at the
cheerful words, but darkened again as Jo said sadly, "We haven't got
Father, and shall not have him for a long time." She didn't say
"perhaps never," but each silently added it, thinking of Father far
away, where the fighting was.

Nobody spoke for a minute; then Meg said in an altered tone, "You know
the reason Mother proposed not having any presents this Christmas was
because it is going to b
# Create a variable "chapters" by splitting the text on 'CHAPTER '.
chapters = little_women_text.split('CHAPTER ')
# Create a DataFrame with one column - the text of each chapter.
bpd.DataFrame().assign(chapters=chapters)
| | chapters |
---|---|
0 | The Project Gutenberg EBook of Little Women, b... |
1 | ONE\n\nPLAYING PILGRIMS\n\n"Christmas won't be... |
2 | TWO\n\nA MERRY CHRISTMAS\n\nJo was the first t... |
3 | THREE\n\nTHE LAURENCE BOY\n\n"Jo! Jo! Where ... |
4 | FOUR\n\nBURDENS\n\n"Oh, dear, how hard it does... |
... | ... |
43 | FORTY-THREE\n\nSURPRISES\n\nJo was alone in th... |
44 | FORTY-FOUR\n\nMY LORD AND LADY\n\n"Please, Mad... |
45 | FORTY-FIVE\n\nDAISY AND DEMI\n\nI cannot feel ... |
46 | FORTY-SIX\n\nUNDER THE UMBRELLA\n\nWhile Lauri... |
47 | FORTY-SEVEN\n\nHARVEST TIME\n\nFor a year Jo a... |
48 rows × 1 columns
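Row 0 of the table above holds the Project Gutenberg front matter rather than a chapter. The sketch below shows why, using a made-up miniature text (`sample_text` is a hypothetical stand-in, not the lecture's data): splitting on a delimiter also returns whatever comes before the first occurrence of that delimiter.

```python
# Hypothetical miniature text standing in for the full novel.
sample_text = 'Front matter. CHAPTER ONE\n\nFirst text. CHAPTER TWO\n\nSecond text.'

# Splitting removes the delimiter and returns the pieces around it.
pieces = sample_text.split('CHAPTER ')

# pieces[0] is everything before the first 'CHAPTER ',
# so a text with 2 chapters yields 3 pieces.
print(len(pieces))
print(pieces[0])
```

By the same logic, the 48 rows above correspond to one front-matter piece followed by the 47 chapters listed in the table of contents.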
# Number of occurrences of each name in each chapter.
counts = bpd.DataFrame().assign(
    Amy=np.char.count(chapters, 'Amy'),
    Beth=np.char.count(chapters, 'Beth'),
    Jo=np.char.count(chapters, 'Jo'),
    Meg=np.char.count(chapters, 'Meg'),
    Laurie=np.char.count(chapters, 'Laurie'),
)
counts
| | Amy | Beth | Jo | Meg | Laurie |
---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 0 |
1 | 23 | 26 | 44 | 26 | 0 |
2 | 13 | 12 | 21 | 20 | 0 |
3 | 2 | 2 | 62 | 36 | 16 |
4 | 14 | 18 | 34 | 17 | 0 |
... | ... | ... | ... | ... | ... |
43 | 31 | 8 | 61 | 3 | 29 |
44 | 13 | 0 | 9 | 0 | 10 |
45 | 1 | 2 | 6 | 2 | 0 |
46 | 2 | 1 | 56 | 4 | 2 |
47 | 10 | 3 | 37 | 6 | 13 |
48 rows × 5 columns
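To see what `np.char.count` is doing in the cell above, here is a minimal sketch on a few invented passages (`passages` is hypothetical and not taken from the novel). It counts case-sensitive, non-overlapping substring matches in each string, so a name that appears inside a longer word is counted too.

```python
import numpy as np

# Hypothetical short passages standing in for chapters.
passages = np.array([
    'Jo met John. Jo laughed.',
    'Meg and Amy sang.',
    'Nobody spoke.',
])

# One count per passage; 'Jo' also matches the start of 'John'.
jo_counts = np.char.count(passages, 'Jo')
print(jo_counts)
```

Because matching is by substring, the counts in the table are best read as approximate numbers of mentions rather than exact ones.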
# Cumulative number of times each name appears.
cumulative_counts = bpd.DataFrame().assign(
    Amy=np.cumsum(counts.get('Amy')),
    Beth=np.cumsum(counts.get('Beth')),
    Jo=np.cumsum(counts.get('Jo')),
    Meg=np.cumsum(counts.get('Meg')),
    Laurie=np.cumsum(counts.get('Laurie')),
    Chapter=np.arange(1, 49, 1)
)
cumulative_counts
| | Amy | Beth | Jo | Meg | Laurie | Chapter |
---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 0 | 1 |
1 | 23 | 26 | 44 | 26 | 0 | 2 |
2 | 36 | 38 | 65 | 46 | 0 | 3 |
3 | 38 | 40 | 127 | 82 | 16 | 4 |
4 | 52 | 58 | 161 | 99 | 16 | 5 |
... | ... | ... | ... | ... | ... | ... |
43 | 619 | 459 | 1435 | 673 | 571 | 44 |
44 | 632 | 459 | 1444 | 673 | 581 | 45 |
45 | 633 | 461 | 1450 | 675 | 581 | 46 |
46 | 635 | 462 | 1506 | 679 | 583 | 47 |
47 | 645 | 465 | 1543 | 685 | 596 | 48 |
48 rows × 6 columns
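As a quick check of how `np.cumsum` turns per-chapter counts into running totals, the sketch below applies it to the first five `Amy` counts shown in the earlier table (the variable names here are illustrative only).

```python
import numpy as np

# First five per-chapter counts for 'Amy', copied from the counts table.
amy_counts = np.array([0, 23, 13, 2, 14])

# Each entry is the sum of all counts up to and including that position.
amy_cumulative = np.cumsum(amy_counts)
print(amy_cumulative)
```

The result matches the first five `Amy` values in the cumulative table: 0, 23, 36, 38, 52.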
# Putting it all together, we get a helpful visualization.
cumulative_counts_df = (
    cumulative_counts.drop(columns=['Chapter'])
    .to_df()
    .melt()
    .rename(columns={'variable': 'name', 'value': 'Count'})
)
cumulative_counts_df = cumulative_counts_df.assign(Chapter=list(range(1, 49)) * 5)
px.line(
    cumulative_counts_df, x='Chapter', y='Count', color='name',
    width=900, height=600,
    title='Cumulative Number of Times Each Name Appears', template='ggplot2',
)
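The plotting cell reshapes the table from "wide" (one column per name) to "long" (one row per name and chapter) before drawing one line per name. The following is a plain-Python sketch of that reshape on a tiny hypothetical table, not the code the cell actually runs; it is meant to show why repeating the chapter numbers once per name lines up with the stacked values.

```python
# Hypothetical wide table: first three cumulative counts for two names.
wide = {'Amy': [0, 23, 36], 'Beth': [0, 26, 38]}
num_chapters = 3

# Stack the columns one after another, recording each value's source column.
names = []
counts = []
for name, values in wide.items():
    for value in values:
        names.append(name)
        counts.append(value)

# Whole columns are stacked in order, so chapter numbers repeat once per name.
chapter_numbers = list(range(1, num_chapters + 1)) * len(wide)

long_rows = list(zip(names, counts, chapter_numbers))
for row in long_rows:
    print(row)
```

In the lecture's cell the same idea appears as `list(range(1, 49)) * 5`: 48 chapter numbers repeated for each of the 5 names.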
On Wednesday, we'll start programming in Python 🐍. Remember to bring a laptop or tablet if you have one.
Discussion sections start on Wednesday as well.