Data Structures and Algorithms for Data Science
University of California, San Diego, Summer 2023
Triton Maps is a web app for mapping the world, searching for places, and navigating around campus. All these features are powered by data structures and algorithms: programming abstractions designed to represent data and automate processes. In your prior programming experience, you learned how to implement specifications by writing Python programs that used data structures to solve data analysis problems. In this course, you’ll learn how to answer the why question: Why did we write the specification that way in the first place?
There are many decisions to make when designing a software system, and many of these decisions have significant consequences on qualities of a system. In this course, we’ll focus on learning data structures as implementations of abstract data types. An abstract data type describes what you can do with a data type, but not how the data type is implemented; a data structure provides both a description of functionality and a specific implementation of functionality. Rather than think of problems as requiring custom algorithms that we invent from scratch each time, abstract data types help us think of problems as variations that can be addressed by adapting more general algorithm ideas.
Data Structures and Algorithms for Data Science presents a selection of these ideas in a way intended to help you design, analyze, and evaluate software systems. Learning these ideas can enable fuller participation in the community of computing professionals. This 5-week course is organized around 5 weeklong modules culminating in a final portfolio.
Java Programming
Linear Collections
- 7/10
- Design abstractions that hide internal implementation details from clients.
- Explain the class invariants for the dynamic array data structure.
- Write a unit test representing a test case involving a single program.
- 7/11
- Trace the execution of a program with primitive types and reference types.
- Define methods that manipulate nodes at the beginning and end of a list.
- Write a unit test for comparing the output of two programs.
- 7/12
- Compare and contrast runtime analysis, asymptotic analysis, and case analysis.
- Analyze the order of growth of a function as constant, linear, or quadratic.
- Identify big-theta asymptotic notation for the order of growth of a function.
- P1Deques due 7/19
- 7/13
- Describe the problem of sorting, ordering relations, and stability.
- Trace each iteration of selection sort and insertion sort.
- Identify a run of selection sort and insertion sort on an array.
Sorting and Searching
- 7/17
- Analyze the runtime of a recursive algorithm using recurrences.
- Trace the recursive execution of merge sort and quicksort.
- Identify a run of merge sort and quicksort on an array.
- A1Assessment 1 due 7/20
- 7/18
- Describe the isomorphism between search trees and quicksort.
- Identify a best/worst-case height BST insertion order for given elements.
- Trace the search and insertion procedures in a BST.
- 7/19
- Describe the analogy between tries and sorting algorithms.
- Trace the search and insertion procedures in a TST and a trie.
- Explain the TST and trie collection and traversal algorithms.
- P2Autocomplete due 7/26
- 7/20
- Identify element promotions during the 2-3 tree insertion process.
- Identify an insertion order that will increase the height of a 2-3 tree.
- Analyze the best-case and worst-case height of a 2-3 tree.
- D2Sorting and Searching
- Slides
Associative Collections
- 7/24
- Given a 2-3 tree, identify its corresponding LLRB tree (and vice-versa).
- Apply rotations and color flips for a single LLRB tree insertion.
- Using 1-1 correspondence, give the LLRB tree for a series of insertions.
- A2Assessment 2 due 7/27
- 7/25
- Apply sink/swim operations to trace heap element insertion and removal.
- Identify possible binary heap indices for the n-th smallest value.
- Given an array index, find the parent, left child, and right child indexes.
- 7/26
- Explain and trace hash table algorithms such as insertion and search.
- Evaluate how a hash table will behave in response to a given data type.
- Analyze the runtime of a hash table with a given bucket data structure.
- P3Priority Queues due 8/2
- 7/27
- Describe how abstractions can embody values and structure social relations.
- Identify the affordances of a program abstraction such as a class or interface.
- Evaluate affordances by applying the 3 value-sensitive design principles.
- D3Associative Collections
- PFInterview Questions due 8/4
Portfolio
- 7/31
- Trace and explain each data structure in BFS and DFS graph traversal.
- Analyze the runtime of a graph algorithm in terms of vertices and edges.
- Define an appropriate graph abstraction for a given image processing problem.
- A3Assessment 3 due 8/3
- 8/1
- Explain the principle of composition in program design and in system design.
- Explain the limits of monolithic system architecture design.
- Identify data structures used in database, storage, and load-balancing systems.
- 8/2
- ▶Iterative Design
- P4Shortest Paths optional
- 8/3
- Conclusion
- D4Portfolio
Why should we learn?
The education you receive in this course can help prepare you for programming jobs, but this isn’t the only purpose for computing education.1 Education is not only about yourself and your personal gain, but also about all of us and our capacity to live together as a community.
In Seattle, the University of Washington acknowledges the Coast Salish peoples of this land, the land which touches the shared waters of all tribes and bands within the Duwamish, Puyallup, Suquamish, Tulalip and Muckleshoot nations. Among the traditions of the Coast Salish peoples is a value for the connectedness between all living things and a recognition of the unique ways that each of us comes to know something.2
Modern education has the idea that we all need to know the same thing. At the end of the lesson, everyone will know the same thing. That’s why we have tests, that’s why we have quizzes, that’s why we have homework: to ensure we all know the same thing. And that’s powerful—that’s important—within a certain context.
But for native culture, the idea that each listener divines or finds their own answer, their own meaning, their own teaching from the story is equally powerful—that each person needs to be able to look at the world and define it for themselves within their culture and then also find a way to live in that world according to the teachings of their people in their culture.
Likewise, in San Diego, the university was built on the unceded territory of the Kumeyaay Nation. Today, the Kumeyaay people continue to maintain their political sovereignty and cultural traditions as vital members of the San Diego Community.
Our course emphasizes the following values and policies.
- We are responsible for each others’ success
- Everyone has a right to feel like they belong in this course. We’ll need to act with compassion and caring to collaborate with each other. Although we will need more than just unexamined commitments to collaboration, listening, empathy, mindfulness, and caring,3 the following guidelines offer a starting point for ensuring compassion toward each other.4
- Listen with intention to understand first and form an opinion only after you fully understand.
- Take responsibility for the intended and unintended effects of your words and actions on others.
- Mindfully respond to others’ ideas by acknowledging the unique value of each contribution.
We are committed to creating an inclusive learning environment in which individual differences are respected and all students feel comfortable. We expect that you, as a student in this course, will honor and respect your classmates, abiding by the UCSD Principles of Community. Please understand that others’ backgrounds, perspectives and experiences may be different than your own, and help us to build an environment where everyone is respected and able to thrive.
- You should expect and demand to be treated by your classmates and teachers with respect. If you experience any sort of harassment or discrimination, please contact the instructor as soon as possible. If you prefer to speak with someone outside of the course, please contact the Office of Prevention of Harassment and Discrimination.
- We recognize everyone has unique circumstances
- Do not hesitate to contact the instructor by private discussion post or appointment. The sooner we are made aware of your circumstances, the more we can help. Extenuating circumstances include work-school balance, familial responsibilities, religious observations, military duties, unexpected travel, or anything else beyond your control that may negatively impact your performance in the class.
- Students requesting accommodations for this course due to a disability or current functional limitation must provide a current Authorization for Accommodation (AFA) letter issued by the Office for Students with Disabilities (OSD). This AFA letter should be shared with the instructor and the Data Science OSD Liaison, who can be reached at dscstudent@ucsd.edu. Please contact them as soon as possible to ensure accommodations can be arranged as needed.
- We believe everyone wants to learn
- Education is about shaping your identity as much as it is about learning things. In school, the consequences of making mistakes are relatively small. But the habits you form now—repeated over days, weeks, months, or years—determine who you will be in the future. Now is the best time to practice honest habits.
- We ask that you do not claim to be responsible for work that is not yours. When you receive substantial help from someone else, include a citation. Don’t request a copy of someone else’s work, don’t provide your work to another student, and don’t post your solutions publicly.
- Academic honesty reflects the trust (or the lack thereof) between students and teachers. We do our best to design the course in ways that ensure trust, but we know our systems are not perfect. If you submit work in violation of these policies but bring it to the attention of the instructor within 72 hours, you may resubmit your own work without further consequence. Rather than blame students, we want to fix or replace broken systems that compel students to lose trust.
- We ask that you do not claim to be responsible for work that is not yours. When you receive substantial help from someone else, include a citation. Don’t request a copy of someone else’s work, don’t provide your work to another student, and don’t post your solutions publicly.
How will you learn?
In a traditional classroom, you attend class while a teacher lectures until time is up. Then, you go home and do the important work of applying concepts toward practice problems or assignments on your own. Finally, you take an exam to show what you know.
Today, we know that there are more effective ways to learn science, engineering, and mathematics.5 Learning skills like software engineering and algorithm analysis requires deliberate practice: a learning cycle that starts with sustained motivation, then presents tasks that build on prior knowledge, and concludes with immediate, personalized feedback. Each module in the course will involve several different activities that are designed so that we can make the most of our class time together.
- On your own before class, prepare for learning by completing the pre-class preparation.
- In Canvas, open the weekly Lessons and add an annotation or reply to an annotation.
Lessons are designed to introduce all the concepts for the entire week’s classes.
- In your team during class, collaborate on the in-class guided practice.
- In Canvas, tutors will observe your learning and record your Teamwork.
Class meetings are designed to deepen and connect ideas across the course.
- On your own after class, record a video showing what you learned in the past two weeks.
- In Canvas, demonstrate your learning by explaining your work for each Assessment.
Your videos not only demonstrate problem solving skills, but also communication skills.
- In your team throughout the module, apply your learning by collaborating on the project.
- In Canvas, demonstrate your learning by assembling a Project Presentation video.
Projects are designed to situate what you learned in real-world practice.
Communicating your ideas and explaining your problem solving process is important in this course. We ask students to record their own screenshared and voiced videos because they provide rich information about your solution process and authenticate your assessment. But we know that visual or voiceover presentations are not accessible or equitable for everyone. If for any reason a voiceover presentation won’t work for you, we would be happy to work with you to find a better way to communicate your ideas and explain your problem solving process. You do not need to disclose why you feel uncomfortable, but we were thinking about people with vision impairment, gender and voice dysphoria, limited access to resources, or complicated living situations when designing this policy.
Expect to spend 4 hours in class and 8 hours outside of class working on this course. Some weeks may require more or less time than other weeks. If you find the workload is significantly exceeding this expectation, talk to your TA.
- 9:30 AM
- 10:00 AM
- 10:30 AM
- 11:00 AM
- 11:30 AM
- 12:00 PM
- 12:30 PM
- 1:00 PM
- 1:30 PM
- 2:00 PM
- 2:30 PM
- 3:00 PM
- 3:30 PM
- 4:00 PM
- 4:30 PM
Sunday
- Office Hours9:30 AM–11:00 AMZoom
Monday
- Office Hours9:30 AM–11:00 AMFAH Atrium + Zoom
- Lecture11:00 AM–12:30 PMFAH 1101
- Office Hours12:30 PM–2:00 PMFAH Atrium + Zoom
Tuesday
- Office Hours9:30 AM–11:00 AMFAH Atrium + Zoom
- Lecture11:00 AM–12:30 PMFAH 1101
- Office Hours2:00 PM–3:30 PMFAH Atrium + Zoom
Wednesday
- Office Hours9:30 AM–11:00 AMFAH Atrium + Zoom
- Lecture11:00 AM–12:30 PMFAH 1101
- Office Hours12:30 PM–3:30 PMFAH Atrium + Zoom
Thursday
- Office Hours9:30 AM–11:00 AMFAH Atrium + Zoom
- Lecture11:00 AM–12:30 PMFAH 1101
- Office Hours12:30 PM–2:00 PMFAH Atrium + Zoom
- Discussion2:00 PM–4:00 PMWLH 2204
- Office Hours4:00 PM–5:00 PMWLH 2204 + Zoom
Friday
- Office Hours9:30 AM–11:00 AMZoom
How is this course graded?
Grading in this course encourages learning through deliberate practice by emphasizing revision and resubmission of work. Most of the coursework is designed around feedback loops where you try something, get feedback, then try again. Grades are based on what you eventually learn through this process.
- C level work
- Completion of most requirements in the Java Programming module.
- Completion of most requirements in the Linear Collections module.
- Completion of most requirements in the Sorting and Searching module.
- Completion of most requirements in the Linear Collections module.
- B level work
- Completion of most requirements in the Java Programming module.
- Completion of most requirements in the Linear Collections module.
- Completion of most requirements in the Sorting and Searching module.
- Completion of most requirements in the Associative Collections module.
- Completion of most requirements in the Portfolio module.
- Completion of most requirements in the Linear Collections module.
- A level work
- Completion of all requirements in the Java Programming module.
- Completion of all requirements in the Linear Collections module.
- Completion of all requirements in the Sorting and Searching module.
- Completion of all requirements in the Associative Collections module.
- Highest marks across all parts of the Portfolio.
- Completion of all requirements in the Linear Collections module.
Mark Guzdial. 2021. Computer science was always supposed to be taught to everyone, and it wasn’t about getting a job. ↩
Roger Fernandes. 2012. Roger Fernandes: Artist/Storyteller/Educator. ↩
Brian Arao and Kristi Clemens. 2013. “From Safe Spaces to Brave Spaces: A New Way to Frame Dialogue Around Diversity and Social Justice” in The Art of Effective Facilitation. ↩
Asao B. Inoue. 2019. “Sample Charter for Compassion” in Labor-Based Grading Contracts: Building Equity and Inclusion in the Compassionate Writing Classroom. ↩
Scott Freeman, Sarah L. Eddy, Miles McDonough, Michelle K. Smith, Nnadozie Okoroafor, Hannah Jordt, and Mary Pat Wenderoth. 2014. Active learning increases student performance in science, engineering, and mathematics. ↩
Explore DSC 30
- Assessments - Evaluate and explain your learning.
- Lessons - Learning materials and resources.
- Projects - Project instructions and specifications.
- Staff - All the teaching and learning assistants.
- Style - Guidelines to aid code understanding.