Lecture 9 – Functions and Apply

DSC 10, Fall 2022

Announcements

Agenda

Reminder: Use the DSC 10 Reference Sheet. You can also use it on exams!

Functions

Defining functions

Motivation

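Suppose you drive to a restaurant 🥘 in LA, located exactly 100 miles away. For the first 50 miles, you drive at 80 miles per hour. For the last 50 miles, you drive at 60 miles per hour. What is your average speed over the whole trip?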

$$\text{average speed} = \frac{\text{distance}}{\text{time}} = \frac{50 + 50}{\text{time}_1 + \text{time}_2} \text{ miles per hour}$$

In segment 1, when you drove 50 miles at 80 miles per hour, you drove for $\frac{50}{80}$ hours:

$$\text{speed}_1 = \frac{\text{distance}_1}{\text{time}_1}$$
$$80 \text{ miles per hour} = \frac{50 \text{ miles}}{\text{time}_1} \implies \text{time}_1 = \frac{50}{80} \text{ hours}$$

Similarly, in segment 2, when you drove 50 miles at 60 miles per hour, you drove for $\text{time}_2 = \frac{50}{60} \text{ hours}$.

Then,

$$\text{average speed} = \frac{50 + 50}{\frac{50}{80} + \frac{50}{60}} \text{ miles per hour} $$
$$\begin{align*}\text{average speed} &= \frac{50}{50} \cdot \frac{1 + 1}{\frac{1}{80} + \frac{1}{60}} \text{ miles per hour} \\ &= \frac{2}{\frac{1}{80} + \frac{1}{60}} \text{ miles per hour} \end{align*}$$

Example: Harmonic mean

The harmonic mean ($\text{HM}$) of two positive numbers, $a$ and $b$, is defined as

$$\text{HM} = \frac{2}{\frac{1}{a} + \frac{1}{b}}$$

It is often used to find the average of multiple rates.

Finding the harmonic mean of 80 and 60 is not hard:
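As a minimal sketch, the formula above can be evaluated directly with $a = 80$ and $b = 60$. The variable name `hm_80_60` is only illustrative.

```python
# Harmonic mean of 80 and 60, computed straight from the definition.
hm_80_60 = 2 / (1 / 80 + 1 / 60)

print(hm_80_60)  # roughly 68.57, the average speed in miles per hour
```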

But what if we want to find the harmonic mean of 80 and 70? 80 and 90? 20 and 40? This would require a lot of copy-pasting, which is prone to error.

It turns out that we can define our own "harmonic mean" function just once, and re-use it multiple times.
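One way such a function might look is sketched below. The name `harmonic_mean` comes from these slides, but the body is an assumption based on the formula above, since the original code cell is not shown.

```python
def harmonic_mean(a, b):
    """Return the harmonic mean of two positive numbers a and b."""
    return 2 / (1 / a + 1 / b)

# The same definition is reused for every pair of rates.
print(harmonic_mean(80, 60))
print(harmonic_mean(80, 70))
print(harmonic_mean(80, 90))
print(harmonic_mean(20, 40))
```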

Note that we only had to specify how to calculate the harmonic mean once!

Functions

Functions are a way to divide our code into small subparts to prevent us from writing repetitive code. Each time we define our own function in Python, we will use the following pattern.
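The pattern is roughly the one sketched in the comment below, followed by a small concrete instance. The body of `triple` is an assumption, chosen to match the function name used in these slides.

```python
# General shape of a function definition:
#
#     def function_name(parameters):
#         body
#         return value_to_return

def triple(x):
    """Return three times x."""
    return x * 3

print(triple(5))
```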

Functions are "recipes"

Parameters and arguments

triple has one parameter, x.

When we call triple with the argument 5, you can pretend that there's an invisible first line in the body of triple that says x = 5.

Note that arguments can be of any type!
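To illustrate, here is a sketch that calls an assumed definition of `triple` (multiplying by 3) with arguments of several types. What `*` does depends on the type of the argument.

```python
def triple(x):
    """Assumed definition: return x * 3."""
    return x * 3

print(triple(5))       # x is an int, so * multiplies
print(triple(2.5))     # x is a float, so * multiplies
print(triple('ha'))    # x is a string, so * repeats it
print(triple([1, 2]))  # x is a list, so * repeats it
```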

Functions can take 0 or more arguments

Functions can have any number of arguments. So far, we've created a function that takes two arguments – harmonic_mean – and a function that takes one argument – triple.

greeting takes no arguments!
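A function with no parameters might look like the sketch below. The returned text is a placeholder, since the original body of `greeting` is not shown. The parentheses are still required, both when defining and when calling.

```python
def greeting():
    """Return a fixed greeting. The text is a placeholder."""
    return 'Hi!'

# Calling a function with no parameters still needs parentheses.
print(greeting())
```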

Functions don't run until you call them!

The body of a function is not run until you use (call) the function.

Here, we can define where_is_the_error without seeing an error message.

It is only when we call where_is_the_error that Python gives us an error message.
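A sketch of this behavior follows. The body of `where_is_the_error` is an assumption (a division by zero), picked only because it is guaranteed to fail when run.

```python
def where_is_the_error(something):
    """The body divides by zero, yet defining the function raises nothing."""
    return (1 / 0) + something

# No error so far. The error appears only once the function is called.
try:
    where_is_the_error(5)
except ZeroDivisionError as err:
    print('Calling the function raised an error:', err)
```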

Example: first_name

Let's create a function called first_name that takes in someone's full name and returns their first name. Example behavior is shown below.

>>> first_name('Pradeep Khosla')
'Pradeep'

Hint: Use the string method .split.

General strategy for writing functions:

  1. First, try to get the behavior to work on a single example.
  2. Then, encapsulate that behavior inside a function.
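Following the two steps above, one possible solution is sketched here. Splitting on a single space is an assumption about how names are formatted.

```python
# Step 1: get the behavior working on a single example.
full = 'Pradeep Khosla'
print(full.split(' ')[0])

# Step 2: encapsulate that behavior inside a function.
def first_name(full_name):
    """Return the first name, given a full name separated by spaces."""
    return full_name.split(' ')[0]

print(first_name('Pradeep Khosla'))
```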

Returning

Once a function executes a return statement, it stops running.
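To see this in action, consider the sketch below. Both `stops_early` and the list `steps` are hypothetical names, used only to record which lines of the body actually run.

```python
steps = []

def stops_early():
    """Hypothetical function that records which lines of its body run."""
    steps.append('before return')
    return 'done'
    steps.append('after return')  # never runs, because return came first

result = stops_early()
print(result)
print(steps)
```

Only `'before return'` ends up in `steps`, because nothing after the `return` statement is executed.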

Scope 🩺

The names you choose for a function’s parameters are only known to that function (known as local scope). The rest of your notebook is unaffected by parameter names.
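A small sketch of local scope, assuming `triple` multiplies its argument by 3. The notebook-level variable `x` is hypothetical and deliberately shares its name with the parameter.

```python
x = 15  # a notebook-level variable that happens to be named x

def triple(x):
    """Inside the body, x refers to the argument, not the notebook-level x."""
    return x * 3

print(triple(7))  # during this call, the parameter x is 7
print(x)          # the notebook-level x is still 15
```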

Applying functions to DataFrames

DSC 10 student data

The DataFrame roster contains the names and lecture sections of all students enrolled in DSC 10 this quarter. The first names are real, while the last names have been anonymized for privacy.

Example: Common first names

What is the most common first name among DSC 10 students? (Any guesses?)

Using our first_name function

Somehow, we need to call first_name on every student's 'name'.

Ideally, there's a better solution than doing this 411 times...

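.apply

To apply the function function_name to every element of the column column_name in the DataFrame df, use

df.get(column_name).apply(function_name)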
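The sketch below applies this pattern to a tiny stand-in roster. It uses pandas in place of the course's babypandas, which is an assumption. Every row is invented except the name 'Ryan Ufhwdl', and the column name 'first' is also an assumption.

```python
import pandas as pd  # assumption: pandas stands in for babypandas

def first_name(full_name):
    """Return the first name, given a full name separated by spaces."""
    return full_name.split(' ')[0]

# Invented stand-in for the real roster DataFrame.
roster = pd.DataFrame({
    'name': ['Ryan Ufhwdl', 'Ryan Aaaaa', 'Andrew Bbbbb'],
    'section': ['10AM', '10AM', '1PM'],
})

# Pass the function itself (no parentheses); .apply calls it on each element.
firsts = roster.get('name').apply(first_name)

# Store the result as a new column.
with_first = roster.assign(first=firsts)
print(with_first)
```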

Example: Common first names

Activity

Below:

Note: .apply works with built-in functions, too!

For instance, to find the length of each name, we might use the len function:
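A sketch with two names taken from these slides, again using pandas as a stand-in for babypandas. The section values are invented.

```python
import pandas as pd  # assumption: pandas stands in for babypandas

# Two names from the slides; the sections are invented.
roster = pd.DataFrame({
    'name': ['Ryan Ufhwdl', 'Pradeep Khosla'],
    'section': ['10AM', '1PM'],
})

# len is built in, so there is nothing to define first.
name_lengths = roster.get('name').apply(len)
print(name_lengths)
```

Each length counts the space between the first and last name.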

Aside: what if names are in the index?

We were able to apply first_name to the 'name' column because it's a Series. The .apply method doesn't work on the index, because the index is not a Series.

Solution: .reset_index()

Use .reset_index() to turn the index of a DataFrame into a column, and to reset the index back to the default of 0, 1, 2, 3, and so on.
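As a sketch, suppose the names are stored in the index of a small invented roster. Here pandas stands in for babypandas, and every row except 'Ryan Ufhwdl' is made up.

```python
import pandas as pd  # assumption: pandas stands in for babypandas

def first_name(full_name):
    """Return the first name, given a full name separated by spaces."""
    return full_name.split(' ')[0]

# Invented roster whose index holds the names.
indexed = pd.DataFrame({
    'name': ['Ryan Ufhwdl', 'Ryan Aaaaa', 'Andrew Bbbbb'],
    'section': ['10AM', '10AM', '1PM'],
}).set_index('name')

# Move the index back into a regular 'name' column, with index 0, 1, 2.
with_name_column = indexed.reset_index()

# The names are in a Series again, so .apply works.
firsts = with_name_column.get('name').apply(first_name)
print(firsts)
```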

Example: Shared first names and sections

For example, maybe 'Ryan Ufhwdl' wants to see if there's another 'Ryan' in their section.

Strategy:

  1. What section is 'Ryan Ufhwdl' in?
  2. How many people are in that section and named 'Ryan'?

Another function: shared_first_and_section

Let's create a function named shared_first_and_section. It will take in the full name of a student and return the number of students in their section who share their first name (including them).

Note: This is the first function we're writing that involves using a DataFrame within the function – this is fine!
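One possible implementation, following the two-step strategy, is sketched below. It uses pandas as a stand-in for babypandas and a tiny invented roster (only the name 'Ryan Ufhwdl' appears in the slides). The column names 'first' and 'shared' are assumptions.

```python
import pandas as pd  # assumption: pandas stands in for babypandas

def first_name(full_name):
    """Return the first name, given a full name separated by spaces."""
    return full_name.split(' ')[0]

# Invented stand-in for the real roster.
roster = pd.DataFrame({
    'name': ['Ryan Ufhwdl', 'Ryan Aaaaa', 'Ryan Bbbbb', 'Andrew Ccccc'],
    'section': ['10AM', '10AM', '1PM', '10AM'],
})
with_first = roster.assign(first=roster.get('name').apply(first_name))

def shared_first_and_section(name):
    """Count students in name's section who share name's first name."""
    # Step 1: find this student's row, then read off first name and section.
    row = with_first[with_first.get('name') == name]
    first = row.get('first').iloc[0]
    section = row.get('section').iloc[0]
    # Step 2: keep rows matching both, and count them (student included).
    same_first = with_first.get('first') == first
    same_section = with_first.get('section') == section
    return with_first[same_first & same_section].shape[0]

print(shared_first_and_section('Ryan Ufhwdl'))

# Apply the function to every name and store the counts in a new column.
counts = with_first.get('name').apply(shared_first_and_section)
with_shared = with_first.assign(shared=counts)
print(with_shared)
```

The function reads `with_first`, which is defined outside of it. That works because names defined in the notebook are visible inside a function body.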

Now, let's add a column to with_first that contains the values returned by shared_first_and_section.

Let's look at all the students who are in a section with someone that has the same first name as them.

We can narrow this down to a particular lecture section if we'd like.

Sneak peek

While the DataFrames on the previous slide contain the info we were looking for, they're not organized very conveniently. For instance, there are three rows containing the fact that there are 3 'Andrew's in the 10AM lecture section.

Wouldn't it be great if we could create a DataFrame like the one below? We'll see how on Friday!

   section   first  count
0     10AM  Andrew      3
1      1PM   Ethan      3
2      1PM  Samuel      3
3     10AM   Kevin      2
4     11AM  Connor      2

Activity

Find the longest first name in the class that is shared by at least two students in the same section.

Hint: You'll have to use both assign and apply.

Summary, next time

Summary

Next time

More advanced DataFrame manipulations!