groupby
df.groupby(column_name)
or
df.groupby([column_names])
Groups all DataFrame rows with the same value in column_name or list of column_names.
A groupby operation groups large amounts of data based on the column name(s).
- Input:
- column_name : string
- Groups by the column specified. The column becomes the index.
- column_names : list (of strings)
- Groups by all listed columns, starting with the first one in the list. The columns become the indices.
- Returns:
- A new DataFrame with the parameter column(s) as the index and all other columns grouped.
- Return Type:
- DataFrame
- Clarification (groupby / pandas 2.0):
- Pandas 2.0+ no longer silently drops columns that can’t be aggregated after a groupby. We must use
.get()to select the column(s) we want before.groupby(...).mean()(or other aggregations) so that our code runs properly on current pandas. - Note:
- A
groupby()is usually followed by an aggregate method. Agroupby()without an aggregate method will return a DataFrameGroupBy object rather than a DataFrame.
.mean() .median() .count() .max() .min() .sum()
The diagram below provides a visualization of how groupby works using a variation of our main dataset. For additional helpful visual guides, please visit the Diagrams site.
pets
| Index | ID | Species | Color | Weight | Age | Is_Cat | Owner_Comment |
|---|---|---|---|---|---|---|---|
| 0 | dog_001 | dog | black | 40 | 5 | False | There are no bad dogs, only bad owners. |
| 1 | cat_001 | cat | golden | 1.5 | 0.2 | True | My best birthday present ever!!! |
| 2 | cat_002 | cat | black | 15 | 9 | True | ****All you need is love and a cat.**** |
| 3 | dog_002 | dog | white | 80 | 2 | False | Love is a wet nose and a wagging tail. |
| 4 | dog_003 | dog | black | 25 | 0.5 | False | Be the person your dog thinks you are. |
| 5 | ham_001 | hamster | black | 1 | 3 | False | No, thank you! |
| 6 | ham_002 | hamster | golden | 0.25 | 0.2 | False | No, thank you! |
| 7 | cat_003 | cat | black | 10 | 0 | True | No, thank you! |
.groupby() with one column
pets.groupby('Species').count()
| Index | ID | Color | Weight | Age | Is_Cat | Owner_Comment |
|---|---|---|---|---|---|---|
| cat | 3 | 3 | 3 | 3 | 3 | 3 |
| dog | 3 | 3 | 3 | 3 | 3 | 3 |
| hamster | 2 | 2 | 2 | 2 | 2 | 2 |
.groupby() with multiple columns
pets.groupby(['Species', 'Color']).count().reset_index()
| Index | Species | Color | ID | Weight | Age | Is_Cat | Owner_Comment |
|---|---|---|---|---|---|---|---|
| 0 | cat | black | 2 | 2 | 2 | 2 | 2 |
| 1 | cat | golden | 1 | 1 | 1 | 1 | 1 |
| 2 | dog | black | 2 | 2 | 2 | 2 | 2 |
| 3 | dog | white | 1 | 1 | 1 | 1 | 1 |
| 4 | hamster | black | 1 | 1 | 1 | 1 | 1 |
| 5 | hamster | golden | 1 | 1 | 1 | 1 | 1 |
See Also
count() · mean() · sum() · df.assign()
Problems or suggestions about this page? Fill out our feedback form.