groupby
df.groupby(column_name)
or
df.groupby([column_names])
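Groups the DataFrame rows that share the same value in column_name, or the same combination of values across a list of column_names.
A groupby operation splits the data into groups based on the values in the given column(s), so that each group can then be summarized separately.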
- Input:
  - column_name : string
    - Groups by the column specified. After aggregation, the column becomes the index.
  - column_names : list (of strings)
    - Groups by all listed columns, starting with the first one in the list. After aggregation, the columns become the levels of the index.
- Returns:
  - Once an aggregate method has been applied, a new DataFrame with the grouping column(s) as the index and all other columns aggregated within each group.
- Return Type:
  - DataFrame (after aggregation)
- Note:
  - A groupby() is usually followed by an aggregate method. A groupby() without an aggregate method returns a DataFrameGroupBy object rather than a DataFrame. Common aggregate methods:
    .mean() .median() .count() .max() .min() .sum()
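The difference between a bare groupby() and one followed by an aggregate method can be seen in a short sketch. The four-row frame below is a hypothetical subset built by hand for illustration (its values are borrowed from the pets table on this page), and the variable names are assumptions.

```python
import pandas as pd

# Hypothetical four-row frame for illustration only.
pets = pd.DataFrame({
    "Species": ["dog", "cat", "cat", "dog"],
    "Weight": [40, 15, 20, 80],
})

# Without an aggregate method, the result is a DataFrameGroupBy object.
grouped = pets.groupby("Species")
print(type(grouped).__name__)

# Applying an aggregate method produces a DataFrame indexed by Species.
mean_weight = grouped.mean()
print(mean_weight)
```

The grouped object holds no summarized values by itself; the aggregate method decides how each group is reduced to a single row.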
The diagram below provides a visualization of how groupby works using a variation of our main dataset. For additional helpful visual guides, please visit the Diagrams site.
pets
| Index | Species | Color | Weight | Age |
|---|---|---|---|---|
| 0 | dog | black | 40 | 5 |
| 1 | cat | golden | 15 | 8 |
| 2 | cat | black | 20 | 9 |
| 3 | dog | white | 80 | 2 |
| 4 | dog | black | 25 | 0.5 |
| 5 | hamster | black | 1 | 3 |
| 6 | hamster | golden | 0.25 | 0.2 |
.groupby() with one column
pets.groupby('Species').count()
| Index | ID | Color | Weight | Age |
|---|---|---|---|---|
| cat | 2 | 2 | 2 | 2 |
| dog | 3 | 3 | 3 | 3 |
| hamster | 2 | 2 | 2 | 2 |
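As a minimal sketch, the one-column example can be reproduced by rebuilding the pets table by hand. The DataFrame construction is an assumption (the page does not show how pets is created), and only the columns visible in the pets table are included, so the result has no ID column.

```python
import pandas as pd

# Rebuild the pets table shown on this page (assumed construction).
pets = pd.DataFrame({
    "Species": ["dog", "cat", "cat", "dog", "dog", "hamster", "hamster"],
    "Color": ["black", "golden", "black", "white", "black", "black", "golden"],
    "Weight": [40, 15, 20, 80, 25, 1, 0.25],
    "Age": [5, 8, 9, 2, 0.5, 3, 0.2],
})

# Count the non-null values in each remaining column, per species.
# Species becomes the index of the result.
species_counts = pets.groupby("Species").count()
print(species_counts)
```

Because count() reports non-null values per column, every column shows the same number for a given species as long as no values are missing.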
.groupby() with multiple columns
pets.groupby(['Species', 'Color']).count().reset_index()
| Index | Species | Color | ID | Weight | Age | Is_Cat | Owner_Comment |
|---|---|---|---|---|---|---|---|
| 0 | cat | black | 1 | 1 | 1 | 1 | 1 |
| 1 | cat | golden | 1 | 1 | 1 | 1 | 1 |
| 2 | dog | black | 2 | 2 | 2 | 2 | 2 |
| 3 | dog | white | 1 | 1 | 1 | 1 | 1 |
| 4 | hamster | black | 1 | 1 | 1 | 1 | 1 |
| 5 | hamster | golden | 1 | 1 | 1 | 1 | 1 |
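The multi-column case can be sketched the same way. Again, the hand-built pets frame is an assumption that contains only the columns visible in the pets table on this page, so the output has fewer columns than the table above; the counts are computed from those seven rows.

```python
import pandas as pd

# Rebuild the pets table shown on this page (assumed construction).
pets = pd.DataFrame({
    "Species": ["dog", "cat", "cat", "dog", "dog", "hamster", "hamster"],
    "Color": ["black", "golden", "black", "white", "black", "black", "golden"],
    "Weight": [40, 15, 20, 80, 25, 1, 0.25],
    "Age": [5, 8, 9, 2, 0.5, 3, 0.2],
})

# Grouping by two columns yields a MultiIndex of (Species, Color) pairs.
by_species_color = pets.groupby(["Species", "Color"]).count()

# reset_index() moves Species and Color back into ordinary columns
# and restores a default integer index.
flat = by_species_color.reset_index()
print(flat)
```

Without reset_index(), Species and Color stay in the index, which is convenient for lookups by group but less convenient for plotting or merging.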