groupby
df.groupby(column_name)
or
df.groupby([column_names])
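Groups the rows of a DataFrame that share the same value in column_name, or the same combination of values across the listed column_names.
Grouping splits the data into one group per distinct value (or combination of values), so that each group can be summarized separately.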
- Input:
- column_name : string
- Groups by the column specified. After aggregation, the column becomes the index of the result.
- column_names : list (of strings)
- Groups by all listed columns, starting with the first one in the list. After aggregation, the columns become the levels of the result's index (a MultiIndex).
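- Returns:
- A DataFrameGroupBy object. Once an aggregate method is applied to it, the result is a new DataFrame with the parameter column(s) as the index and all other columns aggregated per group.
- Return Type:
- DataFrameGroupBy (a DataFrame after aggregation)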
- Note:
- A groupby() is usually followed by an aggregate method. A groupby() without an aggregate method will return a DataFrameGroupBy object rather than a DataFrame. Common aggregate methods include:
.mean()
.median()
.count()
.max()
.min()
.sum()
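The difference between a bare groupby() and one followed by an aggregate method can be checked directly. The snippet below is a minimal sketch; the `scores` DataFrame and its `team` and `score` columns are made up for illustration and are not part of the pets example.

```python
import pandas as pd

# Hypothetical data, used only to illustrate the return types.
scores = pd.DataFrame({
    "team": ["a", "b", "a", "b"],
    "score": [1, 2, 3, 4],
})

# Without an aggregate method, groupby() returns a lazy grouping object.
grouped = scores.groupby("team")
print(type(grouped).__name__)

# Applying an aggregate method produces a DataFrame indexed by the group key.
means = grouped.mean()
print(type(means).__name__)
print(means)
```

The grouping object does no computation until an aggregate method (or another operation such as iteration) is applied to it.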
pets
Index | Species | Color | Weight | Age |
---|---|---|---|---|
0 | dog | black | 40 | 5 |
1 | cat | golden | 15 | 8 |
2 | cat | black | 20 | 9 |
3 | dog | white | 80 | 2 |
4 | dog | black | 25 | 0.5 |
5 | hamster | black | 1 | 3 |
6 | hamster | golden | 0.25 | 0.2 |
.groupby() with one column
pets.groupby('Species').count()
Species | Color | Weight | Age |
---|---|---|---|
cat | 2 | 2 | 2 |
dog | 3 | 3 | 3 |
hamster | 2 | 2 | 2 |
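To try the one-column grouping yourself, the sketch below assumes `pets` is built by hand from the seven rows of the pets table. The `mean_weight` line is an extra illustration that is not in the original example.

```python
import pandas as pd

# pets, reconstructed from the seven-row table above.
pets = pd.DataFrame({
    "Species": ["dog", "cat", "cat", "dog", "dog", "hamster", "hamster"],
    "Color": ["black", "golden", "black", "white", "black", "black", "golden"],
    "Weight": [40, 15, 20, 80, 25, 1, 0.25],
    "Age": [5, 8, 9, 2, 0.5, 3, 0.2],
})

# Count the non-null entries in every other column, per species.
counts = pets.groupby("Species").count()
print(counts)

# Select a single column before aggregating to get one value per group.
mean_weight = pets.groupby("Species")["Weight"].mean()
print(mean_weight)
```

Because count() counts non-null values column by column, every column shows the same number for a species whenever no values are missing.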
.groupby() with multiple columns
pets.groupby(['Species', 'Color']).count().reset_index()
Index | Species | Color | Weight | Age |
---|---|---|---|---|
0 | cat | black | 1 | 1 |
1 | cat | golden | 1 | 1 |
2 | dog | black | 2 | 2 |
3 | dog | white | 1 | 1 |
4 | hamster | black | 1 | 1 |
5 | hamster | golden | 1 | 1 |
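Grouping by several columns gives the result a MultiIndex, which is why the example chains .reset_index() to turn the group keys back into ordinary columns. The sketch below again assumes `pets` is rebuilt from the pets table; passing as_index=False to groupby() is shown as one alternative way to get flat columns, not as the approach the original example uses.

```python
import pandas as pd

# pets, reconstructed from the seven-row table above.
pets = pd.DataFrame({
    "Species": ["dog", "cat", "cat", "dog", "dog", "hamster", "hamster"],
    "Color": ["black", "golden", "black", "white", "black", "black", "golden"],
    "Weight": [40, 15, 20, 80, 25, 1, 0.25],
    "Age": [5, 8, 9, 2, 0.5, 3, 0.2],
})

# Grouping by two columns yields a result indexed by (Species, Color).
by_both = pets.groupby(["Species", "Color"]).count()
print(type(by_both.index).__name__)

# reset_index() moves the group keys back into regular columns.
flat = by_both.reset_index()
print(flat)

# Alternative: ask groupby() not to use the keys as the index at all.
flat_alt = pets.groupby(["Species", "Color"], as_index=False).count()
print(flat_alt)
```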