
groupby

df.groupby(column_name) or df.groupby([column_names])

Groups the rows of a DataFrame that share the same value in column_name, or the same combination of values in the columns listed in column_names.

A groupby operation splits the rows into groups based on the values in the specified column(s), so that a summary statistic can be computed separately for each group.
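The following minimal sketch in Python with pandas makes the splitting step concrete. The small DataFrame `df` and its values are hypothetical. Iterating over a groupby object yields one (key, sub-DataFrame) pair per group, and the keys are sorted by default.

```python
import pandas as pd

# Hypothetical data, used only to illustrate how rows are split into groups
df = pd.DataFrame({
    "Species": ["dog", "cat", "cat", "dog"],
    "Weight": [40, 15, 20, 80],
})

# Each iteration yields the group key and the rows that share it
for species, group in df.groupby("Species"):
    print(species, group["Weight"].tolist())
# cat [15, 20]
# dog [40, 80]
```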

Input:
column_name : string
Groups by the specified column. In the aggregated result, this column becomes the index.
column_names : list (of strings)
Groups by every combination of values in the listed columns. The first column in the list forms the outermost group level. In the aggregated result, the columns become the levels of a MultiIndex.
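Returns:
A DataFrameGroupBy object. Applying an aggregate method to it returns a new DataFrame with the grouping column(s) as the index and the aggregated values of all other columns.
Return Type:
DataFrameGroupBy (a DataFrame once an aggregate method is applied)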
Note:
A groupby() is usually followed by an aggregate method. A groupby() without an aggregate method will return a DataFrameGroupBy object rather than a DataFrame.
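The difference between the two return types can be checked directly. This is a small sketch with hypothetical data. It prints the class names of the object before and after an aggregate method is applied.

```python
import pandas as pd

# Hypothetical data for illustration
pets = pd.DataFrame({
    "Species": ["dog", "cat", "cat", "dog"],
    "Weight": [40, 15, 20, 80],
})

grouped = pets.groupby("Species")  # no aggregate method yet
result = grouped.count()           # aggregate method applied

print(type(grouped).__name__)  # DataFrameGroupBy
print(type(result).__name__)   # DataFrame
print(result.index.name)       # Species (the grouping column is now the index)
```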

Aggregate Methods
.mean()   .median()   .count()   .max()   .min()   .sum()
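Each of these methods reduces every group to a single value per column. In the sketch below, the data are hypothetical. The last step uses `.agg()` with a list of method names to compute all six aggregates at once, producing one column per method.

```python
import pandas as pd

# Hypothetical data for illustration
pets = pd.DataFrame({
    "Species": ["dog", "cat", "cat", "dog", "dog"],
    "Weight": [40, 15, 20, 80, 25],
})

weights = pets.groupby("Species")["Weight"]

# Individual aggregate methods, one value per group
print(weights.mean())  # cat: 17.5, dog: about 48.33
print(weights.max())   # cat: 20, dog: 80

# .agg() applies several aggregates by name, one output column per method
summary = weights.agg(["mean", "median", "count", "max", "min", "sum"])
print(summary)
```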


pets
Index  Species  Color   Weight  Age
0      dog      black   40      5
1      cat      golden  15      8
2      cat      black   20      9
3      dog      white   80      2
4      dog      black   25      0.5
5      hamster  black   1       3
6      hamster  golden  0.25    0.2
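To reproduce the examples below, the pets table can be rebuilt as a DataFrame. The values are read from the table above, and the source does not state units for Weight and Age.

```python
import pandas as pd

# The pets table, rebuilt column by column
pets = pd.DataFrame({
    "Species": ["dog", "cat", "cat", "dog", "dog", "hamster", "hamster"],
    "Color": ["black", "golden", "black", "white", "black", "black", "golden"],
    "Weight": [40, 15, 20, 80, 25, 1, 0.25],
    "Age": [5, 8, 9, 2, 0.5, 3, 0.2],
})
print(pets)
```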

.groupby() with one column

pets.groupby('Species').count()
Species  Color  Weight  Age
cat      2      2       2
dog      3      3       3
hamster  2      2       2
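The same grouping works with other aggregate methods. The sketch below rebuilds the pets table and computes per-species counts and means. `.count()` reports non-missing values in every remaining column. `.mean()` only makes sense for numeric columns, so Weight and Age are selected first. Recent pandas versions (2.x) raise an error rather than silently skipping the string column Color.

```python
import pandas as pd

pets = pd.DataFrame({
    "Species": ["dog", "cat", "cat", "dog", "dog", "hamster", "hamster"],
    "Color": ["black", "golden", "black", "white", "black", "black", "golden"],
    "Weight": [40, 15, 20, 80, 25, 1, 0.25],
    "Age": [5, 8, 9, 2, 0.5, 3, 0.2],
})

# Number of non-missing values per column, per species
counts = pets.groupby("Species").count()

# Select the numeric columns before averaging
means = pets.groupby("Species")[["Weight", "Age"]].mean()
print(means)
```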

.groupby() with multiple columns

pets.groupby(['Species', 'Color']).count().reset_index()
Index  Species  Color   Weight  Age
0      cat      black   1       1
1      cat      golden  1       1
2      dog      black   2       2
3      dog      white   1       1
4      hamster  black   1       1
5      hamster  golden  1       1
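Without `.reset_index()`, the grouped result keeps Species and Color as a two-level MultiIndex. The sketch below rebuilds the pets table and shows both layouts. It also shows `as_index=False`, a `groupby` parameter that yields the flat layout in one step.

```python
import pandas as pd

pets = pd.DataFrame({
    "Species": ["dog", "cat", "cat", "dog", "dog", "hamster", "hamster"],
    "Color": ["black", "golden", "black", "white", "black", "black", "golden"],
    "Weight": [40, 15, 20, 80, 25, 1, 0.25],
    "Age": [5, 8, 9, 2, 0.5, 3, 0.2],
})

# Species and Color become the two levels of a MultiIndex
grouped = pets.groupby(["Species", "Color"]).count()
print(list(grouped.index.names))      # ['Species', 'Color']
print(grouped.loc[("dog", "black")])  # Weight 2, Age 2

# reset_index() turns the index levels back into ordinary columns
flat = grouped.reset_index()

# as_index=False produces the same flat layout directly
flat_alt = pets.groupby(["Species", "Color"], as_index=False).count()
```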