groupby

df.groupby(column_name) or df.groupby([column_names])

Groups DataFrame rows that share the same value in column_name, or the same combination of values across the columns in column_names.

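A groupby operation splits the rows of a DataFrame into groups based on the values in the given column(s), so that a computation, usually an aggregation, can be applied to each group separately.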
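To see the splitting step on its own, here is a minimal sketch that assumes pandas is installed and uses a small made-up DataFrame (`df` and its values are hypothetical). Iterating over the object returned by groupby() yields each group key together with the sub-DataFrame of rows that share it.

```python
import pandas as pd

# Small illustrative DataFrame (hypothetical data)
df = pd.DataFrame({
    "Species": ["dog", "cat", "cat", "dog"],
    "Weight": [40, 15, 20, 80],
})

# Each iteration yields (group key, sub-DataFrame); keys come out
# in sorted order because groupby() sorts by default (sort=True)
groups = {}
for species, group in df.groupby("Species"):
    groups[species] = group["Weight"].tolist()

print(groups)  # {'cat': [15, 20], 'dog': [40, 80]}
```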

Input:: column_name : string; Groups by the specified column, which becomes the index of the aggregated result. column_names : list (of strings); Groups by all listed columns, starting with the first one in the list. The columns become the levels of a MultiIndex.
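Returns:: A DataFrameGroupBy object. Applying an aggregate method to it returns a new DataFrame with the grouping column(s) as the index and all other columns aggregated.
Return Type:: DataFrameGroupBy (a DataFrame once an aggregate method is applied)
Note:: A groupby() is usually followed by an aggregate method such as count(), sum(), or mean(). Without an aggregate method, groupby() returns a DataFrameGroupBy object rather than a DataFrame.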
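The difference between the grouped object and the aggregated result can be checked directly. This sketch assumes pandas is installed; the DataFrame `df` is made up for illustration.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({
    "Species": ["dog", "cat", "cat", "dog"],
    "Color": ["black", "golden", "black", "white"],
    "Weight": [40, 15, 20, 80],
})

# groupby() alone returns a GroupBy object, not a DataFrame
grouped = df.groupby("Species")
print(type(grouped).__name__)   # DataFrameGroupBy

# An aggregate method produces a DataFrame indexed by the group key
counts = grouped.count()
print(type(counts).__name__)    # DataFrame
print(counts.index.name)        # Species

# Grouping by a list of columns gives a MultiIndex, one level per column
multi = df.groupby(["Species", "Color"]).count()
print(list(multi.index.names))  # ['Species', 'Color']
```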

The diagram below provides a visualization of how groupby works using a variation of our main dataset. For additional helpful visual guides, please visit the Diagrams site.


pets

Index	Species	Color	Weight	Age
0	dog	black	40	5
1	cat	golden	15	8
2	cat	black	20	9
3	dog	white	80	2
4	dog	black	25	0.5
5	hamster	black	1	3
6	hamster	golden	0.25	0.2
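To reproduce the examples on this page, the pets table can be built as a DataFrame. This is a sketch assuming pandas is installed; the Index column is simply the default 0 to 6 integer index.

```python
import pandas as pd

# The pets table above; pandas supplies the 0..6 index automatically
pets = pd.DataFrame({
    "Species": ["dog", "cat", "cat", "dog", "dog", "hamster", "hamster"],
    "Color": ["black", "golden", "black", "white", "black", "black", "golden"],
    "Weight": [40, 15, 20, 80, 25, 1, 0.25],
    "Age": [5, 8, 9, 2, 0.5, 3, 0.2],
})

# Rows per species, counted in each non-grouping column
print(pets.groupby("Species").count())
```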

.groupby() with one column

pets.groupby('Species').count()

Species	Color	Weight	Age
cat	2	2	2
dog	3	3	3
hamster	2	2	2
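count() is only one possible aggregate. As a sketch (assuming pandas is installed and rebuilding the pets table so the block runs on its own), selecting a single column after grouping and calling mean() returns a Series with one value per species.

```python
import pandas as pd

pets = pd.DataFrame({
    "Species": ["dog", "cat", "cat", "dog", "dog", "hamster", "hamster"],
    "Color": ["black", "golden", "black", "white", "black", "black", "golden"],
    "Weight": [40, 15, 20, 80, 25, 1, 0.25],
    "Age": [5, 8, 9, 2, 0.5, 3, 0.2],
})

# Select the Weight column of each group, then average it
mean_weight = pets.groupby("Species")["Weight"].mean()
print(mean_weight)
# cat: (15 + 20) / 2 = 17.5
# dog: (40 + 80 + 25) / 3, about 48.33
# hamster: (1 + 0.25) / 2 = 0.625
```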

.groupby() with multiple columns

pets.groupby(['Species', 'Color']).count().reset_index()

Index	Species	Color	Weight	Age
0	cat	black	1	1
1	cat	golden	1	1
2	dog	black	2	2
3	dog	white	1	1
4	hamster	black	1	1
5	hamster	golden	1	1
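Two related options, shown as a sketch that assumes pandas is installed and rebuilds the pets table: passing as_index=False to groupby() keeps the grouping columns as ordinary columns, which gives the same shape as calling reset_index() after the aggregation, and size() counts the rows in each group directly.

```python
import pandas as pd

pets = pd.DataFrame({
    "Species": ["dog", "cat", "cat", "dog", "dog", "hamster", "hamster"],
    "Color": ["black", "golden", "black", "white", "black", "black", "golden"],
    "Weight": [40, 15, 20, 80, 25, 1, 0.25],
    "Age": [5, 8, 9, 2, 0.5, 3, 0.2],
})

# Grouping columns stay as regular columns with a fresh 0..n-1 index
counts = pets.groupby(["Species", "Color"], as_index=False).count()
print(counts)

# size() returns a Series of row counts indexed by (Species, Color)
sizes = pets.groupby(["Species", "Color"]).size()
print(sizes)
```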
