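Pandas GroupBy on Two Columns
This tutorial explains how to use the DataFrame.groupby() method in pandas to split a DataFrame into groups based on the values of two columns. We can also extract further information from the groups that are created.
We will use the following DataFrame throughout this article.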
import pandas as pd

# Sample data: six people with their gender, employment status, and age.
data = pd.DataFrame(
    {
        "name": ["jennifer", "travis", "bob", "emma", "luna", "anish"],
        "gender": ["female", "male", "male", "female", "female", "male"],
        "employed": ["yes", "no", "yes", "no", "yes", "no"],
        "age": [30, 28, 27, 24, 28, 25],
    }
)
print(data)
Output:
       name  gender employed  age
0  jennifer  female      yes   30
1    travis    male       no   28
2       bob    male      yes   27
3      emma  female       no   24
4      luna  female      yes   28
5     anish    male       no   25
Group by Multiple Columns in Pandas
import pandas as pd

data = pd.DataFrame(
    {
        "name": ["jennifer", "travis", "bob", "emma", "luna", "anish"],
        "gender": ["female", "male", "male", "female", "female", "male"],
        "employed": ["yes", "no", "yes", "no", "yes", "no"],
        "age": [30, 28, 27, 24, 28, 25],
    }
)
print(data)
print("")
print("groups in dataframe:")

# Group rows by the combination of gender and employed.
groups = data.groupby(["gender", "employed"])

# Iterating yields a (key, sub-DataFrame) pair for each group.
for group_key, group in groups:
    print(group)
    print("")
Output:
       name  gender employed  age
0  jennifer  female      yes   30
1    travis    male       no   28
2       bob    male      yes   27
3      emma  female       no   24
4      luna  female      yes   28
5     anish    male       no   25

groups in dataframe:
   name  gender employed  age
3  emma  female       no   24

       name  gender employed  age
0  jennifer  female      yes   30
4      luna  female      yes   28

     name gender employed  age
1  travis   male       no   28
5   anish   male       no   25

  name gender employed  age
2   bob   male      yes   27
This creates 4 groups from the DataFrame. All rows that share the same values in both the gender and employed columns are placed in the same group.
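The grouping can also be checked without printing every group. The following is a minimal sketch that rebuilds the same DataFrame and inspects the GroupBy object through ngroups, groups, and get_group(). The variable names n_groups, group_keys, and female_employed are illustrative and are not part of the original example.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "name": ["jennifer", "travis", "bob", "emma", "luna", "anish"],
        "gender": ["female", "male", "male", "female", "female", "male"],
        "employed": ["yes", "no", "yes", "no", "yes", "no"],
        "age": [30, 28, 27, 24, 28, 25],
    }
)

groups = data.groupby(["gender", "employed"])

# Number of distinct (gender, employed) combinations present in the data.
n_groups = groups.ngroups

# Each group key is a (gender, employed) tuple; sorted() gives a stable order.
group_keys = sorted(groups.groups.keys())

# Fetch a single group by passing its key tuple (illustrative choice of key).
female_employed = groups.get_group(("female", "yes"))

print(n_groups)
print(group_keys)
print(female_employed)
```

Only combinations that actually occur in the data become groups, so with other data the number of groups may be smaller than the number of possible gender and employed pairs.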
Count the Rows in Each Group in Pandas
To count the rows in each group created by the DataFrame.groupby() method, we can use the size() method.
import pandas as pd

data = pd.DataFrame(
    {
        "name": ["jennifer", "travis", "bob", "emma", "luna", "anish"],
        "gender": ["female", "male", "male", "female", "female", "male"],
        "employed": ["yes", "no", "yes", "no", "yes", "no"],
        "age": [30, 28, 27, 24, 28, 25],
    }
)
print(data)
print("")
print("count of each group:")

# size() counts the rows per group; reset_index() turns the group keys
# back into columns and names the count column "count".
grouped_df = data.groupby(["gender", "employed"]).size().reset_index(name="count")
print(grouped_df)
Output:
       name  gender employed  age
0  jennifer  female      yes   30
1    travis    male       no   28
2       bob    male      yes   27
3      emma  female       no   24
4      luna  female      yes   28
5     anish    male       no   25

count of each group:
   gender employed  count
0  female       no      1
1  female      yes      2
2    male       no      2
3    male      yes      1
It prints the DataFrame, followed by each combination of gender and employed found in it and the number of rows in that group.
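To make the role of reset_index(name="count") clearer, the sketch below splits the chained call into two steps. It assumes the same sample data; the names counts, male_not_employed, and counts_df are illustrative only.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "name": ["jennifer", "travis", "bob", "emma", "luna", "anish"],
        "gender": ["female", "male", "male", "female", "female", "male"],
        "employed": ["yes", "no", "yes", "no", "yes", "no"],
        "age": [30, 28, 27, 24, 28, 25],
    }
)

# Step 1: size() returns a Series indexed by (gender, employed) pairs.
counts = data.groupby(["gender", "employed"]).size()

# A single count can be looked up with its key tuple.
male_not_employed = counts.loc[("male", "no")]

# Step 2: reset_index() moves the two index levels into ordinary columns
# and stores the counts in a column called "count".
counts_df = counts.reset_index(name="count")

print(counts)
print(male_not_employed)
print(counts_df)
```

Keeping the intermediate Series is convenient for lookups by key, while the flattened DataFrame is usually easier to filter, sort, or export.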
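If we want the largest group count for each value of the employed column, we can take the per-group counts from above, group them again by the employed index level, and then use the max() method to get the maximum count.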
import pandas as pd

data = pd.DataFrame(
    {
        "name": ["jennifer", "travis", "bob", "emma", "luna", "anish"],
        "gender": ["female", "male", "male", "female", "female", "male"],
        "employed": ["yes", "no", "yes", "no", "yes", "no"],
        "age": [30, 28, 27, 24, 28, 25],
    }
)
print(data)
print("")

# Count rows per (gender, employed) pair, then regroup the counts by
# index level 1, which is the employed level.
groups = data.groupby(["gender", "employed"]).size().groupby(level=1)
print(groups.max())
Output:
       name  gender employed  age
0  jennifer  female      yes   30
1    travis    male       no   28
2       bob    male      yes   27
3      emma  female       no   24
4      luna  female      yes   28
5     anish    male       no   25

employed
no     2
yes    2
dtype: int64
It shows, for each value of the employed column, the largest count among the groups created from the gender and employed columns.
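Because the counts are indexed by two named levels, the second grouping can refer to the level by name instead of by position. The sketch below, which assumes the same sample data and uses illustrative variable names, compares both forms.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "name": ["jennifer", "travis", "bob", "emma", "luna", "anish"],
        "gender": ["female", "male", "male", "female", "female", "male"],
        "employed": ["yes", "no", "yes", "no", "yes", "no"],
        "age": [30, 28, 27, 24, 28, 25],
    }
)

counts = data.groupby(["gender", "employed"]).size()

# Level 1 of the index is the employed level, so these two are equivalent.
max_by_position = counts.groupby(level=1).max()
max_by_name = counts.groupby(level="employed").max()

print(max_by_name)
print(max_by_position.equals(max_by_name))
```

Referring to the level by name keeps the code correct if the order of the grouping columns changes later.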