Pandas: group by two columns

Author: 迹忆客 Last updated: 2024/04/23

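This tutorial shows how to use the DataFrame.groupby() method in pandas to split a DataFrame into groups based on the values of two columns. We can then extract further information from the resulting groups.

We will use the following DataFrame throughout this article.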

import pandas as pd

# Build the sample DataFrame used throughout this tutorial.
data = pd.DataFrame(
    {
        "name": ["jennifer", "travis", "bob", "emma", "luna", "anish"],
        "gender": ["female", "male", "male", "female", "female", "male"],
        "employed": ["yes", "no", "yes", "no", "yes", "no"],
        "age": [30, 28, 27, 24, 28, 25],
    }
)
print(data)

Output:

       name  gender employed  age
0  jennifer  female      yes   30
1    travis    male       no   28
2       bob    male      yes   27
3      emma  female       no   24
4      luna  female      yes   28
5     anish    male       no   25

Group by multiple columns with pandas groupby

import pandas as pd

data = pd.DataFrame(
    {
        "name": ["jennifer", "travis", "bob", "emma", "luna", "anish"],
        "gender": ["female", "male", "male", "female", "female", "male"],
        "employed": ["yes", "no", "yes", "no", "yes", "no"],
        "age": [30, 28, 27, 24, 28, 25],
    }
)
print(data)
print("")
print("groups in dataframe:")
# Group rows by the combination of their gender and employed values.
groups = data.groupby(["gender", "employed"])
# Iterating yields (key, sub-DataFrame) pairs; each key is a (gender, employed) tuple.
for group_key, group in groups:
    print(group)
    print("")

Output:

       name  gender employed  age
0  jennifer  female      yes   30
1    travis    male       no   28
2       bob    male      yes   27
3      emma  female       no   24
4      luna  female      yes   28
5     anish    male       no   25

groups in dataframe:
   name  gender employed  age
3  emma  female       no   24

       name  gender employed  age
0  jennifer  female      yes   30
4      luna  female      yes   28

     name gender employed  age
1  travis   male       no   28
5   anish   male       no   25

  name gender employed  age
2  bob   male      yes   27

This creates 4 groups from the DataFrame. All rows that share the same values in both the gender and employed columns are placed in the same group.
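As a minimal sketch of how to inspect those groups without looping, the snippet below counts the groups, lists their keys, and pulls out a single group by its key. The variable names (n_groups, group_keys, employed_women) are illustrative choices, not part of the original tutorial.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "name": ["jennifer", "travis", "bob", "emma", "luna", "anish"],
        "gender": ["female", "male", "male", "female", "female", "male"],
        "employed": ["yes", "no", "yes", "no", "yes", "no"],
        "age": [30, 28, 27, 24, 28, 25],
    }
)

groups = data.groupby(["gender", "employed"])

# Number of distinct (gender, employed) combinations that occur in the data.
n_groups = groups.ngroups

# Each key is a (gender, employed) tuple; sorted() gives a stable order.
group_keys = sorted(groups.groups.keys())

# Fetch one group directly by passing its tuple key.
employed_women = groups.get_group(("female", "yes"))

print(n_groups)
print(group_keys)
print(employed_women)
```

Because two columns are used for grouping, every group key is a two-element tuple, which is why get_group() is given a tuple rather than a single value.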


Count the rows in each group in pandas

To count the rows in each group created by DataFrame.groupby(), we can use the size() method.

import pandas as pd

data = pd.DataFrame(
    {
        "name": ["jennifer", "travis", "bob", "emma", "luna", "anish"],
        "gender": ["female", "male", "male", "female", "female", "male"],
        "employed": ["yes", "no", "yes", "no", "yes", "no"],
        "age": [30, 28, 27, 24, 28, 25],
    }
)
print(data)
print("")
print("count of each group:")
# size() counts the rows per group; reset_index() turns the result into
# a DataFrame and names the new column "count".
grouped_df = data.groupby(["gender", "employed"]).size().reset_index(name="count")
print(grouped_df)

Output:

       name  gender employed  age
0  jennifer  female      yes   30
1    travis    male       no   28
2       bob    male      yes   27
3      emma  female       no   24
4      luna  female      yes   28
5     anish    male       no   25

count of each group:
   gender employed  count
0  female       no      1
1  female      yes      2
2    male       no      2
3    male      yes      1

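It prints the DataFrame, followed by the groups created from it and the number of rows in each group.

If we want the largest count for each value of the employed column, we can group the counts above once more by the employed level of the index, and then call max() to get the maximum count.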
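To make the role of reset_index(name="count") clearer, here is a small sketch of what size() returns before it is flattened. The intermediate names (sizes, count_male_no, counts_df) are illustrative only.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "name": ["jennifer", "travis", "bob", "emma", "luna", "anish"],
        "gender": ["female", "male", "male", "female", "female", "male"],
        "employed": ["yes", "no", "yes", "no", "yes", "no"],
        "age": [30, 28, 27, 24, 28, 25],
    }
)

# size() returns a Series whose index has two levels: gender and employed.
sizes = data.groupby(["gender", "employed"]).size()
print(list(sizes.index.names))

# A single count can be looked up with a (gender, employed) tuple.
count_male_no = sizes.loc[("male", "no")]
print(count_male_no)

# reset_index() moves both index levels into ordinary columns and
# gives the counts the column name passed as `name`.
counts_df = sizes.reset_index(name="count")
print(list(counts_df.columns))
```

Keeping the Series form is convenient for lookups by key, while the flattened DataFrame is easier to print, merge, or export.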

import pandas as pd

data = pd.DataFrame(
    {
        "name": ["jennifer", "travis", "bob", "emma", "luna", "anish"],
        "gender": ["female", "male", "male", "female", "female", "male"],
        "employed": ["yes", "no", "yes", "no", "yes", "no"],
        "age": [30, 28, 27, 24, 28, 25],
    }
)
print(data)
print("")
# size() yields counts indexed by (gender, employed); level=1 is the
# employed level, so the counts are regrouped by employed value.
groups = data.groupby(["gender", "employed"]).size().groupby(level=1)
print(groups.max())

Output:

       name  gender employed  age
0  jennifer  female      yes   30
1    travis    male       no   28
2       bob    male      yes   27
3      emma  female       no   24
4      luna  female      yes   28
5     anish    male       no   25

employed
no     2
yes    2
dtype: int64

It shows, for each value of the employed column, the largest count among the groups created from the gender and employed columns.
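One way to see where these maxima come from is to lay the counts out as a table with unstack(). The following is a sketch under that approach, not the tutorial's own method; the names table, max_by_employed, and max_from_table are illustrative.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "name": ["jennifer", "travis", "bob", "emma", "luna", "anish"],
        "gender": ["female", "male", "male", "female", "female", "male"],
        "employed": ["yes", "no", "yes", "no", "yes", "no"],
        "age": [30, 28, 27, 24, 28, 25],
    }
)

sizes = data.groupby(["gender", "employed"]).size()

# Pivot the employed level into columns: rows are gender, columns are employed.
table = sizes.unstack(level="employed")
print(table)

# Regrouping by the employed level (referred to by name instead of position).
max_by_employed = sizes.groupby(level="employed").max()

# The same maxima, read off the table as column-wise maxima.
max_from_table = table.max()

print(max_by_employed)
print(max_from_table)
```

Referring to the index level by name ("employed") instead of by position (1) does the same regrouping and may be easier to read when there are several grouping columns.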
