Merging pandas DataFrames on Multiple Columns

Author: 迹忆客 Last updated: 2024/04/24

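This tutorial shows how to merge two DataFrames in pandas with the DataFrame.merge() method. The examples call it through the equivalent top-level function pd.merge().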

import pandas as pd

# Student records; "roll no" identifies each student.
student_df = pd.DataFrame(
    {
        "roll no": [500, 501, 503, 504, 505, 506],
        "name": ["jennifer", "travis", "bob", "emma", "luna", "anish"],
        "gender": ["female", "male", "male", "female", "female", "male"],
        "age": [17, 18, 17, 16, 18, 16],
    }
)

# Grade records for a partly overlapping set of roll numbers.
grades_df = pd.DataFrame(
    {
        "roll no": [501, 502, 503, 504, 505, 506],
        "name": ["jennifer", "travis", "bob", "emma", "luna", "anish"],
        "grades": ["a", "b ", "a-", "a", "b", "a "],
    }
)

print("1st dataframe:")
print(student_df, "\n")
print("2nd dataframe:")
print(grades_df, "\n")

Output:

1st dataframe:
   roll no      name  gender  age
0      500  jennifer  female   17
1      501    travis    male   18
2      503       bob    male   17
3      504      emma  female   16
4      505      luna  female   18
5      506     anish    male   16 
2nd dataframe:
   roll no      name grades
0      501  jennifer      a
1      502    travis     b 
2      503       bob     a-
3      504      emma      a
4      505      luna      b
5      506     anish     a  

We will use the DataFrames student_df and grades_df to demonstrate how DataFrame.merge() works.


Default merge of pandas DataFrames without specifying key columns

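If we pass only the two DataFrames to merge(), the method finds every column the two DataFrames have in common and uses all of them as join keys. Each common column appears once in the result. The default join is an inner join, so only rows that match on every common column are kept.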

import pandas as pd

student_df = pd.DataFrame(
    {
        "roll no": [500, 501, 503, 504, 505, 506],
        "name": ["jennifer", "travis", "bob", "emma", "luna", "anish"],
        "gender": ["female", "male", "male", "female", "female", "male"],
        "age": [17, 18, 17, 16, 18, 16],
    }
)

grades_df = pd.DataFrame(
    {
        "roll no": [501, 502, 503, 504, 505, 506],
        "name": ["jennifer", "travis", "bob", "emma", "luna", "anish"],
        "grades": ["a", "b ", "a-", "a", "b", "a "],
    }
)

# No key is given, so merge() joins on all common columns ("roll no" and "name").
merged_df = pd.merge(student_df, grades_df)

print("1st dataframe:")
print(student_df, "\n")
print("2nd dataframe:")
print(grades_df, "\n")
print("merged df:")
print(merged_df)

Output:

1st dataframe:
   roll no      name  gender  age
0      500  jennifer  female   17
1      501    travis    male   18
2      503       bob    male   17
3      504      emma  female   16
4      505      luna  female   18
5      506     anish    male   16 
2nd dataframe:
   roll no      name grades
0      501  jennifer      a
1      502    travis     b 
2      503       bob     a-
3      504      emma      a
4      505      luna      b
5      506     anish     a  
merged df:
   roll no   name  gender  age grades
0      503    bob    male   17     a-
1      504   emma  female   16      a
2      505   luna  female   18      b
3      506  anish    male   16     a 

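This merges student_df and grades_df and assigns the result to merged_df. The two columns roll no and name are common to both DataFrames, and merge() combines each common column into a single column. Only the rows whose roll no and name both match (503 to 506) appear in the result.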
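To make the default behaviour concrete, here is a minimal sketch that compares a merge with no key against one that names the common columns explicitly. The frames left and right and their values are made up for illustration and are not the tutorial's data.

```python
import pandas as pd

# Hypothetical frames sharing the columns "roll no" and "name".
left = pd.DataFrame(
    {"roll no": [1, 2, 3], "name": ["ann", "ben", "cat"], "age": [17, 18, 16]}
)
right = pd.DataFrame(
    {"roll no": [1, 2, 4], "name": ["ann", "bob", "dan"], "grades": ["a", "b", "c"]}
)

# With no key given, merge() joins on every column the two frames share.
default_merge = pd.merge(left, right)

# The same join written out: an inner join on both shared columns.
explicit_merge = pd.merge(left, right, on=["roll no", "name"], how="inner")

print(default_merge)
print(default_merge.equals(explicit_merge))
```

Roll number 2 is dropped in both results because its name differs between the two frames, which is the same reason roll number 501 is missing from the merged output above.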


Set the on parameter to specify the merge key in pandas

import pandas as pd

student_df = pd.DataFrame(
    {
        "roll no": [500, 501, 503, 504, 505, 506],
        "name": ["jennifer", "travis", "bob", "emma", "luna", "anish"],
        "gender": ["female", "male", "male", "female", "female", "male"],
        "age": [17, 18, 17, 16, 18, 16],
    }
)

grades_df = pd.DataFrame(
    {
        "roll no": [501, 502, 503, 504, 505, 506],
        "name": ["jennifer", "travis", "bob", "emma", "luna", "anish"],
        "grades": ["a", "b ", "a-", "a", "b", "a "],
    }
)

# Join on "roll no" only; "name" is no longer treated as a key.
merged_df = pd.merge(student_df, grades_df, on="roll no")

print("1st dataframe:")
print(student_df, "\n")
print("2nd dataframe:")
print(grades_df, "\n")
print("merged df:")
print(merged_df)

Output:

1st dataframe:
   roll no      name  gender  age
0      500  jennifer  female   17
1      501    travis    male   18
2      503       bob    male   17
3      504      emma  female   16
4      505      luna  female   18
5      506     anish    male   16 
2nd dataframe:
   roll no      name grades
0      501  jennifer      a
1      502    travis     b 
2      503       bob     a-
3      504      emma      a
4      505      luna      b
5      506     anish     a  
merged df:
   roll no  name_x  gender  age    name_y grades
0      501  travis    male   18  jennifer      a
1      503     bob    male   17       bob     a-
2      504    emma  female   16      emma      a
3      505    luna  female   18      luna      b
4      506   anish    male   16     anish     a 

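Here we set on="roll no", so merge() looks for the column named roll no in both DataFrames, and merged_df contains a single roll no column. The name column is also common to both DataFrames, but because it is not passed to on, the result keeps a separate name column from each side: name_x from the left DataFrame and name_y from the right.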
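Two variations on the on parameter are sketched below: renaming the _x and _y columns with the suffixes parameter, and passing a list to on so that several columns act as keys. The frames and the suffix labels "_student" and "_grades" are hypothetical choices for this example.

```python
import pandas as pd

# Hypothetical frames; "name" disagrees for roll number 2.
left = pd.DataFrame({"roll no": [1, 2], "name": ["ann", "ben"], "age": [17, 18]})
right = pd.DataFrame({"roll no": [1, 2], "name": ["ann", "bob"], "grades": ["a", "b"]})

# Only "roll no" is the key, so the two "name" columns are kept apart.
# suffixes replaces the default "_x" and "_y" with more readable labels.
labelled = pd.merge(left, right, on="roll no", suffixes=("_student", "_grades"))
print(labelled.columns.tolist())

# A list passed to on uses several key columns at once; "name" is then a key
# and appears only once in the result.
multi_key = pd.merge(left, right, on=["roll no", "name"])
print(multi_key)
```

With the list form, a row is kept only when every listed key matches, so the roll number whose names disagree is dropped from multi_key but still present in labelled.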


Merge DataFrames using left_on and right_on

import pandas as pd

student_df = pd.DataFrame(
    {
        "roll no": [500, 501, 503, 504, 505, 506],
        "name": ["jennifer", "travis", "bob", "emma", "luna", "anish"],
        "gender": ["female", "male", "male", "female", "female", "male"],
        "age": [17, 18, 17, 16, 18, 16],
    }
)

# Here the key column is called "id" instead of "roll no".
grades_df = pd.DataFrame(
    {"id": [501, 502, 503, 504, 505, 506], "grades": ["a", "b ", "a-", "a", "b", "a "]}
)

# Match "roll no" in the left frame against "id" in the right frame.
merged_df = pd.merge(student_df, grades_df, left_on="roll no", right_on="id")

print("1st dataframe:")
print(student_df, "\n")
print("2nd dataframe:")
print(grades_df, "\n")
print("merged df:")
print(merged_df)

Output:

1st dataframe:
   roll no      name  gender  age
0      500  jennifer  female   17
1      501    travis    male   18
2      503       bob    male   17
3      504      emma  female   16
4      505      luna  female   18
5      506     anish    male   16 
2nd dataframe:
    id grades
0  501      a
1  502     b 
2  503     a-
3  504      a
4  505      b
5  506     a  
merged df:
   roll no    name  gender  age   id grades
0      501  travis    male   18  501      a
1      503     bob    male   17  503     a-
2      504    emma  female   16  504      a
3      505    luna  female   18  505      b
4      506   anish    male   16  506     a 

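If the column we want to merge on has a different name in each DataFrame, we can use the left_on and right_on parameters. left_on is set to the column name in the left DataFrame and right_on to the column name in the right DataFrame. Both key columns are kept in the result.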
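Because both key columns survive a left_on/right_on merge and carry the same values, it is common to drop one of them afterwards. The following is a small sketch of that cleanup step, using made-up frames rather than the tutorial's data.

```python
import pandas as pd

# Hypothetical frames whose key columns have different names.
left = pd.DataFrame({"roll no": [1, 2, 3], "name": ["ann", "ben", "cat"]})
right = pd.DataFrame({"id": [2, 3, 4], "grades": ["b", "a", "c"]})

merged = pd.merge(left, right, left_on="roll no", right_on="id")

# "roll no" and "id" are identical in every merged row,
# so the redundant "id" column can be removed.
tidy = merged.drop(columns="id")
print(tidy)
```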
