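Merging pandas DataFrames on Multiple Columns
This tutorial explains how to merge two DataFrames in pandas using the DataFrame.merge() method. The examples call the equivalent top-level function pd.merge().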
import pandas as pd

# Student details; roll no 500 has no entry in grades_df.
student_df = pd.DataFrame(
    {
        "roll no": [500, 501, 503, 504, 505, 506],
        "name": ["jennifer", "travis", "bob", "emma", "luna", "anish"],
        "gender": ["female", "male", "male", "female", "female", "male"],
        "age": [17, 18, 17, 16, 18, 16],
    }
)
# Grades; roll no 502 has no entry in student_df.
grades_df = pd.DataFrame(
    {
        "roll no": [501, 502, 503, 504, 505, 506],
        "name": ["jennifer", "travis", "bob", "emma", "luna", "anish"],
        "grades": ["a", "b ", "a-", "a", "b", "a "],
    }
)

print("1st dataframe:")
print(student_df, "\n")
print("2nd dataframe:")
print(grades_df, "\n")
Output:
1st dataframe:
   roll no      name  gender  age
0      500  jennifer  female   17
1      501    travis    male   18
2      503       bob    male   17
3      504      emma  female   16
4      505      luna  female   18
5      506     anish    male   16

2nd dataframe:
   roll no      name  grades
0      501  jennifer       a
1      502    travis       b
2      503       bob      a-
3      504      emma       a
4      505      luna       b
5      506     anish       a
We will use the DataFrames student_df and grades_df to demonstrate how DataFrame.merge() works.
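Default pandas DataFrame Merge Without Any Key Column
If we pass only the two DataFrames to merge(), the method uses every column the two DataFrames have in common as a join key and keeps a single copy of each of those columns in the result. The default join type is an inner join, so only rows whose values match in all common columns are kept.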
import pandas as pd

student_df = pd.DataFrame(
    {
        "roll no": [500, 501, 503, 504, 505, 506],
        "name": ["jennifer", "travis", "bob", "emma", "luna", "anish"],
        "gender": ["female", "male", "male", "female", "female", "male"],
        "age": [17, 18, 17, 16, 18, 16],
    }
)
grades_df = pd.DataFrame(
    {
        "roll no": [501, 502, 503, 504, 505, 506],
        "name": ["jennifer", "travis", "bob", "emma", "luna", "anish"],
        "grades": ["a", "b ", "a-", "a", "b", "a "],
    }
)

# No key given: merge on all columns common to both DataFrames.
merged_df = pd.merge(student_df, grades_df)

print("1st dataframe:")
print(student_df, "\n")
print("2nd dataframe:")
print(grades_df, "\n")
print("merged df:")
print(merged_df)
Output:
1st dataframe:
   roll no      name  gender  age
0      500  jennifer  female   17
1      501    travis    male   18
2      503       bob    male   17
3      504      emma  female   16
4      505      luna  female   18
5      506     anish    male   16

2nd dataframe:
   roll no      name  grades
0      501  jennifer       a
1      502    travis       b
2      503       bob      a-
3      504      emma       a
4      505      luna       b
5      506     anish       a

merged df:
   roll no   name  gender  age  grades
0      503    bob    male   17      a-
1      504   emma  female   16       a
2      505   luna  female   18       b
3      506  anish    male   16       a
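This merges student_df and grades_df and assigns the result to merged_df. The two DataFrames share the columns roll no and name, so merge() joins on both of them and each appears only once in the result. Only the four rows whose roll no and name both match are kept. Roll no 501, for example, is dropped because it belongs to travis in student_df but to jennifer in grades_df.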
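As a rough illustration of the behavior described above, the following sketch checks that the default merge gives the same result as passing both shared columns explicitly through on. The variable names common_cols, default_merge, and explicit_merge are introduced here only for illustration, and the DataFrames are trimmed copies of the ones used in this tutorial.

```python
import pandas as pd

student_df = pd.DataFrame(
    {
        "roll no": [500, 501, 503, 504, 505, 506],
        "name": ["jennifer", "travis", "bob", "emma", "luna", "anish"],
        "age": [17, 18, 17, 16, 18, 16],
    }
)
grades_df = pd.DataFrame(
    {
        "roll no": [501, 502, 503, 504, 505, 506],
        "name": ["jennifer", "travis", "bob", "emma", "luna", "anish"],
        "grades": ["a", "b", "a-", "a", "b", "a"],
    }
)

# Columns present in both DataFrames (hypothetical helper variable).
common_cols = [col for col in student_df.columns if col in grades_df.columns]

# Merge without a key, then with the shared columns listed explicitly.
default_merge = pd.merge(student_df, grades_df)
explicit_merge = pd.merge(student_df, grades_df, on=common_cols)

print(common_cols)
print(default_merge.equals(explicit_merge))
```

Passing a list to on is one way to merge on multiple columns when the DataFrames share other columns that should not be treated as keys.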
Set the on Parameter in pandas to Specify the Merge Key
import pandas as pd

student_df = pd.DataFrame(
    {
        "roll no": [500, 501, 503, 504, 505, 506],
        "name": ["jennifer", "travis", "bob", "emma", "luna", "anish"],
        "gender": ["female", "male", "male", "female", "female", "male"],
        "age": [17, 18, 17, 16, 18, 16],
    }
)
grades_df = pd.DataFrame(
    {
        "roll no": [501, 502, 503, 504, 505, 506],
        "name": ["jennifer", "travis", "bob", "emma", "luna", "anish"],
        "grades": ["a", "b ", "a-", "a", "b", "a "],
    }
)

# Join on roll no only; name is not treated as a key.
merged_df = pd.merge(student_df, grades_df, on="roll no")

print("1st dataframe:")
print(student_df, "\n")
print("2nd dataframe:")
print(grades_df, "\n")
print("merged df:")
print(merged_df)
Output:
1st dataframe:
   roll no      name  gender  age
0      500  jennifer  female   17
1      501    travis    male   18
2      503       bob    male   17
3      504      emma  female   16
4      505      luna  female   18
5      506     anish    male   16

2nd dataframe:
   roll no      name  grades
0      501  jennifer       a
1      502    travis       b
2      503       bob      a-
3      504      emma       a
4      505      luna       b
5      506     anish       a

merged df:
   roll no  name_x  gender  age    name_y  grades
0      501  travis    male   18  jennifer       a
1      503     bob    male   17       bob      a-
2      504    emma  female   16      emma       a
3      505    luna  female   18      luna       b
4      506   anish    male   16     anish       a
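Here we set on="roll no", so merge() looks for the column named roll no in both DataFrames and joins on it, and merged_df contains a single roll no column. The name column is also common to both DataFrames, but because it is not passed to on, it is not used as a key. The result keeps a separate name column from each DataFrame, labeled name_x for the left DataFrame and name_y for the right one. That is why the row for roll no 501 shows travis under name_x and jennifer under name_y.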
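The _x and _y labels are the defaults of the suffixes parameter of merge(). A minimal sketch, assuming more descriptive labels are wanted: the suffix strings "_student" and "_grades" are illustrative choices, and the DataFrames are shortened versions of the ones above.

```python
import pandas as pd

student_df = pd.DataFrame(
    {
        "roll no": [500, 501, 503],
        "name": ["jennifer", "travis", "bob"],
    }
)
grades_df = pd.DataFrame(
    {
        "roll no": [501, 502, 503],
        "name": ["jennifer", "travis", "bob"],
        "grades": ["a", "b", "a-"],
    }
)

# Overlapping non-key columns get the given suffixes instead of _x and _y.
merged_df = pd.merge(
    student_df, grades_df, on="roll no", suffixes=("_student", "_grades")
)
print(merged_df)
```

The first suffix is applied to the overlapping columns of the left DataFrame and the second to those of the right one.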
Merge DataFrames Using left_on and right_on
import pandas as pd

student_df = pd.DataFrame(
    {
        "roll no": [500, 501, 503, 504, 505, 506],
        "name": ["jennifer", "travis", "bob", "emma", "luna", "anish"],
        "gender": ["female", "male", "male", "female", "female", "male"],
        "age": [17, 18, 17, 16, 18, 16],
    }
)
# The key column is called id here instead of roll no.
grades_df = pd.DataFrame(
    {"id": [501, 502, 503, 504, 505, 506], "grades": ["a", "b ", "a-", "a", "b", "a "]}
)

# Match roll no in the left DataFrame against id in the right DataFrame.
merged_df = pd.merge(student_df, grades_df, left_on="roll no", right_on="id")

print("1st dataframe:")
print(student_df, "\n")
print("2nd dataframe:")
print(grades_df, "\n")
print("merged df:")
print(merged_df)
Output:
1st dataframe:
   roll no      name  gender  age
0      500  jennifer  female   17
1      501    travis    male   18
2      503       bob    male   17
3      504      emma  female   16
4      505      luna  female   18
5      506     anish    male   16

2nd dataframe:
    id  grades
0  501       a
1  502       b
2  503      a-
3  504       a
4  505       b
5  506       a

merged df:
   roll no    name  gender  age   id  grades
0      501  travis    male   18  501       a
1      503     bob    male   17  503      a-
2      504    emma  female   16  504       a
3      505    luna  female   18  505       b
4      506   anish    male   16  506       a
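If the column we want to merge on has a different name in each DataFrame, we can use the left_on and right_on parameters. left_on is set to the column name in the left DataFrame, and right_on is set to the column name in the right DataFrame. Both key columns are kept in the result, which is why merged_df contains roll no as well as id.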
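Since both key columns hold the same values after the merge, one of them is redundant. The sketch below shows one possible way to remove it with DataFrame.drop(); the name deduped_df is introduced here only for illustration, and the DataFrames are shortened versions of the ones above.

```python
import pandas as pd

student_df = pd.DataFrame(
    {"roll no": [500, 501, 503], "name": ["jennifer", "travis", "bob"]}
)
grades_df = pd.DataFrame({"id": [501, 502, 503], "grades": ["a", "b", "a-"]})

# Join differently named key columns; both keys appear in the result.
merged_df = pd.merge(student_df, grades_df, left_on="roll no", right_on="id")

# Drop the right-hand key, which duplicates roll no after the merge.
deduped_df = merged_df.drop(columns="id")
print(deduped_df)
```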