
Pandas DataFrame.merge() Function

Author: 迹忆客  Last updated: 2024/04/22

The Python pandas DataFrame.merge() function merges a DataFrame with another DataFrame or with a named Series object.


pandas.DataFrame.merge() Syntax

DataFrame.merge(
    right,
    how="inner",
    on=None,
    left_on=None,
    right_on=None,
    left_index=False,
    right_index=False,
    sort=False,
    suffixes=("_x", "_y"),
    copy=True,
    indicator=False,
    validate=None,
)

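Parameters

right        DataFrame or named Series. The object to merge with.
how          "left", "right", "inner" or "outer" (default "inner"). The type of merge to perform.
on           Label or list. Column or index level names to join on; they must exist in both objects.
left_on      Label or list. Column or index level names to join on in the left DataFrame.
right_on     Label or list. Column or index level names to join on in the right DataFrame.
left_index   Boolean. Use the index of the left DataFrame as the join key (left_index=True).
right_index  Boolean. Use the index of the right DataFrame as the join key (right_index=True).
sort         Boolean. Sort the join keys lexicographically in the output (sort=True).
suffixes     A pair of strings, default ("_x", "_y"), appended to overlapping column names from the left and right sides, respectively.
copy         Boolean. Set copy=False to avoid copying data where possible.
indicator    indicator=True adds a column named _merge to the output DataFrame that records the source of each row ("left_only", "right_only" or "both"); if a string is passed instead, the added column gets that string as its name.
validate     If given, checks that the merge is of the specified type: "one_to_one", "one_to_many", "many_to_one" or "many_to_many".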
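The left_on/right_on and validate parameters listed above can be sketched as follows. This is a minimal example, not from the original article, assuming two hypothetical frames, staff and payroll, whose key columns have different names.

```python
import pandas as pd

# Hypothetical example frames: the key is called "name" on one side and "employee" on the other.
staff = pd.DataFrame({"name": ["suraj", "zeppy", "alish"], "working hours": [1, 2, 3]})
payroll = pd.DataFrame({"employee": ["suraj", "alish", "alish"], "pay": [5, 7, 9]})

# left_on/right_on name the key column on each side; both key columns are kept in the result.
merged = staff.merge(payroll, left_on="name", right_on="employee")
print(merged)

# "alish" appears twice in payroll, so the keys are one-to-many:
# validate="one_to_many" passes, while validate="one_to_one" raises MergeError.
staff.merge(payroll, left_on="name", right_on="employee", validate="one_to_many")
try:
    staff.merge(payroll, left_on="name", right_on="employee", validate="one_to_one")
except pd.errors.MergeError as err:
    print("MergeError:", err)
```

Because left_on and right_on refer to differently named columns, pandas keeps both name and employee in the output; drop one afterwards if it is redundant.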

Return Value

It returns a DataFrame containing the merged objects.
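To illustrate that the result is a DataFrame even when the right-hand object is a named Series, here is a minimal sketch; the Series wanted is a hypothetical example. A named Series behaves like a one-column DataFrame whose column label is the Series name, while an unnamed Series is rejected.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"name": ["suraj", "zeppy", "alish", "sarah"], "working hours": [1, 2, 3, 5]}
)

# The Series name "name" becomes its column label, so it is joined on df1's "name" column.
wanted = pd.Series(["alish", "suraj"], name="name")
result = df1.merge(wanted)
print(type(result).__name__)  # DataFrame
print(result)

# A Series without a name has no column label to join on, so merge() raises ValueError.
try:
    df1.merge(pd.Series(["alish", "suraj"]))
except ValueError as err:
    print("ValueError:", err)
```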


Example Code: Merge Two DataFrames with DataFrame.merge()

import pandas as pd
df1 = pd.DataFrame(
    {"name": ["suraj", "zeppy", "alish", "sarah"], "working hours": [1, 2, 3, 5]}
)
df2 = pd.DataFrame({"name": ["suraj", "zack", "alish", "raphel"], "pay": [5, 6, 7, 8]})
print("1st dataframe:")
print(df1)
print("2nd dataframe:")
print(df2)
# With no "how" or "on" argument, merge() performs an inner join on the shared column "name"
merged_df = df1.merge(df2)
print("merged dataframe:")
print(merged_df)

Output:

1st dataframe:
    name  working hours
0  suraj              1
1  zeppy              2
2  alish              3
3  sarah              5
2nd dataframe:
     name  pay
0   suraj    5
1    zack    6
2   alish    7
3  raphel    8
merged dataframe:
    name  working hours  pay
0  suraj              1    5
1  alish              3    7

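It merges df1 and df2 into a single DataFrame using an SQL-style inner join.

When on is not specified, the two DataFrames must have at least one column in common; merge() uses the columns they share as the join keys.

Here, merge() joins the rows of the two DataFrames whose values in the common column, name, are equal.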
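As a minimal sketch of how the default join keys are chosen, the block below redefines df1 and df2 from the example above so it runs on its own, checks that merge() without on gives the same result as on="name", and shows the error raised when no column is shared. The no_common frame is a hypothetical addition for illustration.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"name": ["suraj", "zeppy", "alish", "sarah"], "working hours": [1, 2, 3, 5]}
)
df2 = pd.DataFrame({"name": ["suraj", "zack", "alish", "raphel"], "pay": [5, 6, 7, 8]})

# With on=None, merge() joins on the columns both frames share (here only "name"),
# so the default call matches an explicit inner join on "name".
implicit = df1.merge(df2)
explicit = df1.merge(df2, on="name", how="inner")
print(implicit.equals(explicit))  # True

# Hypothetical frame that shares no column with df1: there is nothing to join on.
no_common = pd.DataFrame({"id": [1, 2], "pay": [5, 6]})
try:
    df1.merge(no_common)
except pd.errors.MergeError as err:
    print("MergeError:", err)
```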


Example Code: Set the how Parameter in merge() to Use Different Join Types

import pandas as pd
df1 = pd.DataFrame(
    {"name": ["suraj", "zeppy", "alish", "sarah"], "working hours": [1, 2, 3, 5]}
)
df2 = pd.DataFrame({"name": ["suraj", "zack", "alish", "raphel"], "pay": [5, 6, 7, 8]})
print("1st dataframe:")
print(df1)
print("2nd dataframe:")
print(df2)
# how="right" keeps every row of df2, the right DataFrame
merged_df = df1.merge(df2, how="right")
print("merged dataframe:")
print(merged_df)

Output:

1st dataframe:
    name  working hours
0  suraj              1
1  zeppy              2
2  alish              3
3  sarah              5
2nd dataframe:
     name  pay
0   suraj    5
1    zack    6
2   alish    7
3  raphel    8
merged dataframe:
     name  working hours  pay
0   suraj            1.0    5
1   alish            3.0    7
2    zack            NaN    6
3  raphel            NaN    8

它使用 sqlright-join 技术将 df1df2 合并为一个 dataframe

在这里,merge() 函数从右边的 dataframe 返回所有的行。然而,只存在于左侧 dataframe 中的行将得到 nan 值。

同样,我们也可以使用 how 参数的 leftouter 值。
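A minimal sketch of the "left" and "outer" variants, using the same df1 and df2. It also sets indicator=True (described in the parameter list) so the outer result shows where each row came from. The row order of the outer result may differ between pandas versions, so no printed output is reproduced here.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"name": ["suraj", "zeppy", "alish", "sarah"], "working hours": [1, 2, 3, 5]}
)
df2 = pd.DataFrame({"name": ["suraj", "zack", "alish", "raphel"], "pay": [5, 6, 7, 8]})

# Left join: keep every row of df1 in its original order; "pay" is NaN where df2 has no match.
left_df = df1.merge(df2, how="left")
print(left_df)

# Outer join: keep the union of names from both frames; indicator=True adds a "_merge"
# column whose values are "left_only", "right_only" or "both".
outer_df = df1.merge(df2, how="outer", indicator=True)
print(outer_df)
```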


Example Code: Merge on a Specific Column with DataFrame.merge() in Pandas

import pandas as pd
df1 = pd.DataFrame(
    {
        "name": ["suraj", "zeppy", "alish", "sarah"],
        "working hours": [1, 2, 3, 5],
        "position": ["salesman", "ceo", "manager", "sales head"],
    }
)
df2 = pd.DataFrame(
    {
        "name": ["suraj", "zack", "alish", "raphel"],
        "pay": [5, 6, 7, 8],
        "position": ["salesman", "ceo", "manager", "sales head"],
    }
)
print("1st dataframe:")
print(df1)
print("2nd dataframe:")
print(df2)
# Join only on "name"; the other shared column, "position", gets the default _x/_y suffixes
merged_df = df1.merge(df2, on="name")
print("merged dataframe:")
print(merged_df)

Output:

1st dataframe:
    name  working hours    position
0  suraj              1    salesman
1  zeppy              2         ceo
2  alish              3     manager
3  sarah              5  sales head
2nd dataframe:
     name  pay    position
0   suraj    5    salesman
1    zack    6         ceo
2   alish    7     manager
3  raphel    8  sales head
merged dataframe:
    name  working hours position_x  pay position_y
0  suraj              1   salesman    5   salesman
1  alish              3    manager    7    manager

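It merges df1 and df2 on the name column only. Because the default join type is an inner join, only names present in both DataFrames are kept. The position column also exists in both DataFrames but is not a join key, so the result contains two position columns: position_x and position_y.

By default, the suffixes _x and _y are appended to the names of overlapping columns. We can choose different suffixes with the suffixes parameter.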

import pandas as pd
df1 = pd.DataFrame(
    {
        "name": ["suraj", "zeppy", "alish", "sarah"],
        "working hours": [1, 2, 3, 5],
        "position": ["salesman", "ceo", "manager", "sales head"],
    }
)
df2 = pd.DataFrame(
    {
        "name": ["suraj", "zack", "alish", "raphel"],
        "pay": [5, 6, 7, 8],
        "position": ["salesman", "ceo", "manager", "sales head"],
    }
)
print("1st dataframe:")
print(df1)
print("2nd dataframe:")
print(df2)
# Replace the default _x/_y suffixes with _left/_right for the overlapping "position" column
merged_df = df1.merge(df2, on="name", suffixes=("_left", "_right"))
print("merged dataframe:")
print(merged_df)

Output:

1st dataframe:
    name  working hours    position
0  suraj              1    salesman
1  zeppy              2         ceo
2  alish              3     manager
3  sarah              5  sales head
2nd dataframe:
     name  pay    position
0   suraj    5    salesman
1    zack    6         ceo
2   alish    7     manager
3  raphel    8  sales head
merged dataframe:
    name  working hours position_left  pay position_right
0  suraj              1      salesman    5       salesman
1  alish              3       manager    7        manager
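Suffixes are only needed for overlapping columns that are not join keys. As a minimal sketch (not part of the original article), listing position as a second key alongside name makes the two position columns collapse into one, so no suffixes are applied.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "name": ["suraj", "zeppy", "alish", "sarah"],
        "working hours": [1, 2, 3, 5],
        "position": ["salesman", "ceo", "manager", "sales head"],
    }
)
df2 = pd.DataFrame(
    {
        "name": ["suraj", "zack", "alish", "raphel"],
        "pay": [5, 6, 7, 8],
        "position": ["salesman", "ceo", "manager", "sales head"],
    }
)

# Rows match only when both "name" and "position" are equal; since "position" is now a
# key, it appears once in the result and no _x/_y suffixes are added.
merged_df = df1.merge(df2, on=["name", "position"])
print(merged_df)
```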

Example Code: Merge DataFrames Using the Index as the Join Key

import pandas as pd
df1 = pd.DataFrame(
    {
        "name": ["suraj", "zeppy", "alish", "sarah"],
        "working hours": [1, 2, 3, 5],
        "position": ["salesman", "ceo", "manager", "sales head"],
    }
)
df2 = pd.DataFrame(
    {
        "name": ["suraj", "zack", "alish", "raphel"],
        "pay": [5, 6, 7, 8],
        "position": ["salesman", "ceo", "manager", "sales head"],
    }
)
print("1st dataframe:")
print(df1)
print("2nd dataframe:")
print(df2)
# Join on the row index of both frames instead of on a column
merged_df = df1.merge(
    df2, left_index=True, right_index=True, suffixes=("_left", "_right")
)
print("merged dataframe:")
print(merged_df)

Output:

1st dataframe:
    name  working hours    position
0  suraj              1    salesman
1  zeppy              2         ceo
2  alish              3     manager
3  sarah              5  sales head
2nd dataframe:
     name  pay    position
0   suraj    5    salesman
1    zack    6         ceo
2   alish    7     manager
3  raphel    8  sales head
merged dataframe:
  name_left  working hours position_left name_right  pay position_right
0     suraj              1      salesman      suraj    5       salesman
1     zeppy              2           ceo       zack    6            ceo
2     alish              3       manager      alish    7        manager
3     sarah              5    sales head     raphel    8     sales head

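It combines the rows of the two DataFrames that have the same index label, regardless of the values in their columns (for example, row 1 pairs zeppy with zack). If the same column name appears in both DataFrames, the suffixes are appended to those names so that they become distinct columns in the result.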
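The index can also be combined with a column key. As a minimal sketch (the set_index("name") step is an illustrative assumption, not from the original article), df1's name column is matched against the index labels of a re-indexed df2, so rows are paired by name rather than by position.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "name": ["suraj", "zeppy", "alish", "sarah"],
        "working hours": [1, 2, 3, 5],
        "position": ["salesman", "ceo", "manager", "sales head"],
    }
)
df2 = pd.DataFrame(
    {
        "name": ["suraj", "zack", "alish", "raphel"],
        "pay": [5, 6, 7, 8],
        "position": ["salesman", "ceo", "manager", "sales head"],
    }
)

# Move df2's "name" column into its index so the index holds the join key.
df2_by_name = df2.set_index("name")

# Match df1's "name" column against df2_by_name's index labels (inner join by default);
# "position" still overlaps, so it receives the given suffixes.
merged_df = df1.merge(
    df2_by_name, left_on="name", right_index=True, suffixes=("_left", "_right")
)
print(merged_df)
```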
