Pandas join on index – Pandas : How to merge Dataframes by index using Dataframe.merge()

How to merge Dataframes by index using Dataframe.merge() in Python ?

Pandas join on index: In this article we are going to see how we can merge two dataframes by using index of both the dataframes or by suing index of one dataframe and some columns of the other dataframe, and how we can keep a merged dataframe with similar indices. So, let’s start the exploring the topic.

DataFrame.merge()

SYNTAX :

DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, index_left=False, index_right=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)

For this short example, we will only focus on some arguments :

  • On : This is the column name on which the merge is supposed to be done.
  • Left_on : The column names of the left dataframe which are to be merged
  • Right_on : The column names of the right dataframe which are to be merged
  • index_left : It takes a boolean value whose default values is false. If it is true then it will choose indices from the left dataframe as join key.
  • index_right : It takes a boolean value whose default values is false. If it is true then it will choose indices from the right dataframe as join key.

To demonstrate we will be taking the following two dataframes :

Left dataframe :

     Regd     Name    Age        City         Exp
0    10        Jill          16.0     Tokyo          10
1    11        Rachel    38.0     Texas           5
2    12        Kirti        39.0     New York     7
3    13        Veena     40.0     Texas          21
4    14        Lucifer    NaN    Texas          30
5    15        Pablo      30.0    New York     7
6    16        Lionel     45.0    Colombia    11

Right dataframe :

     Regd     Exp       Wage     Bonus
0    10        Junior   75000      2000
1    11        Senior   72200     1000
2    12        Expert   90999     1100
3    13        Expert   90000     1000
4    14        Junior   20000      2000
5    15        Junior   50000      1500
6    16        Senior   81000     1000

Merging two Dataframes by index of both the dataframes :

Dataframe merge on index: Here you might have noticed we have a common column named ‘Regd’ . So we can merge both the dataframes by passing left_index and right_index as true in the function.

# Program :


# Importing the module
import pandas as pd
import numpy as np

#Left Dataframe
students = [(10,'Jill',    16,     'Tokyo',    10),
            (11,'Rachel',  38,     'Texas',     5),
            (12,'Kirti',   39,     'New York',  7),
            (13,'Veena',   40,     'Texas',    21),
            (14,'Lucifer', np.NaN, 'Texas',    30),
            (15,'Pablo',   30,     'New York',  7),
            (16,'Lionel',  45,     'Colombia', 11) ]
lDfObj = pd.DataFrame(students, columns=['Regd','Name','Age','City','Exp'],index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

#Right dataframe
wage = [(10, 'Junior', 75000, 2000) ,
        (11, 'Senior', 72200, 1000) ,
        (12, 'Expert', 90999, 1100) ,
        (13, 'Expert', 90000, 1000) ,
        (14, 'Junior', 20000, 2000) ,
        (15, 'Junior', 50000, 1500) ,
        (16, 'Senior', 81000, 1000)]
rDfObj = pd.DataFrame(wage, columns=['Regd','Exp','Wage','Bonus'] , index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

#Merging both the dataframes
newDF = lDfObj.merge(rDfObj, left_index=True, right_index=True)

#printing the merged dataframe
print("The merged dataframe is-")
print(newDF)

Output :
The merged dataframe is-
     Regd_x    Name     Age       City         Exp_x   Regd_y    Exp_y      Wage     Bonus
a      10         Jill         16.0      Tokyo         10        10          Junior     75000      2000
b      11        Rachel   38.0      Texas           5         11          Senior     72200      1000
c      12         Kirti       39.0      New York    7         12           Expert    90999     1100
d      13        Veena    40.0      Texas           21       13          Expert     90000      1000
e      14        Lucifer    NaN     Texas           30       14          Junior     20000      2000
f      15         Pablo      30.0      New York    7         15          Junior     50000      1500
g      16       Lionel      45.0      Colombia    11       16          Senior     81000      1000

Finally, if we want to merge a dataframe by the index of the first dataframe with some column from the other dataframe, we can also do that.

Want to expert in the python programming language? Exploring Python Data Analysis using Pandas tutorial changes your knowledge from basic to advance level in python concepts.

Read more Articles on Python Data Analysis Using Padas

Leave a Comment