How to merge Dataframes by index using Dataframe.merge() in Python ?
Pandas join on index: In this article we are going to see how we can merge two dataframes by using index of both the dataframes or by suing index of one dataframe and some columns of the other dataframe, and how we can keep a merged dataframe with similar indices. So, let’s start the exploring the topic.
DataFrame.merge()
SYNTAX :
DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, index_left=False, index_right=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
For this short example, we will only focus on some arguments :
- On : This is the column name on which the merge is supposed to be done.
- Left_on : The column names of the left dataframe which are to be merged
- Right_on : The column names of the right dataframe which are to be merged
- index_left : It takes a boolean value whose default values is false. If it is true then it will choose indices from the left dataframe as join key.
- index_right : It takes a boolean value whose default values is false. If it is true then it will choose indices from the right dataframe as join key.
To demonstrate we will be taking the following two dataframes :
Left dataframe :
Regd Name Age City Exp 0 10 Jill 16.0 Tokyo 10 1 11 Rachel 38.0 Texas 5 2 12 Kirti 39.0 New York 7 3 13 Veena 40.0 Texas 21 4 14 Lucifer NaN Texas 30 5 15 Pablo 30.0 New York 7 6 16 Lionel 45.0 Colombia 11
Right dataframe :
Regd Exp Wage Bonus 0 10 Junior 75000 2000 1 11 Senior 72200 1000 2 12 Expert 90999 1100 3 13 Expert 90000 1000 4 14 Junior 20000 2000 5 15 Junior 50000 1500 6 16 Senior 81000 1000
- Pandas : Merge Dataframes on specific columns or on index in Python – Part 2
- Pandas : How to Merge Dataframes using Dataframe.merge() in Python – Part 1
- Python: Count Nan and Missing Values in Dataframe Using Pandas
Merging two Dataframes by index of both the dataframes :
Dataframe merge on index: Here you might have noticed we have a common column named ‘Regd’ . So we can merge both the dataframes by passing left_index and right_index as true in the function.
# Program : # Importing the module import pandas as pd import numpy as np #Left Dataframe students = [(10,'Jill', 16, 'Tokyo', 10), (11,'Rachel', 38, 'Texas', 5), (12,'Kirti', 39, 'New York', 7), (13,'Veena', 40, 'Texas', 21), (14,'Lucifer', np.NaN, 'Texas', 30), (15,'Pablo', 30, 'New York', 7), (16,'Lionel', 45, 'Colombia', 11) ] lDfObj = pd.DataFrame(students, columns=['Regd','Name','Age','City','Exp'],index=['a', 'b', 'c', 'd', 'e', 'f', 'g']) #Right dataframe wage = [(10, 'Junior', 75000, 2000) , (11, 'Senior', 72200, 1000) , (12, 'Expert', 90999, 1100) , (13, 'Expert', 90000, 1000) , (14, 'Junior', 20000, 2000) , (15, 'Junior', 50000, 1500) , (16, 'Senior', 81000, 1000)] rDfObj = pd.DataFrame(wage, columns=['Regd','Exp','Wage','Bonus'] , index=['a', 'b', 'c', 'd', 'e', 'f', 'g']) #Merging both the dataframes newDF = lDfObj.merge(rDfObj, left_index=True, right_index=True) #printing the merged dataframe print("The merged dataframe is-") print(newDF)
Output : The merged dataframe is- Regd_x Name Age City Exp_x Regd_y Exp_y Wage Bonus a 10 Jill 16.0 Tokyo 10 10 Junior 75000 2000 b 11 Rachel 38.0 Texas 5 11 Senior 72200 1000 c 12 Kirti 39.0 New York 7 12 Expert 90999 1100 d 13 Veena 40.0 Texas 21 13 Expert 90000 1000 e 14 Lucifer NaN Texas 30 14 Junior 20000 2000 f 15 Pablo 30.0 New York 7 15 Junior 50000 1500 g 16 Lionel 45.0 Colombia 11 16 Senior 81000 1000
Finally, if we want to merge a dataframe by the index of the first dataframe with some column from the other dataframe, we can also do that.
Want to expert in the python programming language? Exploring Python Data Analysis using Pandas tutorial changes your knowledge from basic to advance level in python concepts.
Read more Articles on Python Data Analysis Using Padas
- How to merge Dataframes using Dataframe.merge() in Python?
- How to merge Dataframes on specific columns or on index in Python?
- Count NaN or missing values in DataFrame
- Count rows in a dataframe | all or those only that satisfy a condition
- 6 Different ways to iterate over rows in a Dataframe & Update while iterating row by row
- Loop or Iterate over all or certain columns of a DataFrame
- How to display full Dataframe i.e. print all rows & columns without truncation