What is a Structured Numpy Array and how to create and sort it in Python?

Structured Numpy Array and how to create and sort it in Python

In this article we will learn what a structured numpy array is, how to create one, and how to sort it on one or more fields.

What is a Structured Numpy Array ?

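A Structured Numpy array is an array of structures: every element is a record made up of named fields, and the fields may have different data types (for example a string, a float and an integer). So unlike a regular numpy array, the data does not have to be homogeneous.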

Creating a Structured Numpy Array

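To create a structured numpy array, we first describe the structure as a list of tuples, each holding a field name and its data type, and pass it as the dtype parameter. The data itself is given as a list of tuples, one tuple per record, and numpy creates the array based on this dtype.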

import numpy as sc

# Creation of structure

dtype = [('Name', (sc.str_, 10)), ('CGPA', sc.float64), ('Age', sc.int32)]

# Creation of a Structured Numpy array

structured_arr = sc.array([('Ben',8.8 , 18), ('Rani', 9.4, 15), ('Tanmay', 9.8, 17), ('Saswat', 7.6, 16)], dtype=dtype)

print(structured_arr.dtype)
Output :
[('Name', '<U10'), ('CGPA', '<f8'), ('Age', '<i4')]
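Each element of a structured array is a record. Indexing the array with a field name therefore returns that field for every record, and a single element can be read field by field. The sketch below rebuilds the same array to show this. It uses the conventional np alias instead of sc, which is only a naming choice.

```python
import numpy as np

# Same structure as above: a 10-character string, a float and an int per record
dtype = [('Name', (np.str_, 10)), ('CGPA', np.float64), ('Age', np.int32)]
students = np.array([('Ben', 8.8, 18), ('Rani', 9.4, 15),
                     ('Tanmay', 9.8, 17), ('Saswat', 7.6, 16)], dtype=dtype)

# Indexing with a field name gives an ordinary (homogeneous) array of that field
names = students['Name']
ages = students['Age']
print(names)       # ['Ben' 'Rani' 'Tanmay' 'Saswat']
print(ages.dtype)  # int32

# Indexing with an integer gives one record, whose fields are read by name
first = students[0]
print(first['Name'], first['CGPA'])  # Ben 8.8
```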

How to Sort a Structured Numpy Array ?

Sort the Structured Numpy array by field ‘Name’ of the structure

We can sort a structured numpy array by passing the ‘order’ parameter to numpy.sort() or numpy.ndarray.sort(). This parameter names the field, or list of fields, to sort by. numpy.sort() returns a sorted copy, while numpy.ndarray.sort() sorts the array in place. Let’s sort the structured numpy array by the field ‘Name’.

import numpy as sc

# Creation of structure

dtype = [('Name', (sc.str_, 10)), ('CGPA', sc.float64), ('Age', sc.int32)]

# Creation of a Structured Numpy array

structured_arr = sc.array([('Ben',8.8 , 18), ('Rani', 9.4, 15), ('Tanmay', 9.8, 17), ('Saswat', 7.6, 16)], dtype=dtype)

sor_arr = sc.sort(structured_arr, order='Name')

print('Sorted Array on the basis of name : ')

print(sor_arr)
Output :
Sorted Array on the basis of name :
[('Ben', 8.8, 18) ('Rani', 9.4, 15) ('Saswat', 7.6, 16)
('Tanmay', 9.8, 17)]
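The numpy.ndarray.sort() method mentioned above accepts the same order parameter. Unlike numpy.sort(), it reorders the array itself and returns None. A minimal sketch contrasting the two on the same data, sorted this time by ‘CGPA’:

```python
import numpy as np

dtype = [('Name', (np.str_, 10)), ('CGPA', np.float64), ('Age', np.int32)]
students = np.array([('Ben', 8.8, 18), ('Rani', 9.4, 15),
                     ('Tanmay', 9.8, 17), ('Saswat', 7.6, 16)], dtype=dtype)

# numpy.sort() returns a sorted copy; `students` keeps its original order
by_cgpa = np.sort(students, order='CGPA')
print(by_cgpa['Name'])   # ['Saswat' 'Ben' 'Rani' 'Tanmay']
print(students['Name'])  # ['Ben' 'Rani' 'Tanmay' 'Saswat']

# ndarray.sort() reorders `students` itself and returns None
result = students.sort(order='CGPA')
print(result)            # None
print(students['Name'])  # ['Saswat' 'Ben' 'Rani' 'Tanmay']
```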

Sort the Structured Numpy array by field ‘Age’ of the structure

We can also sort the structured numpy array on the basis of the field ‘Age’.

import numpy as sc

# Creation of structure

dtype = [('Name', (sc.str_, 10)), ('CGPA', sc.float64), ('Age', sc.int32)]

# Creation of a Structured Numpy array

structured_arr = sc.array([('Ben',8.8 , 18), ('Rani', 9.4, 15), ('Tanmay', 9.8, 17), ('Saswat', 7.6, 16)], dtype=dtype)

sor_arr = sc.sort(structured_arr, order='Age')

print('Sorted Array on the basis of Age : ')

print(sor_arr)
Output :
Sorted Array on the basis of Age :
[('Rani', 9.4, 15) ('Saswat', 7.6, 16) ('Tanmay', 9.8, 17)
('Ben', 8.8, 18)]
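numpy.sort() does not take a "descending" flag. One simple approach is to sort in ascending order and then reverse the result with a [::-1] slice. This sketch assumes that the order of tied records does not matter:

```python
import numpy as np

dtype = [('Name', (np.str_, 10)), ('CGPA', np.float64), ('Age', np.int32)]
students = np.array([('Ben', 8.8, 18), ('Rani', 9.4, 15),
                     ('Tanmay', 9.8, 17), ('Saswat', 7.6, 16)], dtype=dtype)

# Sort ascending by Age, then reverse to get the oldest student first
oldest_first = np.sort(students, order='Age')[::-1]
print(oldest_first['Age'])   # [18 17 16 15]
print(oldest_first['Name'])  # ['Ben' 'Tanmay' 'Saswat' 'Rani']
```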

Sort the Structured Numpy array by ‘Name’ & ‘Age’ fields of the structure :

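We can also sort a structured numpy array on multiple fields, here ‘Name’ & ‘Age’. The array is sorted by ‘Name’ first, and ‘Age’ is used only to order records whose names are equal. Since every name in this example is unique, the result is the same as sorting by ‘Name’ alone.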

import numpy as sc

# Creation of structure

dtype = [('Name', (sc.str_, 10)), ('CGPA', sc.float64), ('Age', sc.int32)]

# Creation of a Structured Numpy array

structured_arr = sc.array([('Ben',8.8 , 18), ('Rani', 9.4, 15), ('Tanmay', 9.8, 17), ('Saswat', 7.6, 16)], dtype=dtype)

sor_arr = sc.sort(structured_arr, order=['Name','Age'])

print('Sorted Array on the basis of name & age : ')

print(sor_arr)
Output :
Sorted Array on the basis of name & age :
[('Ben', 8.8, 18) ('Rani', 9.4, 15) ('Saswat', 7.6, 16)
('Tanmay', 9.8, 17)]
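In the data above every name is unique, so ‘Age’ never changes the result. The sketch below uses hypothetical data with two students called ‘Ben’, so the second sort key actually breaks a tie:

```python
import numpy as np

dtype = [('Name', (np.str_, 10)), ('CGPA', np.float64), ('Age', np.int32)]
# Hypothetical data: two students named 'Ben' with different ages
students = np.array([('Ben', 8.8, 18), ('Rani', 9.4, 15),
                     ('Ben', 9.1, 16), ('Saswat', 7.6, 16)], dtype=dtype)

# Sort by Name first; records with equal names are then ordered by Age
by_name_age = np.sort(students, order=['Name', 'Age'])
print(by_name_age['Name'])  # ['Ben' 'Ben' 'Rani' 'Saswat']
print(by_name_age['Age'])   # [16 18 15 16]
```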

Pandas : How to Merge Dataframes using Dataframe.merge() in Python – Part 1

Merging Dataframes using Dataframe.merge() in Python

In this article, we will learn to merge two different DataFrames into a single one using the Dataframe.merge() function.

Dataframe.merge() :

The Dataframe class of Python’s Pandas library provides a function, merge(), which merges two DataFrames.

Syntax:- DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)

Arguments:-

  • right : A dataframe that is to be merged with the calling dataframe.
  • how : (Merge type). Some values are : left, right, outer, inner. Its default value is ‘inner’. It decides which rows are kept when a key value is present in only one of the two dataframes.
  • on : Column name(s) to join on; they must be present in both dataframes. If not provided (and no index is used as a key), the columns common to both dataframes are used as join keys.
  • left_on : Column(s) of the left dataframe to use as join keys.
  • right_on : Column(s) of the right dataframe to use as join keys.
  • left_index : (bool), default is False (If True, the index of the left dataframe is used as the join key.)
  • right_index : (bool), default is False (If True, the index of the right dataframe is used as the join key.)
  • suffixes : tuple of (str, str), default (‘_x’, ‘_y’) (Suffixes applied to overlapping column names from the left and right dataframes respectively.)

Let’s look at the different join types one by one.

Merge DataFrames on common columns (Default Inner Join) :

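Suppose two DataFrames have two columns in common. Calling merge() without any arguments uses those common columns as join keys. Rows whose values match in all common columns are combined into one row, and the remaining columns of both DataFrames are copied into the result. The original index labels are not kept: the merged DataFrame gets a new default index 0, 1, 2 and so on.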

Let’s see the below program to understand it clearly.

import pandas as sc
# List of Tuples
players = [(15,'Smith','Pune', 17,12000),
            (99,'Rana', 'Mumbai', 20,2000),
            (51,'Jaydev','Kolkata', 22,25640),
            (31,'Shikhar','Hyderabad', 28,85410),
            (12,'Sanju','Rajasthan', 21,63180),
            (35,'Raina','Gujarat', 18,62790)
            ]
# Creation of DataFrame object
playDFObj = sc.DataFrame(players, columns=['JersyN','Name', 'Team', 'Age','Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 1 : ')
print(playDFObj)
moreInfo = [(15, 13, 180000, 12000) ,
           (99, 2, 195200, 2000) ,
           (51, 7, 15499, 25640) ,
           (31, 17, 654000, 85410) ,
           (12, 5, 201000, 63180) ,
           (35, 14, 741000, 62790)
            ]
# Creation of DataFrame object
moreinfoObj = sc.DataFrame(moreInfo, columns=['JersyN', 'PLayingSince' , 'Salary', 'Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 2 : ')
print(moreinfoObj)
# Merge two Dataframes on basis of common column by default INNER JOIN
mergedDataf = playDFObj.merge(moreinfoObj)
print(mergedDataf)
Output :
DataFrame 1 :
     JersyN     Name       Team  Age  Sponsered
I        15    Smith       Pune   17      12000
II       99     Rana     Mumbai   20       2000
III      51   Jaydev    Kolkata   22      25640
IV       31  Shikhar  Hyderabad   28      85410
V        12    Sanju  Rajasthan   21      63180
VI       35    Raina    Gujarat   18      62790
DataFrame 2 :
     JersyN  PLayingSince  Salary  Sponsered
I        15            13  180000      12000
II       99             2  195200       2000
III      51             7   15499      25640
IV       31            17  654000      85410
V        12             5  201000      63180
VI       35            14  741000      62790
   JersyN     Name       Team  Age  Sponsered  PLayingSince  Salary
0      15    Smith       Pune   17      12000            13  180000
1      99     Rana     Mumbai   20       2000             2  195200
2      51   Jaydev    Kolkata   22      25640             7   15499
3      31  Shikhar  Hyderabad   28      85410            17  654000
4      12    Sanju  Rajasthan   21      63180             5  201000
5      35    Raina    Gujarat   18      62790            14  741000

What is Inner Join ?

In the above case, the inner join was performed on the common key columns ‘JersyN’ & ‘Sponsered’. An inner join keeps only the rows whose key values are present in both dataframes. We can also request an inner join explicitly by passing how='inner'; both calls give the same result.
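The short sketch below checks that the default call and how='inner' agree. It also shows what an inner join does with a key that exists on only one side. The data is hypothetical, and pandas is imported under the usual pd alias.

```python
import pandas as pd

# Hypothetical data: jersey 7 exists only on the left, jersey 9 only on the right
left = pd.DataFrame({'JersyN': [15, 99, 7], 'Name': ['Smith', 'Rana', 'Dhoni']})
right = pd.DataFrame({'JersyN': [15, 99, 9], 'Salary': [180000, 195200, 50000]})

default_join = left.merge(right)             # joins on the common column 'JersyN'
inner_join = left.merge(right, how='inner')  # the same join, requested explicitly

print(inner_join['JersyN'].tolist())    # [15, 99]  (keys 7 and 9 are dropped)
print(default_join.equals(inner_join))  # True
```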

Merge Dataframes using Left Join :

What is left join ?

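A left join keeps all rows of the left DataFrame. Where a left row has no matching key in the right DataFrame, the columns that come from the right DataFrame are filled with NaN. In the example below, the rows of ‘Rana’ and ‘Shikhar’ have no ‘Sponsered’ value in the second DataFrame, so they find no match.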

Let’s see the below program to understand it clearly.

import pandas as sc
# List of Tuples
players = [(15,'Smith','Pune', 17,12000),
            (99,'Rana', 'Mumbai', 20,2000),
            (51,'Jaydev','Kolkata', 22,25640),
            (31,'Shikhar','Hyderabad', 28,85410),
            (12,'Sanju','Rajasthan', 21,63180),
            (35,'Raina','Gujarat', 18,62790)
            ]
# Creation of DataFrame object
playDFObj = sc.DataFrame(players, columns=['JersyN','Name', 'Team', 'Age','Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 1 : ')
print(playDFObj)
moreInfo = [(15, 13, 180000, 12000) ,
           (99, 2, 2000, None) ,   # None marks a missing 'Sponsered' value
           (51, 7, 15499, 25640) ,
           (31, 17, 654000, None) ,
           (12, 5, 201000, 63180) ,
           (35, 14, 741000, 62790)
            ]
# Creation of DataFrame object
moreinfoObj = sc.DataFrame(moreInfo, columns=['JersyN', 'PLayingSince' , 'Salary', 'Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 2 : ')
print(moreinfoObj)
# Merge two Dataframes on basis of common columns using a LEFT JOIN
mergedDataf = playDFObj.merge(moreinfoObj, how='left')
print('After merging: ')
print(mergedDataf)
Output :
DataFrame 1 :
     JersyN     Name       Team  Age  Sponsered
I        15    Smith       Pune   17      12000
II       99     Rana     Mumbai   20       2000
III      51   Jaydev    Kolkata   22      25640
IV       31  Shikhar  Hyderabad   28      85410
V        12    Sanju  Rajasthan   21      63180
VI       35    Raina    Gujarat   18      62790
DataFrame 2 :
     JersyN  PLayingSince  Salary  Sponsered
I        15            13  180000    12000.0
II       99             2    2000        NaN
III      51             7   15499    25640.0
IV       31            17  654000        NaN
V        12             5  201000    63180.0
VI       35            14  741000    62790.0
After merging:
   JersyN     Name       Team  Age Sponsered  PLayingSince    Salary
0      15    Smith       Pune   17     12000          13.0  180000.0
1      99     Rana     Mumbai   20      2000           NaN       NaN
2      51   Jaydev    Kolkata   22     25640           7.0   15499.0
3      31  Shikhar  Hyderabad   28     85410           NaN       NaN
4      12    Sanju  Rajasthan   21     63180           5.0  201000.0
5      35    Raina    Gujarat   18     62790          14.0  741000.0
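The syntax above also lists an indicator argument. With indicator=True, merge() adds a ‘_merge’ column that says whether each row found a match (‘both’) or exists only in the left DataFrame (‘left_only’). This makes the NaN rows of a left join easy to pick out. A small sketch with hypothetical data:

```python
import pandas as pd

left = pd.DataFrame({'JersyN': [15, 99, 51], 'Name': ['Smith', 'Rana', 'Jaydev']})
right = pd.DataFrame({'JersyN': [15, 51], 'Salary': [180000, 15499]})

# Keep every left row; jersey 99 has no partner on the right
left_join = left.merge(right, how='left', indicator=True)
print(left_join[['JersyN', 'Salary', '_merge']])

# Rows without a match have NaN in the columns taken from the right DataFrame
unmatched = left_join[left_join['_merge'] == 'left_only']
print(unmatched['Name'].tolist())  # ['Rana']
```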

Merge DataFrames using Right Join :

What is Right join ?

A right join keeps all rows of the right DataFrame. Where a right row has no matching key in the left DataFrame, the columns that come from the left DataFrame are filled with NaN.

Let’s see the below program to understand it clearly.

import pandas as sc
# List of Tuples
players = [(15,'Smith','Pune', 17,12000),
            (99,'Rana', 'Mumbai', 20,2000),
            (51,'Jaydev','Kolkata', 22,25640),
            (31,'Shikhar','Hyderabad', 28,85410),
            (12,'Sanju','Rajasthan', 21,63180),
            (35,'Raina','Gujarat', 18,62790)
            ]
# Creation of DataFrame object
playDFObj = sc.DataFrame(players, columns=['JersyN','Name', 'Team', 'Age','Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 1 : ')
print(playDFObj)
moreInfo = [(15, 13, 180000, 12000) ,
           (99, 2, 2000, None) ,   # None marks a missing 'Sponsered' value
           (51, 7, 15499, 25640) ,
           (31, 17, 654000, None) ,
           (12, 5, 201000, 63180) ,
           (35, 14, 741000, 62790)
            ]
# Creation of DataFrame object
moreinfoObj = sc.DataFrame(moreInfo, columns=['JersyN', 'PLayingSince' , 'Salary', 'Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 2 : ')
print(moreinfoObj)
# Merge two Dataframes on basis of common columns using a RIGHT JOIN
mergedDataf = playDFObj.merge(moreinfoObj, how='right')
print('After merging: ')
print(mergedDataf)
Output :
DataFrame 1 :
     JersyN     Name       Team  Age  Sponsered
I        15    Smith       Pune   17      12000
II       99     Rana     Mumbai   20       2000
III      51   Jaydev    Kolkata   22      25640
IV       31  Shikhar  Hyderabad   28      85410
V        12    Sanju  Rajasthan   21      63180
VI       35    Raina    Gujarat   18      62790
DataFrame 2 :
     JersyN  PLayingSince  Salary  Sponsered
I        15            13  180000    12000.0
II       99             2    2000        NaN
III      51             7   15499    25640.0
IV       31            17  654000        NaN
V        12             5  201000    63180.0
VI       35            14  741000    62790.0
After merging:
   JersyN    Name       Team   Age  Sponsered  PLayingSince  Salary
0      15   Smith       Pune  17.0    12000.0            13  180000
1      51  Jaydev    Kolkata  22.0    25640.0             7   15499
2      12   Sanju  Rajasthan  21.0    63180.0             5  201000
3      35   Raina    Gujarat  18.0    62790.0            14  741000
4      99     NaN        NaN   NaN        NaN             2    2000
5      31     NaN        NaN   NaN        NaN            17  654000
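A right join behaves like a left join with the two DataFrames swapped: every key of the right DataFrame is kept. The sketch below compares the two on hypothetical data. It sorts the rows before comparing, so the check does not depend on the row order that merge() produces.

```python
import pandas as pd

left = pd.DataFrame({'JersyN': [15, 51], 'Name': ['Smith', 'Jaydev']})
right = pd.DataFrame({'JersyN': [15, 99, 51], 'Salary': [180000, 195200, 15499]})

right_join = left.merge(right, how='right')
# Every right key survives; jersey 99 gets NaN in the left column 'Name'
print(right_join)

# Same rows as a left join with the DataFrames swapped (columns reordered to match)
swapped = right.merge(left, how='left')[right_join.columns]
same = (right_join.sort_values('JersyN').reset_index(drop=True)
        .equals(swapped.sort_values('JersyN').reset_index(drop=True)))
print(same)  # True
```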

Merge DataFrames using Outer Join :

What is Outer join ?

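An outer join keeps all rows of both DataFrames and fills in NaN wherever a value is missing on the left or right side. The pandas documentation states that the keys of an outer join are sorted lexicographically, so depending on your pandas version the row order may differ from the output shown below.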

Let’s see the below program to understand it clearly.

import pandas as sc
# List of Tuples
players = [(15,'Smith','Pune', 17,12000),
            (99,'Rana', 'Mumbai', 20,2000),
            (51,'Jaydev','Kolkata', 22,25640),
            (31,'Shikhar','Hyderabad', 28,85410),
            (12,'Sanju','Rajasthan', 21,63180),
            (35,'Raina','Gujarat', 18,62790)
            ]
# Creation of DataFrame object
playDFObj = sc.DataFrame(players, columns=['JersyN','Name', 'Team', 'Age','Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 1 : ')
print(playDFObj)
moreInfo = [(15, 13, 180000, 12000) ,
           (99, 2, 2000, None) ,   # None marks a missing 'Sponsered' value
           (51, 7, 15499, 25640) ,
           (31, 17, 654000, None) ,
           (12, 5, 201000, 63180) ,
           (35, 14, 741000, 62790)
            ]
# Creation of DataFrame object
moreinfoObj = sc.DataFrame(moreInfo, columns=['JersyN', 'PLayingSince' , 'Salary', 'Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 2 : ')
print(moreinfoObj)
# Merge two Dataframes on basis of common columns using an OUTER JOIN
mergedDataf = playDFObj.merge(moreinfoObj, how='outer')
print('After merging: ')
print(mergedDataf)
Output :
DataFrame 1 :
     JersyN     Name       Team  Age  Sponsered
I        15    Smith       Pune   17      12000
II       99     Rana     Mumbai   20       2000
III      51   Jaydev    Kolkata   22      25640
IV       31  Shikhar  Hyderabad   28      85410
V        12    Sanju  Rajasthan   21      63180
VI       35    Raina    Gujarat   18      62790
DataFrame 2 :
     JersyN  PLayingSince  Salary  Sponsered
I        15            13  180000    12000.0
II       99             2    2000        NaN
III      51             7   15499    25640.0
IV       31            17  654000        NaN
V        12             5  201000    63180.0
VI       35            14  741000    62790.0
After merging:
   JersyN     Name       Team   Age  Sponsered  PLayingSince    Salary
0      15    Smith       Pune  17.0    12000.0          13.0  180000.0
1      99     Rana     Mumbai  20.0     2000.0           NaN       NaN
2      51   Jaydev    Kolkata  22.0    25640.0           7.0   15499.0
3      31  Shikhar  Hyderabad  28.0    85410.0           NaN       NaN
4      12    Sanju  Rajasthan  21.0    63180.0           5.0  201000.0
5      35    Raina    Gujarat  18.0    62790.0          14.0  741000.0
6      99      NaN        NaN   NaN        NaN           2.0    2000.0
7      31      NaN        NaN   NaN        NaN          17.0  654000.0
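An outer join keeps the union of the keys. With indicator=True, each row is labelled ‘both’, ‘left_only’ or ‘right_only’, which shows where each NaN comes from. Here is a minimal sketch with hypothetical data. The rows are sorted by key before inspection, so the result does not rely on the order merge() returns them in.

```python
import pandas as pd

left = pd.DataFrame({'JersyN': [15, 99], 'Name': ['Smith', 'Rana']})
right = pd.DataFrame({'JersyN': [15, 31], 'Salary': [180000, 654000]})

# Union of keys 15, 31 and 99; missing values on either side become NaN
outer = left.merge(right, how='outer', indicator=True)
outer = outer.sort_values('JersyN').reset_index(drop=True)
print(outer)
```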

Want to expert in the python programming language? Exploring Python Data Analysis using Pandas tutorial changes your knowledge from basic to advance level in python concepts.

Read more Articles on Python Data Analysis Using Padas

Pandas : Merge Dataframes on specific columns or on index in Python – Part 2

Merge Dataframes on specific columns or on index in Python

In this article, we will learn to merge dataframes on the basis of given columns or the index.

Dataframe.merge() :

The Dataframe class of Python’s Pandas library provides a function, merge(), which merges two DataFrames.

Syntax: DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)

Arguments:-

  • right : A dataframe that is to be merged with the calling dataframe.
  • how : (Merge type). Some values are : left, right, outer, inner. Its default value is ‘inner’. It decides which rows are kept when a key value is present in only one of the two dataframes.
  • on : Column name(s) to join on; they must be present in both dataframes. If not provided (and no index is used as a key), the columns common to both dataframes are used as join keys.
  • left_on : Column(s) of the left dataframe to use as join keys.
  • right_on : Column(s) of the right dataframe to use as join keys.
  • left_index : (bool), default is False (If True, the index of the left dataframe is used as the join key.)
  • right_index : (bool), default is False (If True, the index of the right dataframe is used as the join key.)
  • suffixes : tuple of (str, str), default (‘_x’, ‘_y’) (Suffixes applied to overlapping column names from the left and right dataframes respectively.)

Merging Dataframe on a given column name as join key :

Let’s take a scenario where both dataframes have a column with the same name, ‘Salary’, but with different contents. If we call merge() on them without any argument, both common columns, ‘JersyN’ and ‘Salary’, are used as join keys. Since no salary value appears in both dataframes, the result is empty. Instead, we can merge the dataframes on a single column by passing the on argument to merge().

Both dataframes still contain a ‘Salary’ column that is not a join key. So pandas renames the two copies in the result with the default suffixes: Salary_x comes from the left dataframe and Salary_y from the right dataframe.

import pandas as sc
# List of Tuples
players = [(15,'Smith','Pune', 17,12000),
            (99,'Rana', 'Mumbai', 20,2000),
            (51,'Jaydev','Kolkata', 22,25640),
            (31,'Shikhar','Hyderabad', 28,85410),
            (12,'Sanju','Rajasthan', 21,63180),
            (35,'Raina','Gujarat', 18,62790)
            ]
# Creation of DataFrame object
playDFObj = sc.DataFrame(players, columns=['JersyN','Name', 'Team', 'Age','Salary'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 1 : ')
print(playDFObj)
# List of Tuples
moreInfo = [(15, 13, 180000, 'Nissin') ,
           (99, 2, 195200, 'Jio') ,
           (51, 7, 15499, 'Lays') ,
           (31, 17, 654000, 'AmbujaC') ,
           (12, 5, 201000, 'AsianP') ,
           (35, 14, 741000, 'Airtel')
            ]
# Creation of DataFrame object
moreinfoObj = sc.DataFrame(moreInfo, columns=['JersyN', 'PLayingSince' , 'Salary', 'Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 2 : ')
print(moreinfoObj)
# Merge two Dataframes on column 'JersyN' only (INNER JOIN by default)
mergedDataf = playDFObj.merge(moreinfoObj, on='JersyN')
print('After merging: ')
print(mergedDataf)
Output :

DataFrame 1 : 
     JersyN     Name       Team  Age  Salary
I        15    Smith       Pune   17   12000
II       99     Rana     Mumbai   20    2000
III      51   Jaydev    Kolkata   22   25640
IV       31  Shikhar  Hyderabad   28   85410
V        12    Sanju  Rajasthan   21   63180
VI       35    Raina    Gujarat   18   62790
DataFrame 2 : 
     JersyN  PLayingSince  Salary  Sponsered
I        15            13  180000     Nissin
II       99             2  195200        Jio
III      51             7   15499       Lays
IV       31            17  654000    AmbujaC
V        12             5  201000     AsianP
VI       35            14  741000     Airtel
After merging: 
   JersyN     Name       Team  Age  Salary_x  PLayingSince  Salary_y  Sponsered
0      15    Smith       Pune   17     12000            13    180000     Nissin
1      99     Rana     Mumbai   20      2000             2    195200        Jio
2      51   Jaydev    Kolkata   22     25640             7     15499       Lays
3      31  Shikhar  Hyderabad   28     85410            17    654000    AmbujaC
4      12    Sanju  Rajasthan   21     63180             5    201000     AsianP
5      35    Raina    Gujarat   18     62790            14    741000     Airtel
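The claim that a plain merge() does not work here can be checked directly. This sketch uses hypothetical two-row data shaped like the tables above. Both ‘JersyN’ and ‘Salary’ are common columns, so the default call requires both to match and finds no rows. Passing on='JersyN' restricts the key to the jersey number.

```python
import pandas as pd

left = pd.DataFrame({'JersyN': [15, 99], 'Salary': [12000, 2000]})
right = pd.DataFrame({'JersyN': [15, 99], 'Salary': [180000, 195200]})

# Default: joins on BOTH common columns, and no salary pair is equal
print(len(left.merge(right)))  # 0

# Join on the jersey number only; the clashing 'Salary' columns get suffixes
by_jersey = left.merge(right, on='JersyN')
print(by_jersey.columns.tolist())  # ['JersyN', 'Salary_x', 'Salary_y']
```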

Merging Dataframe on a given column with suffix for similar column names :

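In the previous example, the default suffixes ‘_x’ and ‘_y’ were added to the overlapping column that was not a join key. We can also supply our own suffixes through the suffixes argument. In this example the overlapping column is ‘Sponsered’, so it becomes Sponsered_Price (left) and Sponsered_Companies (right).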

import pandas as sc
# List of Tuples
players = [(15,'Smith','Pune', 17,12000),
            (99,'Rana', 'Mumbai', 20,2000),
            (51,'Jaydev','Kolkata', 22,25640),
            (31,'Shikhar','Hyderabad', 28,85410),
            (12,'Sanju','Rajasthan', 21,63180),
            (35,'Raina','Gujarat', 18,62790)
            ]
# Creation of DataFrame object
playDFObj = sc.DataFrame(players, columns=['JersyN','Name', 'Team', 'Age','Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 1 : ')
print(playDFObj)
# List of Tuples
moreInfo = [(15, 13, 180000, 'Nissin') ,
           (99, 2, 195200, 'Jio') ,
           (51, 7, 15499, 'Lays') ,
           (31, 17, 654000, 'AmbujaC') ,
           (12, 5, 201000, 'AsianP') ,
           (35, 14, 741000, 'Airtel')
            ]
# Creation of DataFrame object
moreinfoObj = sc.DataFrame(moreInfo, columns=['JersyN', 'PLayingSince' , 'Salary', 'Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 2 : ')
print(moreinfoObj)
# Merge on 'JersyN' with custom suffixes for the overlapping column
mergedDataf = playDFObj.merge(moreinfoObj, on='JersyN',suffixes=('_Price', '_Companies'))
print('After merging: ')
print(mergedDataf)
Output :
DataFrame 1 : 
     JersyN     Name       Team  Age  Sponsered
I        15    Smith       Pune   17      12000
II       99     Rana     Mumbai   20       2000
III      51   Jaydev    Kolkata   22      25640
IV       31  Shikhar  Hyderabad   28      85410
V        12    Sanju  Rajasthan   21      63180
VI       35    Raina    Gujarat   18      62790
DataFrame 2 : 
     JersyN  PLayingSince  Salary  Sponsered
I        15            13  180000     Nissin
II       99             2  195200        Jio
III      51             7   15499       Lays
IV       31            17  654000    AmbujaC
V        12             5  201000     AsianP
VI       35            14  741000     Airtel
After merging: 
   JersyN     Name       Team  ...  PLayingSince  Salary  Sponsered_Companies
0      15    Smith       Pune  ...            13  180000               Nissin
1      99     Rana     Mumbai  ...             2  195200                  Jio
2      51   Jaydev    Kolkata  ...             7   15499                 Lays
3      31  Shikhar  Hyderabad  ...            17  654000              AmbujaC
4      12    Sanju  Rajasthan  ...             5  201000               AsianP
5      35    Raina    Gujarat  ...            14  741000               Airtel

[6 rows x 8 columns]

Merging Dataframe different columns :

Now let’s take a scenario where the key column has a different name in each dataframe. We rename the JersyN column of the second dataframe to ShirtN, and then merge the two dataframes by passing left_on and right_on.

import pandas as sc
# List of Tuples
players = [(15,'Smith','Pune', 17,12000),
            (99,'Rana', 'Mumbai', 20,2000),
            (51,'Jaydev','Kolkata', 22,25640),
            (31,'Shikhar','Hyderabad', 28,85410),
            (12,'Sanju','Rajasthan', 21,63180),
            (35,'Raina','Gujarat', 18,62790)
            ]
# Creation of DataFrame object
playDFObj = sc.DataFrame(players, columns=['JersyN','Name', 'Team', 'Age','Salary'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 1 : ')
print(playDFObj)
# List of Tuples
moreInfo = [(15, 13, 180000, 'Nissin') ,
           (99, 2, 195200, 'Jio') ,
           (51, 7, 15499, 'Lays') ,
           (31, 17, 654000, 'AmbujaC') ,
           (12, 5, 201000, 'AsianP') ,
           (35, 14, 741000, 'Airtel')
            ]
# Creation of DataFrame object
moreinfoObj = sc.DataFrame(moreInfo, columns=['JersyN', 'PLayingSince' , 'Salary', 'Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 2 : ')
print(moreinfoObj)
# Rename column JersyN to ShirtN
moreinfoObj.rename(columns={'JersyN': 'ShirtN'}, inplace=True)
# Merge on 'JersyN' (left) and 'ShirtN' (right), INNER JOIN by default
mergedDataf = playDFObj.merge(moreinfoObj, left_on='JersyN', right_on='ShirtN')
print('After merging: ')
print(mergedDataf)
Output :
DataFrame 1 : 
     JersyN     Name       Team  Age  Salary
I        15    Smith       Pune   17   12000
II       99     Rana     Mumbai   20    2000
III      51   Jaydev    Kolkata   22   25640
IV       31  Shikhar  Hyderabad   28   85410
V        12    Sanju  Rajasthan   21   63180
VI       35    Raina    Gujarat   18   62790
DataFrame 2 : 
     JersyN  PLayingSince  Salary  Sponsered
I        15            13  180000     Nissin
II       99             2  195200        Jio
III      51             7   15499       Lays
IV       31            17  654000    AmbujaC
V        12             5  201000     AsianP
VI       35            14  741000     Airtel
After merging: 
   JersyN     Name       Team  Age  ...  ShirtN  PLayingSince  Salary_y  Sponsered
0      15    Smith       Pune   17  ...      15            13    180000     Nissin
1      99     Rana     Mumbai   20  ...      99             2    195200        Jio
2      51   Jaydev    Kolkata   22  ...      51             7     15499       Lays
3      31  Shikhar  Hyderabad   28  ...      31            17    654000    AmbujaC
4      12    Sanju  Rajasthan   21  ...      12             5    201000     AsianP
5      35    Raina    Gujarat   18  ...      35            14    741000     Airtel

[6 rows x 9 columns]
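Because the keys have different names, the result keeps both ‘JersyN’ and ‘ShirtN’, and after an inner join they hold identical values. One way to tidy this is to drop the right-hand key afterwards, sketched here with hypothetical data:

```python
import pandas as pd

left = pd.DataFrame({'JersyN': [15, 99], 'Name': ['Smith', 'Rana']})
right = pd.DataFrame({'ShirtN': [99, 15], 'Sponsered': ['Jio', 'Nissin']})

merged = left.merge(right, left_on='JersyN', right_on='ShirtN')
# Both key columns carry the same numbers after an inner join
assert (merged['JersyN'] == merged['ShirtN']).all()

# Drop the redundant right-hand key column
tidy = merged.drop(columns='ShirtN')
print(tidy)
```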


Pandas : Convert Dataframe column into an index using set_index() in Python

Converting Dataframe column into an index using set_index() in Python

In this article we will learn to convert an existing column of a Dataframe into its index, covering various cases. We can do this with the set_index() function of the Pandas Dataframe class.

DataFrame.set_index() :

Syntax:- DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)

Arguments:

  • keys: Column name(s) that we want to set as the index of the dataframe.
  • drop: (bool), default is True
  1. If True, the column is deleted after it has been converted into the index.
  2. If False, the column is not deleted.
  • append: (bool), default is False (If True, the given column is added to the existing index; if False, the current index is replaced with it.)
  • inplace: (bool), default is False (If True, the calling dataframe object is modified in place; if False, a modified copy of the dataframe is returned.)
  • verify_integrity: (bool), default is False
  1. If True, checks the new index for duplicate entries and raises an error if any are found.

Every Dataframe has a default index, and we can give it a name, e.g. SL:
import pandas as sc
# List of Tuples
players = [('Smith', 15, 'Pune', 170000),
            ('Rana', 99, 'Mumbai', 118560),
            ('Jaydev', 51, 'Kolkata', 258741),
            ('Shikhar', 31, 'Hyderabad', 485169),
            ('Sanju', 12, 'Rajasthan', 150000),
            ('Raina', 35, 'Gujarat', 250000)
            ]
# Creation of DataFrame object
playDFObj = sc.DataFrame(players, columns=['Name', 'JerseyN', 'Team', 'Salary'])
# Renaming index of dataframe as 'SL'
playDFObj.index.rename('SL', inplace=True)
print('Original Dataframe: ')
print(playDFObj)
Output :
Original Dataframe: 
       Name  JerseyN       Team  Salary
SL
0     Smith       15       Pune  170000
1      Rana       99     Mumbai  118560
2    Jaydev       51    Kolkata  258741
3   Shikhar       31  Hyderabad  485169
4     Sanju       12  Rajasthan  150000
5     Raina       35    Gujarat  250000

Converting a column of Dataframe into an index of the Dataframe :

Let’s convert the column ‘Name’ into the index of the dataframe by passing the column name to set_index(). ‘Name’ becomes the new index and the old index ‘SL’ is discarded. Since drop is True by default, ‘Name’ is also removed from the columns.

Note that the change is made only in the returned copy of the dataframe; the original dataframe is not modified.

import pandas as sc
# List of Tuples
players = [('Smith', 15, 'Pune', 170000),
            ('Rana', 99, 'Mumbai', 118560),
            ('Jaydev', 51, 'Kolkata', 258741),
            ('Shikhar', 31, 'Hyderabad', 485169),
            ('Sanju', 12, 'Rajasthan', 150000),
            ('Raina', 35, 'Gujarat', 250000)
            ]
# Creation of DataFrame object
playDFObj = sc.DataFrame(players, columns=['Name', 'JerseyN', 'Team', 'Salary'])
# Renaming index of dataframe as 'SL'
playDFObj.index.rename('SL', inplace=True)
print('Original Dataframe: ')
print(playDFObj)
# set column 'Name' as the index of the Dataframe
modifplayDF = playDFObj.set_index('Name')
print('Modified Dataframe of players:')
print(modifplayDF)
Output :
Original Dataframe: 
       Name  JerseyN       Team  Salary
SL
0     Smith       15       Pune  170000
1      Rana       99     Mumbai  118560
2    Jaydev       51    Kolkata  258741
3   Shikhar       31  Hyderabad  485169
4     Sanju       12  Rajasthan  150000
5     Raina       35    Gujarat  250000
Modified Dataframe of players:
         JerseyN       Team  Salary
Name
Smith         15       Pune  170000
Rana          99     Mumbai  118560
Jaydev        51    Kolkata  258741
Shikhar       31  Hyderabad  485169
Sanju         12  Rajasthan  150000
Raina         35    Gujarat  250000
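To confirm that set_index() returned a new dataframe and left the original alone, here is a short check on a reduced, hypothetical version of the data:

```python
import pandas as pd

players = pd.DataFrame({'Name': ['Smith', 'Rana'], 'JerseyN': [15, 99]})
players.index.rename('SL', inplace=True)

by_name = players.set_index('Name')

print(players.columns.tolist())        # ['Name', 'JerseyN']  (original unchanged)
print(by_name.index.name)              # Name
print(by_name.loc['Rana', 'JerseyN'])  # 99
```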

Converting a column of Dataframe into index without deleting the column :

In this case we make ‘Name’ the index but also keep it as a column, by passing the drop argument as False.

import pandas as sc
# List of Tuples
players = [('Smith', 15, 'Pune', 170000),
            ('Rana', 99, 'Mumbai', 118560),
            ('Jaydev', 51, 'Kolkata', 258741),
            ('Shikhar', 31, 'Hyderabad', 485169),
            ('Sanju', 12, 'Rajasthan', 150000),
            ('Raina', 35, 'Gujarat', 250000)
            ]
# Creation of DataFrame object
playDFObj = sc.DataFrame(players, columns=['Name', 'JerseyN', 'Team', 'Salary'])
playDFObj.index.rename('ID', inplace=True)
# keep column name and index as 'Name'
modifplayDF = playDFObj.set_index('Name', drop=False)
print('Modified Dataframe of players:')
print(modifplayDF)
Output :
Modified Dataframe of players:
            Name  JerseyN       Team  Salary
Name                                       
Smith      Smith       15       Pune  170000
Rana        Rana       99     Mumbai  118560
Jaydev    Jaydev       51    Kolkata  258741
Shikhar  Shikhar       31  Hyderabad  485169
Sanju      Sanju       12  Rajasthan  150000
Raina      Raina       35    Gujarat  250000
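Keeping the column has one side effect worth knowing. ‘Name’ is now both an index level name and a column label, and operations that accept either kind of name, such as sort_values(), reject it as ambiguous. A small sketch with hypothetical data, including one possible workaround (removing the index name with rename_axis(None)):

```python
import pandas as pd

players = pd.DataFrame({'Name': ['Smith', 'Rana'], 'JerseyN': [15, 99]})
both = players.set_index('Name', drop=False)

# 'Name' works as an index label and as a column
print(both.loc['Rana', 'Name'])  # Rana

# ...but sorting by it is ambiguous and raises ValueError
try:
    both.sort_values('Name')
    ambiguous = False
except ValueError:
    ambiguous = True
print(ambiguous)  # True

# Removing the index name resolves the ambiguity
fixed = both.rename_axis(None).sort_values('Name')
print(fixed['Name'].tolist())  # ['Rana', 'Smith']
```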

Appending a Dataframe column to the index to make it a Multi-Index Dataframe :

In the above cases the existing index ‘SL’ was replaced. If we want to keep it, we have to pass the append argument as True; the column is then added as a second index level.

import pandas as sc
# List of Tuples
players = [('Smith', 15, 'Pune', 170000),
            ('Rana', 99, 'Mumbai', 118560),
            ('Jaydev', 51, 'Kolkata', 258741),
            ('Shikhar', 31, 'Hyderabad', 485169),
            ('Sanju', 12, 'Rajasthan', 150000),
            ('Raina', 35, 'Gujarat', 250000)
            ]
# Creation of DataFrame object
playDFObj = sc.DataFrame(players, columns=['Name', 'JerseyN', 'Team', 'Salary'])
playDFObj.index.rename('SL', inplace=True)
# Making a multi-index dataframe
modifplayDF = playDFObj.set_index('Name', append=True)
print('Modified Dataframe of players:')
print(modifplayDF)
Output :
Modified Dataframe of players:
            JerseyN       Team  Salary
SL Name                              
0  Smith         15       Pune  170000
1  Rana          99     Mumbai  118560
2  Jaydev        51    Kolkata  258741
3  Shikhar       31  Hyderabad  485169
4  Sanju         12  Rajasthan  150000
5  Raina         35    Gujarat  250000
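With two index levels, a row is addressed by a tuple of both labels, and reset_index() can move a level back into the columns. A short sketch on reduced, hypothetical data:

```python
import pandas as pd

players = pd.DataFrame({'Name': ['Smith', 'Rana'], 'JerseyN': [15, 99]})
players.index.rename('SL', inplace=True)

multi = players.set_index('Name', append=True)
print(list(multi.index.names))            # ['SL', 'Name']
print(multi.loc[(1, 'Rana'), 'JerseyN'])  # 99

# Move the 'Name' level back into a regular column
restored = multi.reset_index(level='Name')
print(restored.columns.tolist())          # ['Name', 'JerseyN']
```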

Checking for duplicates in the new index :

To check that the new index doesn’t contain any duplicate values after converting a column into the index, pass verify_integrity as True to set_index(). If a duplicate value is found, a ValueError is raised. In the example below the ‘Team’ column contains ‘Mumbai’ twice.

import pandas as sc
# List of Tuples
players = [('Smith', 15, 'Pune', 170000),
            ('Rana', 99, 'Mumbai', 118560),
            ('Jaydev', 51, 'Kolkata', 258741),
            ('Shikhar', 31, 'Mumbai', 485169),
            ('Sanju', 12, 'Rajasthan', 150000),
            ('Raina', 35, 'Gujarat', 250000)
            ]
# Creation of DataFrame object
playDFObj = sc.DataFrame(players, columns=['Name', 'JerseyN', 'Team', 'Salary'])
# Rename index of dataframe as 'SL'
playDFObj.index.rename('SL', inplace=True)
modifplayDF = playDFObj.set_index('Team', verify_integrity=True)
print(modifplayDF)
Output :
ValueError: Index has duplicate keys
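Instead of letting this error stop the program, one option is to test Series.is_unique beforehand, or to catch the ValueError. A minimal sketch with hypothetical data:

```python
import pandas as pd

players = pd.DataFrame({'Name': ['Rana', 'Shikhar', 'Sanju'],
                        'Team': ['Mumbai', 'Mumbai', 'Rajasthan']})

# Check up front: 'Mumbai' appears twice, so the column is not unique
print(players['Team'].is_unique)  # False

# Or attempt the conversion and handle the error
try:
    players.set_index('Team', verify_integrity=True)
    team_index_ok = True
except ValueError as err:
    team_index_ok = False
    print('Could not use Team as index:', err)

# 'Name' has no duplicates, so the same call succeeds
by_name = players.set_index('Name', verify_integrity=True)
print(by_name.index.is_unique)  # True
```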

Modifying existing Dataframe by converting into index :

We can also make changes in the existing dataframe itself. There are two ways to do this:

  1. Assign the returned dataframe object back to the original dataframe variable, so that the variable points to the updated dataframe.
  2. Pass the inplace argument as True, as in the example below.

import pandas as sc
# List of Tuples
players = [('Smith', 15, 'Pune', 170000),
            ('Rana', 99, 'Mumbai', 118560),
            ('Jaydev', 51, 'Kolkata', 258741),
            ('Shikhar', 31, 'Hyderabad', 485169),
            ('Sanju', 12, 'Rajasthan', 150000),
            ('Raina', 35, 'Gujarat', 250000)
            ]
# Creation of DataFrame object
playDFObj = sc.DataFrame(players, columns=['Name', 'JerseyN', 'Team', 'Salary'])
playDFObj.index.rename('SL', inplace=True)
playDFObj.set_index('Name', inplace=True)
print('Contents of original dataframe :')
print(playDFObj)
Output :
Contents of original dataframe :
         JerseyN       Team  Salary
Name
Smith         15       Pune  170000
Rana          99     Mumbai  118560
Jaydev        51    Kolkata  258741
Shikhar       31  Hyderabad  485169
Sanju         12  Rajasthan  150000
Raina         35    Gujarat  250000
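A common pitfall with inplace=True: the method modifies the dataframe and returns None, so its result must not be assigned back to the variable. This short sketch compares both ways on hypothetical data:

```python
import pandas as pd

players = pd.DataFrame({'Name': ['Smith', 'Rana'], 'JerseyN': [15, 99]})

# Way 1: keep the returned copy
way1 = players.set_index('Name')

# Way 2: modify in place; the return value is None
returned = players.set_index('Name', inplace=True)
print(returned)              # None
print(players.equals(way1))  # True
```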


What’s the placement new operator and why do we need it ?

The placement new operator and its need

This article is all about placement new operator.

Placement new operator :

The placement new operator lets us pass a memory address to new as a parameter. Placement new does not allocate memory itself. It constructs the object at that address by calling its constructor, and then returns the same address that was passed in.

Need of placement new operator :

As we know, when we create an object using the new operator, its memory is allocated on the heap.

int * ptr = new int;

But sometimes we need to create an object dynamically at a specific memory location that has already been allocated.

For example, we may not want new memory to be allocated on the heap; instead, the object must be created at a given memory address. This scenario typically arises when we work on embedded products or with shared memory, and for this requirement we use the placement new operator.

Below is an example code to achieve this :

// Program

#include <iostream>
#include <new>

int main()
{
    // Ordinary new[]: a buffer is allocated on the heap once.
    int * space = new int[1004];
    // Placement new: no new memory is allocated here; the int is
    // constructed inside the existing buffer, so ptr == space.
    int * ptr = new(space) int;
    *ptr = 7;
    std::cout << (*ptr) << std::endl;
    // Free the buffer with the delete form that matches new[].
    delete [] space;
    return 0;
}