Pandas : Convert Dataframe column into an index using set_index() in Python

Converting Dataframe column into an index using set_index() in Python

In this article we will learn how to convert an existing column of a Dataframe into an index, covering several cases. We can do this with the set_index() function of the Pandas Dataframe class.

DataFrame.set_index() :

Syntax:- DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)

Arguments:

  • keys: Column name, or list of column names, that we want to set as the index of the dataframe.
  • drop: (bool), default is True
  1. If True, the column is deleted after being converted into the index.
  2. If False, the column is not deleted.
  • append: (bool), default is False (If True, the given column is added to the existing index; if False, the current index is replaced with it.)
  • inplace: (bool), default is False (If True, the calling dataframe object is modified and None is returned; if False, a modified copy of the dataframe is returned.)
  • verify_integrity: (bool), default is False
  1. If True, checks the new index for duplicate entries and raises an error if any are found.

A Dataframe has a default index, and we can give it a name, e.g. SL:

import pandas as pd
# List of Tuples
players = [('Smith', 15, 'Pune', 170000),
           ('Rana', 99, 'Mumbai', 118560),
           ('Jaydev', 51, 'Kolkata', 258741),
           ('Shikhar', 31, 'Hyderabad', 485169),
           ('Sanju', 12, 'Rajasthan', 150000),
           ('Raina', 35, 'Gujarat', 250000)
           ]
# Creation of DataFrame object
playDFObj = pd.DataFrame(players, columns=['Name', 'JerseyN', 'Team', 'Salary'])
# Renaming index of dataframe as 'SL'
playDFObj.index.rename('SL', inplace=True)
print('Original Dataframe: ')
print(playDFObj)
Output :
Original Dataframe: 
       Name  JerseyN       Team  Salary
SL                                     
0     Smith       15       Pune  170000
1      Rana       99     Mumbai  118560
2    Jaydev       51    Kolkata  258741
3   Shikhar       31  Hyderabad  485169
4     Sanju       12  Rajasthan  150000
5     Raina       35    Gujarat  250000
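
The index name can be set in more than one way. The following is a minimal sketch, assuming a shortened version of the players data; the variable renamedDF and the name 'ID' are illustrative only and are not part of the original example.

```python
import pandas as pd

# Shortened version of the players data used in this article
players = [('Smith', 15, 'Pune', 170000),
           ('Rana', 99, 'Mumbai', 118560)]
playDFObj = pd.DataFrame(players, columns=['Name', 'JerseyN', 'Team', 'Salary'])

# A freshly created DataFrame has an unnamed default index
print(playDFObj.index.name)   # None

# Option 1: assign the name directly on the index
playDFObj.index.name = 'SL'

# Option 2: rename_axis() returns a new DataFrame whose index is renamed
# ('ID' is an illustrative name, chosen only to tell the two apart)
renamedDF = playDFObj.rename_axis('ID')

print(playDFObj.index.name)   # SL
print(renamedDF.index.name)   # ID
```

Either form gives the same kind of named index as the index.rename('SL', inplace=True) call used in the examples of this article.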

Converting a column of Dataframe into an index of the Dataframe :

Let’s convert the column ‘Name’ into the index of the dataframe. We can do this by passing the column name to set_index(). The column ‘Name’ becomes the index, and the old index ‘SL’ is discarded.

Note that the change is made only in the returned copy of the dataframe; the original dataframe is not modified.

import pandas as pd
# List of Tuples
players = [('Smith', 15, 'Pune', 170000),
           ('Rana', 99, 'Mumbai', 118560),
           ('Jaydev', 51, 'Kolkata', 258741),
           ('Shikhar', 31, 'Hyderabad', 485169),
           ('Sanju', 12, 'Rajasthan', 150000),
           ('Raina', 35, 'Gujarat', 250000)
           ]
# Creation of DataFrame object
playDFObj = pd.DataFrame(players, columns=['Name', 'JerseyN', 'Team', 'Salary'])
# Renaming index of dataframe as 'SL'
playDFObj.index.rename('SL', inplace=True)
print('Original Dataframe: ')
print(playDFObj)
# Set column 'Name' as the index of the Dataframe (returns a modified copy)
modifplayDF = playDFObj.set_index('Name')
print('Modified Dataframe of players:')
print(modifplayDF)
Output :
Original Dataframe: 
       Name  JerseyN       Team  Salary
SL                                     
0     Smith       15       Pune  170000
1      Rana       99     Mumbai  118560
2    Jaydev       51    Kolkata  258741
3   Shikhar       31  Hyderabad  485169
4     Sanju       12  Rajasthan  150000
5     Raina       35    Gujarat  250000

Modified Dataframe of players:
         JerseyN       Team  Salary
Name                               
Smith         15       Pune  170000
Rana          99     Mumbai  118560
Jaydev        51    Kolkata  258741
Shikhar       31  Hyderabad  485169
Sanju         12  Rajasthan  150000
Raina         35    Gujarat  250000
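
To check that set_index() really leaves the calling dataframe untouched, we can inspect both objects afterwards. This is a minimal sketch, assuming a shortened version of the players data; it is meant only as an illustration of the copy behaviour described above.

```python
import pandas as pd

# Shortened version of the players data used in this article
players = [('Smith', 15, 'Pune', 170000),
           ('Rana', 99, 'Mumbai', 118560),
           ('Jaydev', 51, 'Kolkata', 258741)]
playDFObj = pd.DataFrame(players, columns=['Name', 'JerseyN', 'Team', 'Salary'])
playDFObj.index.rename('SL', inplace=True)

modifplayDF = playDFObj.set_index('Name')

# The original still has 'Name' as a column and 'SL' as its index name
print('Name' in playDFObj.columns)      # True
print(playDFObj.index.name)             # SL

# The returned copy uses 'Name' as the index and no longer has it as a column
print('Name' in modifplayDF.columns)    # False
print(modifplayDF.index.name)           # Name

# Rows of the copy can now be looked up by player name
print(modifplayDF.loc['Rana', 'Team'])  # Mumbai
```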

Converting a column of Dataframe into index without deleting the column :

In this case we keep ‘Name’ both as a column and as the index by passing the drop argument as False.

import pandas as pd
# List of Tuples
players = [('Smith', 15, 'Pune', 170000),
           ('Rana', 99, 'Mumbai', 118560),
           ('Jaydev', 51, 'Kolkata', 258741),
           ('Shikhar', 31, 'Hyderabad', 485169),
           ('Sanju', 12, 'Rajasthan', 150000),
           ('Raina', 35, 'Gujarat', 250000)
           ]
# Creation of DataFrame object
playDFObj = pd.DataFrame(players, columns=['Name', 'JerseyN', 'Team', 'Salary'])
# Renaming index of dataframe as 'SL'
playDFObj.index.rename('SL', inplace=True)
# Keep 'Name' as a column and also use it as the index
modifplayDF = playDFObj.set_index('Name', drop=False)
print('Modified Dataframe of players:')
print(modifplayDF)
Output :
Modified Dataframe of players:
            Name  JerseyN       Team  Salary
Name                                        
Smith      Smith       15       Pune  170000
Rana        Rana       99     Mumbai  118560
Jaydev    Jaydev       51    Kolkata  258741
Shikhar  Shikhar       31  Hyderabad  485169
Sanju      Sanju       12  Rajasthan  150000
Raina      Raina       35    Gujarat  250000
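
The effect of the drop argument is easiest to see by comparing the columns of both results side by side. The sketch below assumes a shortened version of the players data; the names droppedDF and keptDF are illustrative and do not appear in the original example.

```python
import pandas as pd

# Shortened version of the players data used in this article
players = [('Smith', 15, 'Pune', 170000),
           ('Rana', 99, 'Mumbai', 118560),
           ('Jaydev', 51, 'Kolkata', 258741)]
playDFObj = pd.DataFrame(players, columns=['Name', 'JerseyN', 'Team', 'Salary'])

droppedDF = playDFObj.set_index('Name')            # default drop=True
keptDF = playDFObj.set_index('Name', drop=False)   # column is retained

print(list(droppedDF.columns))   # ['JerseyN', 'Team', 'Salary']
print(list(keptDF.columns))      # ['Name', 'JerseyN', 'Team', 'Salary']

# With drop=False the name is available both as a row label and as a column value
print(keptDF.loc['Rana', 'Name'])   # Rana
```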

Appending a Dataframe column to the index to make it a Multi-Index Dataframe :

In the above cases the index ‘SL’ is replaced. If we want to keep it, we have to pass the append argument as True.

import pandas as pd
# List of Tuples
players = [('Smith', 15, 'Pune', 170000),
           ('Rana', 99, 'Mumbai', 118560),
           ('Jaydev', 51, 'Kolkata', 258741),
           ('Shikhar', 31, 'Hyderabad', 485169),
           ('Sanju', 12, 'Rajasthan', 150000),
           ('Raina', 35, 'Gujarat', 250000)
           ]
# Creation of DataFrame object
playDFObj = pd.DataFrame(players, columns=['Name', 'JerseyN', 'Team', 'Salary'])
# Renaming index of dataframe as 'SL'
playDFObj.index.rename('SL', inplace=True)
# Making a multi-index dataframe by appending 'Name' to the existing index
modifplayDF = playDFObj.set_index('Name', append=True)
print('Modified Dataframe of players:')
print(modifplayDF)
Output :
Modified Dataframe of players:
            JerseyN       Team  Salary
SL Name                               
0  Smith         15       Pune  170000
1  Rana          99     Mumbai  118560
2  Jaydev        51    Kolkata  258741
3  Shikhar       31  Hyderabad  485169
4  Sanju         12  Rajasthan  150000
5  Raina         35    Gujarat  250000
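
Once the index has two levels, a row is addressed by a tuple with one value per level. The following is a minimal sketch, assuming a shortened version of the players data, of how such a Multi-Index dataframe might be inspected and queried.

```python
import pandas as pd

# Shortened version of the players data used in this article
players = [('Smith', 15, 'Pune', 170000),
           ('Rana', 99, 'Mumbai', 118560),
           ('Jaydev', 51, 'Kolkata', 258741)]
playDFObj = pd.DataFrame(players, columns=['Name', 'JerseyN', 'Team', 'Salary'])
playDFObj.index.rename('SL', inplace=True)

modifplayDF = playDFObj.set_index('Name', append=True)

# The index now has two levels: the old 'SL' and the appended 'Name'
print(list(modifplayDF.index.names))        # ['SL', 'Name']

# A single cell is selected with a (SL, Name) tuple as the row label
print(modifplayDF.loc[(1, 'Rana'), 'Team'])  # Mumbai

# xs() selects by one level only, here by player name
print(modifplayDF.xs('Rana', level='Name'))
```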

Checking for duplicates in the new index :

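To make sure the index doesn’t contain any duplicate values after converting a column to the index, we can pass verify_integrity as True in set_index(). If any duplicate value is found, an error is raised. In the example below, ‘Mumbai’ appears twice in the ‘Team’ column, so setting ‘Team’ as the index fails.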

import pandas as pd
# List of Tuples ('Mumbai' appears twice in the Team column)
players = [('Smith', 15, 'Pune', 170000),
           ('Rana', 99, 'Mumbai', 118560),
           ('Jaydev', 51, 'Kolkata', 258741),
           ('Shikhar', 31, 'Mumbai', 485169),
           ('Sanju', 12, 'Rajasthan', 150000),
           ('Raina', 35, 'Gujarat', 250000)
           ]
# Creation of DataFrame object
playDFObj = pd.DataFrame(players, columns=['Name', 'JerseyN', 'Team', 'Salary'])
# Rename index of dataframe as 'SL'
playDFObj.index.rename('SL', inplace=True)
# Raises ValueError because the new index would contain duplicates
modifplayDF = playDFObj.set_index('Team', verify_integrity=True)
print(modifplayDF)
Output :
ValueError: Index has duplicate keys
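
In a script we usually don't want the program to stop on this error. The sketch below, which assumes a shortened version of the players data, shows one possible way to check a column for duplicates up front and to catch the ValueError; the flag raised is an illustrative helper variable only.

```python
import pandas as pd

# Shortened players data in which 'Mumbai' appears twice in the Team column
players = [('Rana', 99, 'Mumbai', 118560),
           ('Shikhar', 31, 'Mumbai', 485169),
           ('Sanju', 12, 'Rajasthan', 150000)]
playDFObj = pd.DataFrame(players, columns=['Name', 'JerseyN', 'Team', 'Salary'])

# Check the column before converting it into the index
print(playDFObj['Team'].is_unique)   # False

# Or attempt the conversion and handle the error
raised = False
try:
    playDFObj.set_index('Team', verify_integrity=True)
except ValueError as err:
    raised = True
    print('Could not set index:', err)
```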

Modifying the existing Dataframe by converting a column into index :

 We can also make the change in the existing dataframe. There are two ways to do this:

  1. Assign the returned dataframe object back to the original dataframe variable, so that the variable points to the updated dataframe.
  2. Pass the argument inplace as True.
import pandas as pd
# List of Tuples
players = [('Smith', 15, 'Pune', 170000),
           ('Rana', 99, 'Mumbai', 118560),
           ('Jaydev', 51, 'Kolkata', 258741),
           ('Shikhar', 31, 'Hyderabad', 485169),
           ('Sanju', 12, 'Rajasthan', 150000),
           ('Raina', 35, 'Gujarat', 250000)
           ]
# Creation of DataFrame object
playDFObj = pd.DataFrame(players, columns=['Name', 'JerseyN', 'Team', 'Salary'])
# Renaming index of dataframe as 'SL'
playDFObj.index.rename('SL', inplace=True)
# Second way: modify the calling dataframe itself
playDFObj.set_index('Name', inplace=True)
print('Contents of original dataframe :')
print(playDFObj)
Output :
Contents of original dataframe :
         JerseyN       Team  Salary
Name                               
Smith         15       Pune  170000
Rana          99     Mumbai  118560
Jaydev        51    Kolkata  258741
Shikhar       31  Hyderabad  485169
Sanju         12  Rajasthan  150000
Raina         35    Gujarat  250000
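
The first way, assigning the result back to the same variable, can be sketched as follows. This assumes a shortened version of the players data; otherDF and result are illustrative names used only to show what set_index() returns when inplace is True.

```python
import pandas as pd

# Shortened version of the players data used in this article
players = [('Smith', 15, 'Pune', 170000),
           ('Rana', 99, 'Mumbai', 118560),
           ('Jaydev', 51, 'Kolkata', 258741)]
playDFObj = pd.DataFrame(players, columns=['Name', 'JerseyN', 'Team', 'Salary'])
playDFObj.index.rename('SL', inplace=True)

# First way: reassign the returned copy to the original variable
playDFObj = playDFObj.set_index('Name')
print(playDFObj.index.name)      # Name
print(list(playDFObj.columns))   # ['JerseyN', 'Team', 'Salary']

# With inplace=True nothing is returned, so the result must not be reassigned
otherDF = pd.DataFrame(players, columns=['Name', 'JerseyN', 'Team', 'Salary'])
result = otherDF.set_index('Name', inplace=True)
print(result)                    # None
print(otherDF.index.name)        # Name
```

Both ways end with the variable pointing at a dataframe indexed by ‘Name’; the reassignment form also allows chaining further method calls on the returned dataframe.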
