Converting a Dataframe column into an index using set_index() in Python
In this article we will learn how to convert an existing column of a Dataframe into an index, covering several cases. We can do this with the set_index() function of the Pandas Dataframe class.
- Convert a column of Dataframe into an index of the Dataframe
- Convert a column of Dataframe into index without deleting the column
- Append a Dataframe column to the index to make it a Multi-Index Dataframe
- Check for duplicates in the new index
- Modify the existing Dataframe by converting a column into the index
DataFrame.set_index() :
Syntax:- DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)
Arguments:
- keys: Column name, or list of column names, that we want to set as the index of the dataframe.
- drop: (bool), default is True
  - If True, the column is deleted after it is converted to an index.
  - If False, the column is not deleted.
- append: (bool), default is False
  - If True, the given column is added to the existing index.
  - If False, the current index is replaced with it.
- inplace: (bool), default is False
  - If True, the change is made in the calling dataframe object and None is returned.
  - If False, a modified copy of the dataframe is returned.
- verify_integrity: (bool), default is False
  - If True, the new index is checked for duplicate entries and an error is raised if any are found.
A Dataframe has a default index, and we can give it a name, e.g. 'SL':
    import pandas as pd

    # List of tuples
    players = [('Smith', 15, 'Pune', 170000),
               ('Rana', 99, 'Mumbai', 118560),
               ('Jaydev', 51, 'Kolkata', 258741),
               ('Shikhar', 31, 'Hyderabad', 485169),
               ('Sanju', 12, 'Rajasthan', 150000),
               ('Raina', 35, 'Gujarat', 250000)]

    # Create the DataFrame object
    playDFObj = pd.DataFrame(players, columns=['Name', 'JerseyN', 'Team', 'Salary'])

    # Rename the index of the dataframe as 'SL'
    playDFObj.index.rename('SL', inplace=True)

    print('Original Dataframe: ')
    print(playDFObj)

Output :

    Original Dataframe:
           Name  JerseyN       Team  Salary
    SL
    0     Smith       15       Pune  170000
    1      Rana       99     Mumbai  118560
    2    Jaydev       51    Kolkata  258741
    3   Shikhar       31  Hyderabad  485169
    4     Sanju       12  Rajasthan  150000
    5     Raina       35    Gujarat  250000
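The keys argument of set_index() is not limited to a single column name. It can also be a list of column names, in which case the resulting index has one level per column. The following is a minimal sketch of that usage on a shortened copy of the players data; the variable names df and by_team_and_name are only illustrative.

```python
import pandas as pd

# Shortened copy of the players data used in this article
players = [('Smith', 15, 'Pune', 170000),
           ('Rana', 99, 'Mumbai', 118560),
           ('Jaydev', 51, 'Kolkata', 258741)]
df = pd.DataFrame(players, columns=['Name', 'JerseyN', 'Team', 'Salary'])

# Passing a list of column names builds an index with two levels
by_team_and_name = df.set_index(['Team', 'Name'])

# Names of the index levels, in the order the keys were passed
print(list(by_team_and_name.index.names))
# Columns that remain after 'Team' and 'Name' moved into the index
print(list(by_team_and_name.columns))
# Look up one value using the (Team, Name) pair as the row label
print(by_team_and_name.loc[('Mumbai', 'Rana'), 'Salary'])
```

The order of the names in the list decides the order of the index levels.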
Converting a column of Dataframe into an index of the Dataframe :
Let's convert the column 'Name' into the index of the dataframe. We can do this by passing that column name to set_index(). The 'Name' column becomes the index and the old index is deleted.
Here the change is made only in a copy of the dataframe; the original dataframe is not modified.
    import pandas as pd

    # List of tuples
    players = [('Smith', 15, 'Pune', 170000),
               ('Rana', 99, 'Mumbai', 118560),
               ('Jaydev', 51, 'Kolkata', 258741),
               ('Shikhar', 31, 'Hyderabad', 485169),
               ('Sanju', 12, 'Rajasthan', 150000),
               ('Raina', 35, 'Gujarat', 250000)]

    # Create the DataFrame object
    playDFObj = pd.DataFrame(players, columns=['Name', 'JerseyN', 'Team', 'Salary'])

    # Rename the index of the dataframe as 'SL'
    playDFObj.index.rename('SL', inplace=True)

    print('Original Dataframe: ')
    print(playDFObj)

    # Set column 'Name' as the index of the Dataframe
    modifplayDF = playDFObj.set_index('Name')

    print('Modified Dataframe of players:')
    print(modifplayDF)

Output :

    Original Dataframe:
           Name  JerseyN       Team  Salary
    SL
    0     Smith       15       Pune  170000
    1      Rana       99     Mumbai  118560
    2    Jaydev       51    Kolkata  258741
    3   Shikhar       31  Hyderabad  485169
    4     Sanju       12  Rajasthan  150000
    5     Raina       35    Gujarat  250000
    Modified Dataframe of players:
             JerseyN       Team  Salary
    Name
    Smith         15       Pune  170000
    Rana          99     Mumbai  118560
    Jaydev        51    Kolkata  258741
    Shikhar       31  Hyderabad  485169
    Sanju         12  Rajasthan  150000
    Raina         35    Gujarat  250000
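Two consequences of this call may be worth checking directly: the original dataframe is left untouched, and rows of the new dataframe can be selected by player name. The sketch below assumes a shortened copy of the data and uses the illustrative names play_df and by_name.

```python
import pandas as pd

# Shortened copy of the players data used in this article
players = [('Smith', 15, 'Pune', 170000),
           ('Rana', 99, 'Mumbai', 118560),
           ('Jaydev', 51, 'Kolkata', 258741)]
play_df = pd.DataFrame(players, columns=['Name', 'JerseyN', 'Team', 'Salary'])

# set_index() returns a new dataframe by default
by_name = play_df.set_index('Name')

# The original still has 'Name' as an ordinary column
print('Name' in play_df.columns)
# In the returned dataframe 'Name' has moved from the columns into the index
print('Name' in by_name.columns)
print(by_name.index.name)
# Rows can now be selected by label with .loc
print(by_name.loc['Rana', 'Team'])
```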
Converting a column of Dataframe into index without deleting the column :
In this case we keep 'Name' both as a column and as the index by passing the drop argument as False.
    import pandas as pd

    # List of tuples
    players = [('Smith', 15, 'Pune', 170000),
               ('Rana', 99, 'Mumbai', 118560),
               ('Jaydev', 51, 'Kolkata', 258741),
               ('Shikhar', 31, 'Hyderabad', 485169),
               ('Sanju', 12, 'Rajasthan', 150000),
               ('Raina', 35, 'Gujarat', 250000)]

    # Create the DataFrame object
    playDFObj = pd.DataFrame(players, columns=['Name', 'JerseyN', 'Team', 'Salary'])
    playDFObj.index.rename('ID', inplace=True)

    # Keep 'Name' as a column and also use it as the index
    modifplayDF = playDFObj.set_index('Name', drop=False)

    print('Modified Dataframe of players:')
    print(modifplayDF)

Output :

    Modified Dataframe of players:
                Name  JerseyN       Team  Salary
    Name
    Smith      Smith       15       Pune  170000
    Rana        Rana       99     Mumbai  118560
    Jaydev    Jaydev       51    Kolkata  258741
    Shikhar  Shikhar       31  Hyderabad  485169
    Sanju      Sanju       12  Rajasthan  150000
    Raina      Raina       35    Gujarat  250000
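With drop=False the same values are available in two places. The following sketch, on a shortened copy of the data and with the illustrative names df and kept, confirms that the index and the retained column hold the same entries.

```python
import pandas as pd

# Shortened copy of the players data used in this article
players = [('Smith', 15, 'Pune', 170000),
           ('Rana', 99, 'Mumbai', 118560),
           ('Jaydev', 51, 'Kolkata', 258741)]
df = pd.DataFrame(players, columns=['Name', 'JerseyN', 'Team', 'Salary'])

# drop=False copies 'Name' into the index and leaves the column in place
kept = df.set_index('Name', drop=False)

# 'Name' is still a regular column ...
print('Name' in kept.columns)
# ... and it is also the name of the index
print(kept.index.name)
# Every index label equals the value in the 'Name' column of the same row
print((kept.index == kept['Name']).all())
```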
Appending a Dataframe column to the index to make it a Multi-Index Dataframe :
In the above cases the old index is replaced. If we want to keep it, we have to pass the append argument as True.
    import pandas as pd

    # List of tuples
    players = [('Smith', 15, 'Pune', 170000),
               ('Rana', 99, 'Mumbai', 118560),
               ('Jaydev', 51, 'Kolkata', 258741),
               ('Shikhar', 31, 'Hyderabad', 485169),
               ('Sanju', 12, 'Rajasthan', 150000),
               ('Raina', 35, 'Gujarat', 250000)]

    # Create the DataFrame object
    playDFObj = pd.DataFrame(players, columns=['Name', 'JerseyN', 'Team', 'Salary'])
    playDFObj.index.rename('SL', inplace=True)

    # Make a multi-index dataframe
    modifplayDF = playDFObj.set_index('Name', append=True)

    print('Modified Dataframe of players:')
    print(modifplayDF)

Output :

    Modified Dataframe of players:
                JerseyN       Team  Salary
    SL Name
    0  Smith         15       Pune  170000
    1  Rana          99     Mumbai  118560
    2  Jaydev        51    Kolkata  258741
    3  Shikhar       31  Hyderabad  485169
    4  Sanju         12  Rajasthan  150000
    5  Raina         35    Gujarat  250000
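The result of append=True can be inspected through the index itself. This sketch, again on a shortened copy of the data and with the illustrative names df and multi, shows the two index levels and a lookup that uses both of them.

```python
import pandas as pd

# Shortened copy of the players data used in this article
players = [('Smith', 15, 'Pune', 170000),
           ('Rana', 99, 'Mumbai', 118560),
           ('Jaydev', 51, 'Kolkata', 258741)]
df = pd.DataFrame(players, columns=['Name', 'JerseyN', 'Team', 'Salary'])
df.index.rename('SL', inplace=True)

# append=True keeps the existing 'SL' index and adds 'Name' as a second level
multi = df.set_index('Name', append=True)

# Number of index levels and their names
print(multi.index.nlevels)
print(list(multi.index.names))
# A row label is now a (SL, Name) pair
print(multi.loc[(1, 'Rana'), 'Team'])
```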
Checking for duplicates in the new index :
To check that the index doesn't contain any duplicate values after converting a column to the index, we can pass verify_integrity as True in set_index(). If any duplicate value is found, an error will be raised.
    import pandas as pd

    # List of tuples ('Mumbai' appears twice in the Team column)
    players = [('Smith', 15, 'Pune', 170000),
               ('Rana', 99, 'Mumbai', 118560),
               ('Jaydev', 51, 'Kolkata', 258741),
               ('Shikhar', 31, 'Mumbai', 485169),
               ('Sanju', 12, 'Rajasthan', 150000),
               ('Raina', 35, 'Gujarat', 250000)]

    # Create the DataFrame object
    playDFObj = pd.DataFrame(players, columns=['Name', 'JerseyN', 'Team', 'Salary'])

    # Rename the index of the dataframe as 'SL'
    playDFObj.index.rename('SL', inplace=True)

    modifplayDF = playDFObj.set_index('Team', verify_integrity=True)
    print(modifplayDF)

Output :

    ValueError: Index has duplicate keys
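In a script it is often more convenient to handle this error than to let it stop the program. The sketch below is one possible way to do that, assuming a shortened copy of the data: it first lists the duplicated values with Series.duplicated(), then catches the ValueError raised by set_index(). The names df, dupes and by_team are illustrative.

```python
import pandas as pd

# Shortened copy of the data; 'Mumbai' appears twice in the Team column
players = [('Rana', 99, 'Mumbai', 118560),
           ('Shikhar', 31, 'Mumbai', 485169),
           ('Sanju', 12, 'Rajasthan', 150000)]
df = pd.DataFrame(players, columns=['Name', 'JerseyN', 'Team', 'Salary'])

# List the values that occur more than once in the candidate index column
dupes = df.loc[df['Team'].duplicated(keep=False), 'Team'].unique().tolist()
print('Duplicated teams:', dupes)

try:
    by_team = df.set_index('Team', verify_integrity=True)
except ValueError as err:
    # set_index() refuses to build an index that contains duplicates
    by_team = None
    print('Could not set index:', err)
```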
Modifying existing Dataframe by converting into index :
We can also make changes in the existing dataframe. There are two ways to do this:
- Assign the returned dataframe object to the original dataframe variable, so that the variable points to the updated dataframe.
- Pass the inplace argument as True.
    import pandas as pd

    # List of tuples
    players = [('Smith', 15, 'Pune', 170000),
               ('Rana', 99, 'Mumbai', 118560),
               ('Jaydev', 51, 'Kolkata', 258741),
               ('Shikhar', 31, 'Hyderabad', 485169),
               ('Sanju', 12, 'Rajasthan', 150000),
               ('Raina', 35, 'Gujarat', 250000)]

    # Create the DataFrame object
    playDFObj = pd.DataFrame(players, columns=['Name', 'JerseyN', 'Team', 'Salary'])
    playDFObj.index.rename('SL', inplace=True)

    # Modify the existing dataframe instead of returning a copy
    playDFObj.set_index('Name', inplace=True)

    print('Contents of original dataframe :')
    print(playDFObj)

Output :

    Contents of original dataframe :
             JerseyN       Team  Salary
    Name
    Smith         15       Pune  170000
    Rana          99     Mumbai  118560
    Jaydev        51    Kolkata  258741
    Shikhar       31  Hyderabad  485169
    Sanju         12  Rajasthan  150000
    Raina         35    Gujarat  250000
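The first of the two approaches, reassigning the returned dataframe, can be sketched as follows and compared with the inplace=True call. The variable names reassigned, modified and result are illustrative. Note that with inplace=True the call itself returns None, so its result should not be assigned back to the variable.

```python
import pandas as pd

# Shortened copy of the players data used in this article
players = [('Smith', 15, 'Pune', 170000),
           ('Rana', 99, 'Mumbai', 118560),
           ('Jaydev', 51, 'Kolkata', 258741)]
columns = ['Name', 'JerseyN', 'Team', 'Salary']

# Approach 1: assign the returned dataframe back to the same variable
reassigned = pd.DataFrame(players, columns=columns)
reassigned = reassigned.set_index('Name')

# Approach 2: modify the dataframe in place
modified = pd.DataFrame(players, columns=columns)
result = modified.set_index('Name', inplace=True)

# With inplace=True nothing is returned
print(result)
# Both approaches leave the variable pointing at the same data
print(reassigned.equals(modified))
```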