Changing the data type of single or multiple columns of a DataFrame in Python
In this article we will see how to change the data type of a single column or of multiple columns of a DataFrame in Python.
Change Data Type of a Single Column :
We will use Series.astype() to change the data type of a column.
Syntax:- Series.astype(self, dtype, copy=True, errors='raise', **kwargs)
where Arguments:
- dtype : The Python type or NumPy dtype to which the whole Series will be converted.
- errors : How conversion errors are handled. It can be 'raise' or 'ignore', and the default is 'raise'. (raise: raise an exception if the conversion fails; ignore: return the original input if the conversion fails.)
- copy : bool (default True). If True, a copy is returned. If False, pandas may avoid copying the data, so changes to the returned values may propagate to the original object.
Returns: A new Series with the requested data type. The original Series is not modified in place.
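As a minimal sketch of the return behaviour described above, the snippet below converts a small Series and checks both dtypes. The Series name `ages` and its values are made up for illustration and are not part of the article's DataFrame.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
ages = pd.Series([34, 25, 26], name='Age', dtype='int64')

# astype() returns a new Series; 'ages' itself keeps its original dtype
ages_float = ages.astype('float64')

print(ages.dtype)        # dtype of the original Series
print(ages_float.dtype)  # dtype of the converted Series
```

This is why the examples in this article assign the result of astype() back to the column.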
import pandas as pd

# List of tuples
students = [('Rohit', 34, 'Swimming', 155),
            ('Ritik', 25, 'Cricket', 179),
            ('Salim', 26, 'Music', 187),
            ('Rani', 29, 'Sleeping', 154),
            ('Sonu', 17, 'Singing', 184),
            ('Madhu', 20, 'Travelling', 165),
            ('Devi', 22, 'Art', 141)]

# Create a DataFrame object with columns of different data types
studObj = pd.DataFrame(students, columns=['Name', 'Age', 'Hobby', 'Height'])

print(studObj)
print(studObj.dtypes)
Output :
    Name  Age       Hobby  Height
0  Rohit   34    Swimming     155
1  Ritik   25     Cricket     179
2  Salim   26       Music     187
3   Rani   29    Sleeping     154
4   Sonu   17     Singing     184
5  Madhu   20  Travelling     165
6   Devi   22         Art     141
Name      object
Age        int64
Hobby     object
Height     int64
dtype: object
Change data type of a column from int64 to float64 :
We can change the data type of a single column. For example, let’s change the data type of the ‘Age’ column from int64 to float64. To do this, we pass 'float64' to astype() and assign the result back to the column, so that the change is reflected in the DataFrame.
import pandas as pd

# List of tuples
students = [('Rohit', 34, 'Swimming', 155),
            ('Ritik', 25, 'Cricket', 179),
            ('Salim', 26, 'Music', 187),
            ('Rani', 29, 'Sleeping', 154),
            ('Sonu', 17, 'Singing', 184),
            ('Madhu', 20, 'Travelling', 165),
            ('Devi', 22, 'Art', 141)]

# Create a DataFrame object with columns of different data types
studObj = pd.DataFrame(students, columns=['Name', 'Age', 'Hobby', 'Height'])

# Change data type of column 'Age' to float64
studObj['Age'] = studObj['Age'].astype('float64')

print(studObj)
print(studObj.dtypes)
Output :
    Name   Age       Hobby  Height
0  Rohit  34.0    Swimming     155
1  Ritik  25.0     Cricket     179
2  Salim  26.0       Music     187
3   Rani  29.0    Sleeping     154
4   Sonu  17.0     Singing     184
5  Madhu  20.0  Travelling     165
6   Devi  22.0         Art     141
Name       object
Age       float64
Hobby      object
Height      int64
dtype: object
Change data type of a column from int64 to string :
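Let’s change the data type of the ‘Height’ column to object, the dtype that pandas reports for string columns in the output above. Because the default value of the copy argument of astype() is True, it returns a copy of the Series with the changed data type, which we assign back to studObj['Height']. Note that astype('object') only changes the column's dtype and keeps the values as integers; to turn each value into an actual string, pass str instead.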
import pandas as pd

# List of tuples
students = [('Rohit', 34, 'Swimming', 155),
            ('Ritik', 25, 'Cricket', 179),
            ('Salim', 26, 'Music', 187),
            ('Rani', 29, 'Sleeping', 154),
            ('Sonu', 17, 'Singing', 184),
            ('Madhu', 20, 'Travelling', 165),
            ('Devi', 22, 'Art', 141)]

studObj = pd.DataFrame(students, columns=['Name', 'Age', 'Hobby', 'Height'])

# Change data type of column 'Age' from int64 to float64
studObj['Age'] = studObj['Age'].astype('float64')

# Change data type of column 'Height' from int64 to object type
studObj['Height'] = studObj['Height'].astype('object')

print(studObj)
print(studObj.dtypes)
Output :
    Name   Age       Hobby  Height
0  Rohit  34.0    Swimming     155
1  Ritik  25.0     Cricket     179
2  Salim  26.0       Music     187
3   Rani  29.0    Sleeping     154
4   Sonu  17.0     Singing     184
5  Madhu  20.0  Travelling     165
6   Devi  22.0         Art     141
Name       object
Age       float64
Hobby      object
Height     object
dtype: object
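The difference between an object column and a column of real strings is easy to miss, because both print the same way. The sketch below, using a made-up `heights` Series rather than the article's DataFrame, compares astype('object') with astype(str) by inspecting the type of the first element.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
heights = pd.Series([155, 179, 187], name='Height')

# The dtype becomes object, but each element is still an integer
as_object = heights.astype('object')

# Each element is converted to a Python str
as_text = heights.astype(str)

print(type(as_object.iloc[0]))
print(type(as_text.iloc[0]))
```

If later code relies on string methods such as .str.len(), the astype(str) form is likely the one you want.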
Change Data Type of Multiple Columns in Dataframe :
To change the data type of multiple columns in a DataFrame, we will use DataFrame.astype(), which can be applied to the whole DataFrame or to selected columns.
Syntax:- DataFrame.astype(self, dtype, copy=True, errors='raise', **kwargs)
Arguments:
- dtype : Either a single Python type or NumPy dtype to which the whole DataFrame will be converted, or a dictionary of column names and data types, in which case each given column is converted to its corresponding type.
- errors : How conversion errors are handled. It can be 'raise' or 'ignore', and the default is 'raise'.
- raise : Raise an exception if the conversion fails
- ignore : Return the original input if the conversion fails
- copy : bool (default True). If True, a copy is returned. If False, pandas may avoid copying the data, so changes to the returned values may propagate to the original object.
Returns: A new DataFrame with the requested data types. The original DataFrame is not modified in place.
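The two forms of the dtype argument can be sketched as follows. The `scores` DataFrame is a hypothetical numeric-only frame invented for this illustration: passing a single dtype converts every column, while passing a dictionary converts only the columns it names.

```python
import pandas as pd

# Hypothetical numeric-only frame, used only for this illustration
scores = pd.DataFrame({'Age': [34, 25], 'Height': [155, 179]}, dtype='int64')

# A single dtype converts every column of the DataFrame
all_float = scores.astype('float64')

# A dictionary converts only the listed columns; 'Height' is left as it was
only_age = scores.astype({'Age': 'float64'})

print(all_float.dtypes)
print(only_age.dtypes)
```

Passing a single dtype only works when every column can be converted to it, so the dictionary form is usually the safer choice for frames that mix text and numbers.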
Change Data Type of two Columns at same time :
Let’s convert the ‘Age’ and ‘Height’ columns, both of int64 data type, to float64 and object respectively. We will pass a dictionary to DataFrame.astype() that contains the column names as keys and the new data types as values.
import pandas as pd

# List of tuples
students = [('Rohit', 34, 'Swimming', 155),
            ('Ritik', 25, 'Cricket', 179),
            ('Salim', 26, 'Music', 187),
            ('Rani', 29, 'Sleeping', 154),
            ('Sonu', 17, 'Singing', 184),
            ('Madhu', 20, 'Travelling', 165),
            ('Devi', 22, 'Art', 141)]

# Create a DataFrame object with columns of different data types
studObj = pd.DataFrame(students, columns=['Name', 'Age', 'Hobby', 'Height'])

# Convert the data type of column 'Age' to float64 and column 'Height' to object
studObj = studObj.astype({'Age': 'float64', 'Height': 'object'})

print(studObj)
print(studObj.dtypes)
Output :
    Name   Age       Hobby  Height
0  Rohit  34.0    Swimming     155
1  Ritik  25.0     Cricket     179
2  Salim  26.0       Music     187
3   Rani  29.0    Sleeping     154
4   Sonu  17.0     Singing     184
5  Madhu  20.0  Travelling     165
6   Devi  22.0         Art     141
Name       object
Age       float64
Hobby      object
Height     object
dtype: object
Handle errors while converting Data Types of Columns :
When using astype() to convert one or more columns, the requested data type must be valid and the content of the column must be convertible to it. Otherwise an error is raised.
import pandas as pd

# List of tuples
students = [('Rohit', 34, 'Swimming', 155),
            ('Ritik', 25, 'Cricket', 179),
            ('Salim', 26, 'Music', 187),
            ('Rani', 29, 'Sleeping', 154),
            ('Sonu', 17, 'Singing', 184),
            ('Madhu', 20, 'Travelling', 165),
            ('Devi', 22, 'Art', 141)]

# Create a DataFrame object with columns of different data types
studObj = pd.DataFrame(students, columns=['Name', 'Age', 'Hobby', 'Height'])

# Try to change the data type of a column to an unknown data type
try:
    studObj['Name'] = studObj['Name'].astype('xyz')
except TypeError as ex:
    print(ex)
Output : data type "xyz" not understood
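An unknown dtype name is one kind of failure; a valid dtype applied to content that cannot be converted is another. The sketch below uses a made-up `raw_ages` Series containing one non-numeric value. It also shows pd.to_numeric with errors='coerce', a separate pandas function rather than part of astype(), as one possible way to keep going when bad values are expected.

```python
import pandas as pd

# Hypothetical column with one value that is not a valid integer
raw_ages = pd.Series(['34', '25', 'abc'], dtype='object')

try:
    raw_ages.astype('int64')
    conversion_failed = False
except ValueError as ex:
    # With the default errors='raise', the bad value surfaces as an exception
    conversion_failed = True
    print(ex)

# pd.to_numeric is a different function from astype(); with errors='coerce'
# it replaces values it cannot parse with NaN instead of raising
coerced = pd.to_numeric(raw_ages, errors='coerce')
print(coerced)
```

Because NaN is a floating-point value, the coerced result is a float column, so a further astype() call would be needed once the missing values are handled.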