How to get unique values in columns of a DataFrame in Python?
To find the unique values in a DataFrame, pandas provides these methods:
- Series.unique(): returns a NumPy array of the unique values in a column, in order of first appearance. NaN is included.
- Series.nunique(dropna=True): returns the number of unique values in a column. NaN is excluded unless dropna=False is passed.
- DataFrame.nunique(axis=0, dropna=True): returns the number of unique values for each column (axis=0, the default) or for each row (axis=1).
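The short sketch below shows how the Series and DataFrame versions of nunique() differ. It builds the same sample data used in the rest of this article. The variable names age_count, per_column and per_row are only illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as used throughout this article
emp = [('jack', 34, 'Sydney', 5), ('Riti', 31, 'Delhi', 7),
       ('Aadi', 16, np.nan, 11), ('Mohit', 31, 'Delhi', 7),
       ('Veena', np.nan, 'Delhi', 4), ('Shaunak', 35, 'Mumbai', 5),
       ('Shaun', 35, 'Colombo', 11)]
empObj = pd.DataFrame(emp, columns=['Name', 'Age', 'City', 'Experience'],
                      index=list('abcdefg'))

# Series.nunique() returns a single integer for one column
age_count = empObj['Age'].nunique()

# DataFrame.nunique(axis=0) returns one count per column (Name, Age, ...)
per_column = empObj.nunique(axis=0)

# DataFrame.nunique(axis=1) returns one count per row (a, b, ...);
# rows c and e hold a NaN, which is not counted, so they get 3 instead of 4
per_row = empObj.nunique(axis=1)

print(age_count)
print(per_column)
print(per_row)
```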
To test these functions, let's use the following data:
```
      Name   Age     City  Experience
a     jack  34.0   Sydney           5
b     Riti  31.0    Delhi           7
c     Aadi  16.0      NaN          11
d    Mohit  31.0    Delhi           7
e    Veena   NaN    Delhi           4
f  Shaunak  35.0   Mumbai           5
g    Shaun  35.0  Colombo          11
```
Finding unique values in a single column
To get the unique values of one column (here 'Age'), call the unique() method on that column.
CODE:-
```python
import numpy as np
import pandas as pd

# Data list: (Name, Age, City, Experience)
emp = [('jack', 34, 'Sydney', 5),
       ('Riti', 31, 'Delhi', 7),
       ('Aadi', 16, np.nan, 11),
       ('Mohit', 31, 'Delhi', 7),
       ('Veena', np.nan, 'Delhi', 4),
       ('Shaunak', 35, 'Mumbai', 5),
       ('Shaun', 35, 'Colombo', 11)]

# Create the DataFrame object
empObj = pd.DataFrame(emp, columns=['Name', 'Age', 'City', 'Experience'],
                      index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

# empObj['Age'] returns the 'Age' column as a Series;
# unique() returns its distinct values as a NumPy array
uValues = empObj['Age'].unique()
print('The unique values in column "Age" are')
print(uValues)
```
Output:
```
The unique values in column "Age" are
[34. 31. 16. nan 35.]
```
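unique() keeps NaN and returns values in the order they first appear. You may therefore want to drop missing values or sort the result. Here is a minimal sketch on the same 'Age' column, using dropna() and np.sort():

```python
import numpy as np
import pandas as pd

# Same sample data as in the examples above
emp = [('jack', 34, 'Sydney', 5), ('Riti', 31, 'Delhi', 7),
       ('Aadi', 16, np.nan, 11), ('Mohit', 31, 'Delhi', 7),
       ('Veena', np.nan, 'Delhi', 4), ('Shaunak', 35, 'Mumbai', 5),
       ('Shaun', 35, 'Colombo', 11)]
empObj = pd.DataFrame(emp, columns=['Name', 'Age', 'City', 'Experience'],
                      index=list('abcdefg'))

# unique() keeps NaN and lists values in order of first appearance
with_nan = empObj['Age'].unique()

# dropna() removes missing values before collecting the unique ones
without_nan = empObj['Age'].dropna().unique()

# np.sort() returns a sorted copy when order of appearance is not wanted
sorted_ages = np.sort(without_nan)

print(with_nan)     # [34. 31. 16. nan 35.]
print(without_nan)  # [34. 31. 16. 35.]
print(sorted_ages)  # [16. 31. 34. 35.]
```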
Counting unique values in a single column
To get the number of unique values rather than the values themselves, use the nunique() method.
CODE:-
```python
import numpy as np
import pandas as pd

# Data list: (Name, Age, City, Experience)
emp = [('jack', 34, 'Sydney', 5),
       ('Riti', 31, 'Delhi', 7),
       ('Aadi', 16, np.nan, 11),
       ('Mohit', 31, 'Delhi', 7),
       ('Veena', np.nan, 'Delhi', 4),
       ('Shaunak', 35, 'Mumbai', 5),
       ('Shaun', 35, 'Colombo', 11)]

# Create the DataFrame object
empObj = pd.DataFrame(emp, columns=['Name', 'Age', 'City', 'Experience'],
                      index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

# Count the distinct (non-NaN) values in column 'Age'
uValues = empObj['Age'].nunique()
print("Number of unique values in 'Age' column:")
print(uValues)
```
Output:
```
Number of unique values in 'Age' column:
4
```
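nunique() ignores NaN by default. Its result therefore matches the length of the unique values that remain after dropping missing data. The following quick check on the 'Age' and 'City' columns illustrates this. The dictionary name counts is only illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as in the examples above
emp = [('jack', 34, 'Sydney', 5), ('Riti', 31, 'Delhi', 7),
       ('Aadi', 16, np.nan, 11), ('Mohit', 31, 'Delhi', 7),
       ('Veena', np.nan, 'Delhi', 4), ('Shaunak', 35, 'Mumbai', 5),
       ('Shaun', 35, 'Colombo', 11)]
empObj = pd.DataFrame(emp, columns=['Name', 'Age', 'City', 'Experience'],
                      index=list('abcdefg'))

# For each column: (nunique() result, length of unique() after dropna())
counts = {col: (empObj[col].nunique(), len(empObj[col].dropna().unique()))
          for col in ['Age', 'City']}
print(counts)  # {'Age': (4, 4), 'City': (4, 4)}
```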
Including NaN while counting the unique values in a column
By default, nunique() does not count NaN. To include it, pass dropna=False.
CODE:-
```python
import numpy as np
import pandas as pd

# Data list: (Name, Age, City, Experience)
emp = [('jack', 34, 'Sydney', 5),
       ('Riti', 31, 'Delhi', 7),
       ('Aadi', 16, np.nan, 11),
       ('Mohit', 31, 'Delhi', 7),
       ('Veena', np.nan, 'Delhi', 4),
       ('Shaunak', 35, 'Mumbai', 5),
       ('Shaun', 35, 'Colombo', 11)]

# Create the DataFrame object
empObj = pd.DataFrame(emp, columns=['Name', 'Age', 'City', 'Experience'],
                      index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

# Count the unique values in column 'Age', treating NaN as a value too
uValues = empObj['Age'].nunique(dropna=False)
print("Number of unique values in 'Age' column including NaN:")
print(uValues)
```
Output:
```
Number of unique values in 'Age' column including NaN:
5
```
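With dropna=False, NaN counts as one extra distinct value. The result then equals the length of the array returned by unique(), which keeps NaN. The sketch below also shows that repeated NaNs are counted once. The small Series named repeated is a made-up example, not part of the article's data.

```python
import numpy as np
import pandas as pd

# Same sample data as in the examples above
emp = [('jack', 34, 'Sydney', 5), ('Riti', 31, 'Delhi', 7),
       ('Aadi', 16, np.nan, 11), ('Mohit', 31, 'Delhi', 7),
       ('Veena', np.nan, 'Delhi', 4), ('Shaunak', 35, 'Mumbai', 5),
       ('Shaun', 35, 'Colombo', 11)]
empObj = pd.DataFrame(emp, columns=['Name', 'Age', 'City', 'Experience'],
                      index=list('abcdefg'))

# dropna=False counts NaN as one extra distinct value
with_nan_count = empObj['Age'].nunique(dropna=False)

# unique() also keeps NaN, so its length gives the same number
unique_len = len(empObj['Age'].unique())
print(with_nan_count, unique_len)  # 5 5

# Hypothetical Series: repeated NaNs are treated as a single value
repeated = pd.Series([1.0, np.nan, np.nan, 1.0])
print(repeated.nunique(dropna=False))  # 2
```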
Counting unique values in each column of the DataFrame
To count the unique values in every column at once, call nunique() on the DataFrame itself.
CODE:-
```python
import numpy as np
import pandas as pd

# Data list: (Name, Age, City, Experience)
emp = [('jack', 34, 'Sydney', 5),
       ('Riti', 31, 'Delhi', 7),
       ('Aadi', 16, np.nan, 11),
       ('Mohit', 31, 'Delhi', 7),
       ('Veena', np.nan, 'Delhi', 4),
       ('Shaunak', 35, 'Mumbai', 5),
       ('Shaun', 35, 'Colombo', 11)]

# Create the DataFrame object
empObj = pd.DataFrame(emp, columns=['Name', 'Age', 'City', 'Experience'],
                      index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

# Count the unique values in each column (returns a Series indexed by column name)
uValues = empObj.nunique()
print('In each column the number of unique values are')
print(uValues)
```
Output:
```
In each column the number of unique values are
Name          7
Age           4
City          4
Experience    4
dtype: int64
```
To include NaN in these counts, pass dropna=False, i.e. empObj.nunique(dropna=False).
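Calling DataFrame.nunique(dropna=False) raises the count by one for each column that contains a NaN. In this data, those columns are 'Age' and 'City'. Here is a minimal sketch with the same sample data. The name all_counts is only illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as in the examples above
emp = [('jack', 34, 'Sydney', 5), ('Riti', 31, 'Delhi', 7),
       ('Aadi', 16, np.nan, 11), ('Mohit', 31, 'Delhi', 7),
       ('Veena', np.nan, 'Delhi', 4), ('Shaunak', 35, 'Mumbai', 5),
       ('Shaun', 35, 'Colombo', 11)]
empObj = pd.DataFrame(emp, columns=['Name', 'Age', 'City', 'Experience'],
                      index=list('abcdefg'))

# NaN is now counted as a value in every column that contains it
all_counts = empObj.nunique(dropna=False)
print(all_counts)  # Name 7, Age 5, City 5, Experience 4
```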
Getting unique values in multiple columns
To get the unique values across several columns, combine the columns into a single Series with pd.concat() and call unique() on the result. (Older examples use Series.append() for this, but that method was removed in pandas 2.0.)
CODE:-
```python
import numpy as np
import pandas as pd

# Data list: (Name, Age, City, Experience)
emp = [('jack', 34, 'Sydney', 5),
       ('Riti', 31, 'Delhi', 7),
       ('Aadi', 16, np.nan, 11),
       ('Mohit', 31, 'Delhi', 7),
       ('Veena', np.nan, 'Delhi', 4),
       ('Shaunak', 35, 'Mumbai', 5),
       ('Shaun', 35, 'Colombo', 11)]

# Create the DataFrame object
empObj = pd.DataFrame(emp, columns=['Name', 'Age', 'City', 'Experience'],
                      index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

# Stack the 'Name' and 'Age' columns into one Series, then take its unique values
uValues = pd.concat([empObj['Name'], empObj['Age']]).unique()
print('The unique values in columns "Name" & "Age":')
print(uValues)
```
Output:
```
The unique values in columns "Name" & "Age":
['jack' 'Riti' 'Aadi' 'Mohit' 'Veena' 'Shaunak' 'Shaun' 34.0 31.0 16.0 nan 35.0]
```
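The same combined Series can also be counted with nunique(). This is different from calling nunique() on each column separately, because a value that appears in both columns is counted only once in the combined count. In this data, names and ages never overlap, so the combined total is simply 7 + 4. Here is a sketch under that setup. The names combined, total_unique, total_with_nan and per_column are illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as in the examples above
emp = [('jack', 34, 'Sydney', 5), ('Riti', 31, 'Delhi', 7),
       ('Aadi', 16, np.nan, 11), ('Mohit', 31, 'Delhi', 7),
       ('Veena', np.nan, 'Delhi', 4), ('Shaunak', 35, 'Mumbai', 5),
       ('Shaun', 35, 'Colombo', 11)]
empObj = pd.DataFrame(emp, columns=['Name', 'Age', 'City', 'Experience'],
                      index=list('abcdefg'))

# One Series holding every value from both columns (object dtype)
combined = pd.concat([empObj['Name'], empObj['Age']])

# Distinct values across both columns together: 7 names + 4 ages
total_unique = combined.nunique()

# Same count, but with the single NaN from 'Age' included
total_with_nan = combined.nunique(dropna=False)

# For comparison: a separate count for each column
per_column = empObj[['Name', 'Age']].nunique()

print(total_unique, total_with_nan)  # 11 12
print(per_column)
```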