Python Pandas : Select Rows in DataFrame by conditions on multiple columns

About Pandas DataFrame

It  is a 2 dimensional data structure, like a 2 dimensional array, or a table with rows and columns.

This article is all about showing different ways to select rows in DataFrame based on condition on single or multiple columns.

import pandas as pd
students = [ ('Shyam', 'books' , 24) ,
             ('ankur', 'pencil' , 28) ,
             ('Rekha', 'pen' , 30) ,
             ('Sarika', 'books', 62) ,
             ('Lata', 'file' , 33) ,
             ('Mayank', 'pencil' , 30) ] 
dataframeobj = pd.DataFrame(students, columns = ['Name' , 'Product', 'Sale'])
print(dataframeobj)

Output will be:

RESTART: C:/Users/HP/Desktop/dataframe.py
Name    Product    Sale
0   Shyam   books       24
1   Ankur    pencil       28
2   Rekha    pen          30
3   Sarika    books      62
4   Lata       file           33
5   Mayank  pencil     30

Select Rows based on value in column

Let’s see how to Select rows based on some conditions in  DataFrame.

Select rows in above example for which ‘Product’ column contains the value ‘books’,

import pandas as pd
students = [ ('Shyam', 'books' , 24) ,
             ('ankur', 'pencil' , 28) ,
             ('Rekha', 'pen' , 30) ,
             ('Sarika', 'books', 62) ,
             ('Lata', 'file' , 33) ,
             ('Mayank', 'pencil' , 30) ] 
dataframeobj = pd.DataFrame(students, columns = ['Name' , 'Product', 'Sale'])
subsetDataFrame = dataframeobj[dataframeobj['Product'] == 'books']
print(subsetDataFrame)

Output:

RESTART: C:/Users/HP/Desktop/dataframe.py
Name     Product   Sale
0     Shyam    books      24
3     Sarika     books      62

In above example we have seen that subsetDataFrame = dataframeobj[dataframeobj['Product'] == 'books']

using this it will return column which have ‘Product’ contains ‘Books’ only.

So,if we want to see whole functionality?See below.

When we apply [dataframeobj['Product'] == 'books']this condition,it will give output in true & false form.

0 True
1 False
2 False
3 True
4 False
5 False
Name: Product, dtype: bool

It will give true when the condition matches otherwise false.

If we pass this series object to [] operator of DataFrame, then it will be return a new DataFrame with only those rows that has True in the passed Series object i.e.

RESTART: C:/Users/HP/Desktop/dataframe.py

Name     Product   Sale

0     Shyam    books      24

3     Sarika     books      62

If we select any other product name it will return value accordingly.

Select Rows based on any of the multiple values in column

Select rows from above example for which ‘Product‘ column contains either ‘Pen‘ or ‘Pencil‘ i.e

import pandas as pd
students = [ ('Shyam', 'books' , 24) ,
             ('ankur', 'pencil' , 28) ,
             ('Rekha', 'pen' , 30) ,
             ('Sarika', 'books', 62) ,
             ('Lata', 'file' , 33) ,
             ('Mayank', 'pencil' , 30) ] 
dataframeobj = pd.DataFrame(students, columns = ['Name' , 'Product', 'Sale'])
subsetDataFrame = dataframeobj[dataframeobj['Product'].isin(['pen', 'pencil']) ]
print(subsetDataFrame)

We have given product name list by isin() function and it will return true if condition will match otherwise false.

Therefore, it will return a DataFrame in which Column ‘Product‘ contains either ‘Pen‘ or ‘Pencil‘ only i.e.

Output:

RESTART: C:/Users/HP/Desktop/dataframe.py
Name Product Sale
1 ankur     pencil  28
2 Rekha    pen      30
5 Mayank pencil   30

Select DataFrame Rows Based on multiple conditions on columns

In this method we are going to select rows in above example for which ‘Sale’ column contains value greater than 20 & less than 33.So for this we are going to give some condition.

import pandas as pd
students = [ ('Shyam', 'books' , 24) ,
             ('ankur', 'pencil' , 28) ,
             ('Rekha', 'pen' , 30) ,
             ('Sarika', 'books', 62) ,
             ('Lata', 'file' , 33) ,
             ('Mayank', 'pencil' , 30) ] 
dataframeobj = pd.DataFrame(students, columns = ['Name' , 'Product', 'Sale'])
filterinfDataframe = dataframeobj[(dataframeobj['Sale'] > 20) & (dataframeobj['Sale'] < 33) ]
print(filterinfDataframe)

It will return following DataFrame object in which Sales column  contains value between 20 to 33,

RESTART: C:/Users/HP/Desktop/dataframe.py
    Name      Product Sale
0 Shyam      books    24
1 ankur        pencil    28
2 Rekha       pen       30
5 Mayank    pencil    30

Conclusion:

In this article we have seen diferent methods to select rows in dataframe by giving some condition.Hope you find this informative.

Leave a Comment