How to get the sum of column values in a dataframe in Python?
In this article, we will discuss how to find the sum of the values in a column of a pandas dataframe. So, let’s start exploring the topic.
Select the column by name and get the sum of all values in that column :
To find the sum of the values in a single column, we can call sum( ) on the column directly, or select the column with the loc[ ] indexer and then call sum( ) on the result.
Using sum() :
Here we select a column from the dataframe by its name and call sum( ) on it to get the sum of the values in that column. Missing values (NaN) are skipped by default.
Syntax- dataFrame_Object['column_name'].sum( )
#Program :
import numpy as np
import pandas as pd

# Example data (np.nan marks a missing value; the np.NaN alias was removed in NumPy 2.0)
students = [('Jill', 16, 'Tokyo', 150),
            ('Rachel', 38, 'Texas', 177),
            ('Kirti', 39, 'New York', 97),
            ('Veena', 40, 'Texas', np.nan),
            ('Lucifer', np.nan, 'Texas', 130),
            ('Pablo', 30, 'New York', 155),
            ('Lionel', 45, 'Colombia', 121)]
dfObj = pd.DataFrame(students, columns=['Name', 'Age', 'City', 'Score'])

# Sum of all values in the 'Score' column of the dataframe
totalSum = dfObj['Score'].sum()
print(totalSum)
Output : 830.0
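The total above is 830.0 even though one score is missing, because sum( ) skips NaN values by default. The following is a minimal sketch of how that behaviour can be changed, using a small made-up series (the values and variable names are only for illustration) and the skipna and min_count parameters of sum( ).

```python
import numpy as np
import pandas as pd

# Small illustrative column with one missing value (hypothetical data)
scores = pd.Series([150, 177, np.nan, 130], name='Score')

# Default behaviour: NaN entries are skipped
total_default = scores.sum()

# skipna=False lets the NaN propagate, so the result is NaN
total_strict = scores.sum(skipna=False)

# min_count=4 requires at least 4 non-NaN values; only 3 exist, so the result is NaN
total_min_count = scores.sum(min_count=4)

print(total_default)
print(total_strict)
print(total_min_count)
```

Using skipna=False can be handy when a missing value should make the total invalid instead of being silently ignored.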
Using loc[ ] :
Here we select the column by name with loc[ ] and then call sum( ) on the result to get the sum of the values in that column. The ':' in the row position selects all rows.
Syntax- dataFrame_Object_name.loc[:, 'column_name'].sum( )
So, let’s see the implementation of it by taking an example.
#Program :
import numpy as np
import pandas as pd

# Example data (np.nan marks a missing value; the np.NaN alias was removed in NumPy 2.0)
students = [('Jill', 16, 'Tokyo', 150),
            ('Rachel', 38, 'Texas', 177),
            ('Kirti', 39, 'New York', 97),
            ('Veena', 40, 'Texas', np.nan),
            ('Lucifer', np.nan, 'Texas', 130),
            ('Pablo', 30, 'New York', 155),
            ('Lionel', 45, 'Colombia', 121)]
dfObj = pd.DataFrame(students, columns=['Name', 'Age', 'City', 'Score'])

# Sum of all values in the 'Score' column of the dataframe using loc[ ]
totalSum = dfObj.loc[:, 'Score'].sum()
print(totalSum)
Output : 830.0
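Both approaches give the same total because selecting by name with square brackets and with loc[ ] returns the same column. As a rough sketch on a small made-up dataframe (the data and variable names are assumptions for illustration), the snippet below compares the two and also shows that passing a list of column names to loc[ ] gives one total per column.

```python
import numpy as np
import pandas as pd

# Small illustrative dataframe (hypothetical data)
df = pd.DataFrame({
    'Name': ['Jill', 'Rachel', 'Kirti'],
    'Age': [16, 38, 39],
    'Score': [150, 177, np.nan],
})

# Selecting by name with [] and with loc[] yields the same column
by_bracket = df['Score'].sum()
by_loc = df.loc[:, 'Score'].sum()

# A list of labels selects several columns; sum() then returns
# a series with one total per selected column
per_column = df.loc[:, ['Age', 'Score']].sum()

print(by_bracket, by_loc)
print(per_column)
```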
Select the column by position and get the sum of all values in that column :
If we don’t know the column name but we know its position, we can find the sum of all values in that column using iloc[ ] and sum( ). Because the column is selected here with a slice, iloc[ ] returns a one-column dataframe, and sum( ) then returns a series holding the total for that column.
So, let’s see the implementation of it by taking an example.
#Program :
import numpy as np
import pandas as pd

# Example data (np.nan marks a missing value; the np.NaN alias was removed in NumPy 2.0)
students = [('Jill', 16, 'Tokyo', 150),
            ('Rachel', 38, 'Texas', 177),
            ('Kirti', 39, 'New York', 97),
            ('Veena', 40, 'Texas', np.nan),
            ('Lucifer', np.nan, 'Texas', 130),
            ('Pablo', 30, 'New York', 155),
            ('Lionel', 45, 'Colombia', 121)]
dfObj = pd.DataFrame(students, columns=['Name', 'Age', 'City', 'Score'])

column_number = 4
# Total sum of values in the 4th column, i.e. 'Score'
totalSum = dfObj.iloc[:, column_number-1:column_number].sum()
print(totalSum)
Output :
Score    830.0
dtype: float64
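The output here is a series and not a single number because the column was selected with a slice. A minimal sketch of the difference, on a small made-up dataframe (data and names are illustrative assumptions): a slice position keeps a one-column dataframe, while a single integer position returns the column itself, so sum( ) gives a plain number.

```python
import numpy as np
import pandas as pd

# Small illustrative dataframe (hypothetical data)
df = pd.DataFrame({
    'Name': ['Jill', 'Rachel', 'Kirti'],
    'Score': [150, 177, np.nan],
})

# Slice position: one-column dataframe, so sum() returns a series
as_series = df.iloc[:, 1:2].sum()

# Integer position: the column as a series, so sum() returns a scalar
as_scalar = df.iloc[:, 1].sum()

print(as_series)
print(as_scalar)
```

If only the number is needed, the integer form avoids having to pull the value out of the resulting series afterwards.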
Find the sum of column values for selected rows only in a dataframe :
If we need the sum of only some entries of a column, we can pass a row range to iloc[ ] along with the column position and then call sum( ) on the selection.
So, let’s see the implementation of it by taking an example.
#Program :
import numpy as np
import pandas as pd

# Example data (np.nan marks a missing value; the np.NaN alias was removed in NumPy 2.0)
students = [('Jill', 16, 'Tokyo', 150),
            ('Rachel', 38, 'Texas', 177),
            ('Kirti', 39, 'New York', 97),
            ('Veena', 40, 'Texas', np.nan),
            ('Lucifer', np.nan, 'Texas', 130),
            ('Pablo', 30, 'New York', 155),
            ('Lionel', 45, 'Colombia', 121)]
dfObj = pd.DataFrame(students, columns=['Name', 'Age', 'City', 'Score'])

column_number = 4
entries = 3
# Sum of the first three values from the 4th column
totalSum = dfObj.iloc[0:entries, column_number-1:column_number].sum()
print(totalSum)
Output :
Score    424.0
dtype: float64
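The rows do not have to form a contiguous range. As a small sketch under the assumption of a made-up dataframe (the data and variable names are hypothetical), iloc[ ] also accepts a list of row positions, and an integer column position returns a plain number instead of a series.

```python
import pandas as pd

# Small illustrative dataframe (hypothetical data)
df = pd.DataFrame({
    'Name': ['Jill', 'Rachel', 'Kirti', 'Lucifer'],
    'Score': [150, 177, 97, 130],
})

# First two rows of the column at position 1
first_two = df.iloc[0:2, 1].sum()

# Non-contiguous rows: positions 0 and 3 only
picked_rows = df.iloc[[0, 3], 1].sum()

print(first_two)
print(picked_rows)
```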
Find the sum of column values in a dataframe based on condition :
If we want the sum of only the values that satisfy a condition, for example the scores of a particular city like New York, we can pass a boolean condition as the row selector in loc[ ] and then call sum( ) on the selected column.
So, let’s see the implementation of it by taking an example.
#Program :
import numpy as np
import pandas as pd

# Example data (np.nan marks a missing value; the np.NaN alias was removed in NumPy 2.0)
students = [('Jill', 16, 'Tokyo', 150),
            ('Rachel', 38, 'Texas', 177),
            ('Kirti', 39, 'New York', 97),
            ('Veena', 40, 'Texas', np.nan),
            ('Lucifer', np.nan, 'Texas', 130),
            ('Pablo', 30, 'New York', 155),
            ('Lionel', 45, 'Colombia', 121)]
dfObj = pd.DataFrame(students, columns=['Name', 'Age', 'City', 'Score'])

# Sum of all the scores from New York city
totalSum = dfObj.loc[dfObj['City'] == 'New York', 'Score'].sum()
print(totalSum)
Output : 252.0
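The condition inside loc[ ] is just a boolean series with one True or False per row. The sketch below, on a small made-up dataframe (data and variable names are assumptions for illustration), builds that mask as a separate step and then combines two conditions with the & operator; each condition needs its own parentheses.

```python
import pandas as pd

# Small illustrative dataframe (hypothetical data)
df = pd.DataFrame({
    'City': ['Tokyo', 'Texas', 'New York', 'New York'],
    'Age': [16, 38, 39, 30],
    'Score': [150, 177, 97, 155],
})

# Boolean mask: True for rows where City is 'New York'
mask = df['City'] == 'New York'
ny_total = df.loc[mask, 'Score'].sum()

# Two conditions combined with &: New York rows with Age above 35
ny_over_35_total = df.loc[(df['City'] == 'New York') & (df['Age'] > 35), 'Score'].sum()

print(mask.tolist())
print(ny_total)
print(ny_over_35_total)
```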