The 5 Best Stock Market APIs in 2020

There are many stock market data sources available online, but among all of them it is hard to figure out which ones are worth visiting or which will actually be useful.

In this article, we will be discussing the 5 best stock market APIs.

What is a stock market data API?

Stock market data APIs offer real-time or historical data on financial assets that are currently traded in the markets.

In particular, they offer prices of public stocks, ETFs, and ETNs.

Data:

Here, we will focus mainly on price information. We will cover the following APIs and how they are useful:

  1. Yahoo Finance
  2. Google Finance in Google Sheets
  3. IEX Cloud
  4. AlphaVantage
  5. World Trading Data
  6. Other APIs (Polygon.io, Intrinio, Quandl)

1. Yahoo Finance:

The official API was shut down in 2017 but came back into use around 2019. The amazing thing is that we can still use Yahoo Finance to get free stock data, and it is employed by both individual and enterprise-level users.

It is free and reliable and provides access to more than 5 years of daily OHLC price data.

yfinance is a new Python module that wraps the new Yahoo Finance API.

$ pip install yfinance

The GitHub link for the code is provided, but I will also attach the code below for your reference.

2. Google Finance:

Google Finance was shut down in 2012, but some features survived. One of them lets you fetch stock market data: the GOOGLEFINANCE function in Google Sheets.

All we have to do is type the command below into a cell and we will get the data.

 

GOOGLEFINANCE("GOOG", "price")

Furthermore, the syntax is:

GOOGLEFINANCE(ticker, [attribute], [start_date], [end_date|num_days], [interval])

ticker: the ticker symbol of the security you want to look up, e.g. "GOOG".

attribute: which piece of data to fetch (defaults to "price").

start_date: the date from which you want historical data.

end_date|num_days: the date until which you want data, or the number of days from start_date.

interval: the frequency of the returned data, either "DAILY" or "WEEKLY".

3. IEX Cloud:

IEX Cloud is a new financial service released this year. An independent business separate from IEX Group's flagship stock exchange, it is a high-performance financial data platform that connects developers and financial data creators.

It is very cheap compared to the others, you can easily get all the data you want, and it also provides a free trial.

You can easily check it out through the iexfinance Python package.

4. AlphaVantage:

You can refer to the website:

https://www.alphavantage.co/

It is a leading provider of free APIs, giving access to data on stocks, FX, and cryptocurrencies.

AlphaVantage's free tier allows 5 API requests per minute and 500 API requests per day.
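As a rough sketch of what a request looks like (the endpoint and parameter names follow Alpha Vantage's documented query format; "demo" is their public demo key, so swap in your own):

```python
# Sketch: build an Alpha Vantage daily time-series request URL.
# The endpoint and parameter names follow Alpha Vantage's query format;
# "demo" is their public demo key -- replace it with your own.
from urllib.parse import urlencode

params = {"function": "TIME_SERIES_DAILY", "symbol": "IBM", "apikey": "demo"}
url = "https://www.alphavantage.co/query?" + urlencode(params)
print(url)
# Actually fetching the data needs network access:
# import requests
# daily = requests.get(url).json()  # nested JSON with daily OHLCV values
```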

5. World Trading Data:

You can refer to the website for World Trading data:

https://www.worldtradingdata.com/

In this service, you can access full intraday and currency APIs. Paid plans range from $8 to $32 per month.

Different plans are available: the free plan gives you 5 stocks per request and 250 total requests per day.

Responses are in JSON format, and there is no official Python module wrapping the API.

6. Other APIs:

Website: https://polygon.io

Polygon.io

It is only for the US stock market and is available at $199 per month. This is not a good choice for beginners.

Website: https://intrinio.com/

Intrinio

Real-time stock data is available at $75 per month, and EOD price data at $40 per month, but you can get EOD data for free on other platforms. So it might not be a good choice for independent traders.

Website: https://www.quandl.com/

Quandl

It is a marketplace for financial, economic, and other related APIs. It aggregates APIs from third parties so that users can purchase whichever APIs they want to use.

Each API is priced differently: some are free, while others are paid.

Quandl also has an analysis tool built into the website, which makes it more convenient.

It is the platform that will suit you best if you can spend a lot of money.

Wrapping up:

I hope you find this tutorial useful and will refer to the websites given for stock market data.

Trading is quite a complex field and learning it is not easy. You have to spend time practicing and understanding stock market data and its uses.

How to web scrape with Python in 4 minutes

Web Scraping:

Web scraping is used to extract data from websites, and it can save time as well as effort. In this article, we will be extracting hundreds of files from the New York MTA. Some people find web scraping tough, but it need not be: this article breaks the process into easy steps to get you comfortable with it.

New York MTA Data:

We will download the data from the below website:

http://web.mta.info/developers/turnstile.html

Turnstile data is compiled every week from May 2010 until now, so there are many files on this site. Below is an example of what the data looks like.

You can right-click on the link and can save it to your desktop. That is web scraping!

Important Notes about Web scraping:

  1. Read through the website’s Terms and Conditions to understand how you can legally use the data. Most sites prohibit you from using the data for commercial purposes.
  2. Make sure you are not downloading data at too rapid a rate because this may break the website. You may potentially be blocked from the site as well.

Inspecting the website:

The first thing we should find out is which HTML tag contains the information we want to scrape. A page contains a lot of code with multiple HTML tags, so we have to find the one holding our data and reference it in our code so that the data becomes accessible.

When you are on the website, right-click and choose "Inspect" from the menu that appears. This opens the console with the hidden code behind the page.

You can see the arrow symbol at the top of the console.

If you click on the arrow and then click any text or element on the page, the tag corresponding to what you clicked will be highlighted in the console.

I clicked on the Saturday, September 22, 2018 file, and the console highlighted the corresponding part in blue.

<a href="data/nyct/turnstile/turnstile_180922.txt">Saturday, September 22, 2018</a>

You will see that all the .txt files come in <a> tags. <a> tags are used for hyperlinks.

Now that we have located the data, we can get on with the coding!

Python Code:

The first and foremost step is importing the libraries:

import requests
import urllib.request
import time
from bs4 import BeautifulSoup

Now we have to set the url and access the website:

url = 'http://web.mta.info/developers/turnstile.html'
response = requests.get(url)

Now, we can use the features of beautiful soup for scraping.

soup = BeautifulSoup(response.text, "html.parser")

We will use the method findAll to get all the <a> tags.

soup.findAll('a')

This function will give us all the <a> tags.

Now, we will extract the actual link that we want.

one_a_tag = soup.findAll('a')[38]
link = one_a_tag['href']

This code saves the relative link of the first .txt file to the variable link.

download_url = 'http://web.mta.info/developers/'+ link

urllib.request.urlretrieve(download_url,'./'+link[link.find('/turnstile_')+1:])

To pause our code between requests, we will use the sleep function.

time.sleep(1)

To download the entire dataset, we have to wrap these steps in a for loop. I am attaching the complete code so that you won't face any problems.
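The loop below sketches how the steps combine. To keep it self-contained, it runs the URL and filename logic on the sample href we inspected earlier; in practice you would iterate over the results of soup.findAll('a') and uncomment the download and sleep lines (network access required):

```python
# Sketch of the download loop, shown on the sample href from the page.
import urllib.request  # used by the (commented-out) real download step
import time            # used to pause between downloads

hrefs = ["data/nyct/turnstile/turnstile_180922.txt"]  # e.g. from soup.findAll('a')
for link in hrefs:
    download_url = "http://web.mta.info/developers/" + link
    filename = link[link.find("/turnstile_") + 1:]  # "turnstile_180922.txt"
    print(download_url, "->", filename)
    # urllib.request.urlretrieve(download_url, "./" + filename)  # network step
    # time.sleep(1)  # pause so we don't hammer the site
```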

I hope you understood the concept of web scraping.

Enjoy reading and have fun while scraping!

An Intro to Web Scraping with lxml and Python:

Sometimes the data we want cannot be accessed through an API. In the absence of an API, the only choice left is to make a web scraper. The task of the scraper is to collect all the information we want easily and in very little time.

Below is an example of a typical API response in JSON; this one is a response from Reddit.

There are various Python libraries that help with web scraping, namely Scrapy, lxml, and Beautiful Soup.

Many articles explain how to use Beautiful Soup and Scrapy, but I will be focusing on lxml. I will show you what XPaths are and how to use them to extract data from HTML documents.

Getting the data:

If you are into gaming, then you must be familiar with the Steam website.

We will be extracting the data from the "Popular New Releases" tab.

Now, right-click on the website and you will see the inspect option. Click on it and select the HTML tag.

We want an anchor tag because every list is encapsulated in the <a> tag.

The anchor tags lie in the div tag with an id of tab_newreleases_content. We mention the id because there are two tabs on this page and we only want the information from the popular new releases tab.

Now, create your python file and start coding. You can name the file according to your preference. Start importing the below libraries:

import requests 

import lxml.html

If you don't have requests installed, type the command below in your terminal:

$ pip install requests

The requests module helps us open web pages in Python.

Extracting and processing the information:

Now, let's open the web page using requests and pass that response to lxml.html.fromstring.

html = requests.get('https://store.steampowered.com/explore/new/') 

doc = lxml.html.fromstring(html.content)

This provides us with a structured way to extract information from an HTML document. Now we will write an XPath for extracting the div which contains the "Popular New Releases" tab.

new_releases = doc.xpath('//div[@id="tab_newreleases_content"]')[0]

We are taking only one element ([0]) and that would be our required div. Let us break down the path and understand it.

  • // tells lxml that we want to search for all tags in the HTML document which match our requirements.
  • div tells lxml that we want to find div tags.
  • @id="tab_newreleases_content" tells lxml that we are only interested in divs whose id is tab_newreleases_content.

Awesome! Now we understand what it means so let’s go back to inspect and check under which tag the title lies.

The title lies in a div tag with the class tab_item_name. Now we will run XPath queries to get the title names.

titles = new_releases.xpath('.//div[@class="tab_item_name"]/text()')

We can see that the names of the popular releases have been extracted. Now, we will extract the prices by writing the following code:

prices = new_releases.xpath('.//div[@class="discount_final_price"]/text()')

Now, we can see that the prices are also scraped. We will extract the tags by writing the following command:

tags = new_releases.xpath('.//div[@class="tab_item_top_tags"]')

total_tags = []
for tag in tags:
    total_tags.append(tag.text_content())

We extract the div containing the tags for each game, then loop over the extracted elements and collect their text with the tag.text_content() method.

Now, the only thing remaining is to extract the platforms associated with each title. Here is the HTML markup:

The major difference here is that the platforms are not contained as text within a specific tag. They are listed as class names, so some titles have only one platform associated with them:

 

<span class="platform_img win"></span>

 

While others have 5 platforms like this:

 

<span class="platform_img win"></span><span class="platform_img mac"></span><span class="platform_img linux"></span><span class="platform_img hmd_separator"></span> <span title="HTC Vive" class="platform_img htcvive"></span> <span title="Oculus Rift" class="platform_img oculusrift"></span>

The span tags carry the platform type in their class names; the only thing they all have in common is the platform_img class.

First of all, we have to extract the div tags containing the tab_item_details class. Then we will extract the span containing the platform_img class. Lastly, we will extract the second class name from those spans. Refer to the below code:

platforms_div = new_releases.xpath('.//div[@class="tab_item_details"]')

total_platforms = []
for game in platforms_div:
    temp = game.xpath('.//span[contains(@class, "platform_img")]')
    platforms = [t.get('class').split(' ')[-1] for t in temp]
    if 'hmd_separator' in platforms:
        platforms.remove('hmd_separator')
    total_platforms.append(platforms)

Now we just need this to return a JSON response so that we can easily turn it into a Flask-based API.

output = []
for info in zip(titles, prices, tags, total_platforms):
    resp = {}
    resp['title'] = info[0]
    resp['price'] = info[1]
    resp['tags'] = info[2]
    resp['platforms'] = info[3]
    output.append(resp)

We use the zip function to loop over all of the lists in parallel, then create a dictionary for each game holding its title, price, tags, and platforms.
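Shown offline with small sample lists (illustrative values standing in for the scraped data), the same zip-and-dict pattern produces JSON-ready output:

```python
# The zip-and-dict pattern from above, with sample stand-in data.
import json

titles = ["Game A", "Game B"]
prices = ["$9.99", "$19.99"]
tags = ["Action, Indie", "RPG"]
total_platforms = [["win"], ["win", "mac"]]

output = []
for title, price, tag, platforms in zip(titles, prices, tags, total_platforms):
    output.append({"title": title, "price": price, "tags": tag, "platforms": platforms})

print(json.dumps(output, indent=2))  # ready to return from a Flask endpoint
```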

Wrapping up:

I hope this article is understandable and you find the coding easy.

Enjoy reading!

 

Simple Whatsapp Automation Using Python3 and Selenium

In this article, we will be using python and selenium to automate some messages on WhatsApp.

I hope the reader is well aware of python beforehand.

The first and foremost step is to install Python 3, which you can download from https://www.python.org/; follow the installation instructions there. After the installation is complete, install Selenium, which will automate all the tasks we want to perform.

python3 -m pip install selenium

Selenium Hello World:

After installing selenium, to check whether it is installed correctly or not, run the python code mentioned below and check if there are any errors.

from selenium import webdriver
import time

driver = webdriver.Chrome()
driver.get("http://google.com")
time.sleep(2)
driver.quit()

Save this code in a python file and name it according to your preference. If the program runs correctly without showing any errors, then the Google Chrome window will be opened automatically.

Automate Whatsapp:

Import the modules selenium and time like below.

from selenium import webdriver
import time

After importing the modules, the code below opens the WhatsApp Web interface, which asks you to scan the QR code and logs you into your account.

driver = webdriver.Chrome()
driver.get("https://web.whatsapp.com")
print("Scan QR Code, And then Enter")
time.sleep(5)

The next step is choosing the user you want to send a message to. In my case, I made a group named "Whatsapp Bot", located its element with an XPath found via the inspect method, and put it in.

As soon as WhatsApp Web opens, the script will locate the "Whatsapp Bot" chat and enter that window.

user_name = 'Whatsapp Bot'
user = driver.find_element_by_xpath('//span[@title="{}"]'.format(user_name))
user.click()

After this, the message box will be opened and now you have to inspect the message box and enter the message you want to send. Later, you have to inspect the send button and click on it using the click() method. 

message_box = driver.find_element_by_xpath('//div[@class="_2A8P4"]')
message_box.send_keys('Hey, I am your whatsapp bot')

send_button = driver.find_element_by_xpath('//button[@class="_1E0Oz"]')
send_button.click()

As soon as you execute this code, the message will be sent and your work is done.

I am attaching the whole code for your reference.

from selenium import webdriver
import time

driver = webdriver.Chrome(executable_path="")
driver.get("https://web.whatsapp.com")
print("Scan QR Code, And then Enter")
time.sleep(5)

user_name = 'Whatsapp Bot'
user = driver.find_element_by_xpath('//span[@title="{}"]'.format(user_name))
user.click()

message_box = driver.find_element_by_xpath('//div[@class="_2A8P4"]')
message_box.send_keys('Hey, I am your whatsapp bot')

send_button = driver.find_element_by_xpath('//button[@class="_1E0Oz"]')
send_button.click()

driver.quit()

At the end, we call the driver.quit() method to close the browser and end the task.

You did a great job making this bot!!

 

Python: Read a CSV file line by line with or without header

In this article, we will be learning how to read a CSV file line by line, with or without a header. Along with that, we will learn how to select specific columns while iterating over the file.

Let us take an example where we have a file named students.csv.

Id,Name,Course,City,Session
21,Mark,Python,London,Morning
22,John,Python,Tokyo,Evening
23,Sam,Python,Paris,Morning
32,Shaun,Java,Tokyo,Morning
What we want is to read all the rows of this file line by line.
Note that we will not read this CSV file into a list of lists, because that would consume a lot of space and time and would cause problems with large files. We want a solution that reads one line at a time, so that memory consumption stays low.
Let’s get started with it!
In Python, the csv module gives us two classes to read a CSV file: csv.reader and csv.DictReader. We will use them one by one to read a CSV file line by line.
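To follow along, you can generate the students.csv file shown above with a few lines of Python (the path is illustrative; any writable location works):

```python
# Create the students.csv file used in the examples below.
csv_text = """Id,Name,Course,City,Session
21,Mark,Python,London,Morning
22,John,Python,Tokyo,Evening
23,Sam,Python,Paris,Morning
32,Shaun,Java,Tokyo,Morning
"""
with open("students.csv", "w", newline="") as f:
    f.write(csv_text)
```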

Read a CSV file line by line using csv.reader

The csv.reader class gives us a reader object with which we can iterate over the lines of a CSV file, where each row is returned as a list of cell values.

Code:

from csv import reader
# open file in read mode
with open('students.csv', 'r') as read_obj:
    # pass the file object to reader() to get the reader object
    csv_reader = reader(read_obj)
    # Iterate over each row in the csv using reader object
    for row in csv_reader:
        # row variable is a list that represents a row in csv
        print(row)
The above code iterates over each row of the CSV file, fetching the content of each row as a list and printing it.

How did it work?

It performed a few steps:

  1. Opened the students.csv file and created a file object.
  2. Passed the file object to csv.reader(), which created a reader object.
  3. Iterated over the reader object with a for loop, reading each row of the csv as a list of values.
  4. Printed each of these lists.

With this approach, only one line occupies memory at a time while iterating through the csv file.
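This laziness is easy to see with an in-memory file (a small illustrative sketch):

```python
# csv.reader is lazy: rows are parsed one at a time as you iterate.
import io
from csv import reader

f = io.StringIO("Id,Name\n21,Mark\n22,John\n")
r = reader(f)
first = next(r)
print(first)  # only this row has been read so far; the rest stays unparsed
```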

Read csv file without header

What if we want to skip the header and print the rows without it? In the previous example we printed the header along with the values; this time we will skip the header and print only the values.

Code:

from csv import reader
# skip the first line, i.e. read the header first and then iterate over each row of the csv as a list
with open('students.csv', 'r') as read_obj:
    csv_reader = reader(read_obj)
    header = next(csv_reader)
    # Check that the file was not empty
    if header != None:
        # Iterate over each row after the header in the csv
        for row in csv_reader:
            # row variable is a list that represents a row in csv
            print(row)
We can see in the image above that the header is not printed: the code skips the header row and prints all the remaining rows as lists.

Read csv file line by line using csv module DictReader object

Now, we will see an example using csv.DictReader. The csv module's DictReader class iterates over the lines of a CSV file as dictionaries: for each row, it returns a dictionary containing the pairs of column names and values for that row.

Code:

from csv import DictReader
# open file in read mode
with open('students.csv', 'r') as read_obj:
    # pass the file object to DictReader() to get the DictReader object
    csv_dict_reader = DictReader(read_obj)
    # iterate over each line as an ordered dictionary
    for row in csv_dict_reader:
        # row variable is a dictionary that represents a row in csv
        print(row)
The above code iterates over all the rows of the CSV file, fetching the content of each row as a dictionary.

How did it work?

It performed a few steps:

  1. Opened the students.csv file and created a file object.
  2. Passed the file object to csv.DictReader(), which created a reader object.
  3. Iterated over the reader object with a for loop, reading each row of the csv as a dictionary of values, where each pair contains the column name and the column value for that row.

It also saves the memory as only one row at a time is in the memory.

Get column names from the header in the CSV file

The DictReader class has a fieldnames attribute that returns the column names of a csv file as a list.

Code:

from csv import DictReader
# open file in read mode
with open('students.csv', 'r') as read_obj:
    # pass the file object to DictReader() to get the DictReader object
    csv_dict_reader = DictReader(read_obj)
    # get column names from the csv file
    column_names = csv_dict_reader.fieldnames
    print(column_names)

Read specific columns from a csv file while iterating line by line

Read specific columns (by column name) in a CSV file while iterating row by row

We will iterate over all the rows of the CSV file line by line but will print only two columns of each row.
Code:
from csv import DictReader
# iterate over each line as a dictionary and print only a few columns, by column name
with open('students.csv', 'r') as read_obj:
    csv_dict_reader = DictReader(read_obj)
    for row in csv_dict_reader:
        print(row['Id'], row['Name'])
DictReader returns a dictionary for each line during iteration. In this dictionary, keys are column names and values are the cell values for those columns. So, to select specific columns in every row, we index the dictionary with the column name.

Read specific columns (by column Number) in a CSV file while iterating row by row

We will iterate over all the rows of the CSV file line by line but will print the contents of the 2nd and 3rd column.

Code:

from csv import reader
# iterate over each row and print only the 2nd and 3rd columns, by column number
with open('students.csv', 'r') as read_obj:
    csv_reader = reader(read_obj)
    for row in csv_reader:
        print(row[1], row[2])

With csv.reader, each row of the csv file is fetched as a list of values, where each value represents a column value. So, to select the 2nd and 3rd columns of each row, we take the elements at indexes 1 and 2 of the list.
The complete code:

from csv import reader
from csv import DictReader

def main():
    print('*** Read csv file line by line using csv module reader object ***')
    print('*** Iterate over each row of a csv file as list using reader object ***')
    # open file in read mode
    with open('students.csv', 'r') as read_obj:
        # pass the file object to reader() to get the reader object
        csv_reader = reader(read_obj)
        # Iterate over each row in the csv using reader object
        for row in csv_reader:
            # row variable is a list that represents a row in csv
            print(row)
    print('*** Read csv line by line without header ***')
    # skip the first line, i.e. read the header first and then iterate over each row of the csv as a list
    with open('students.csv', 'r') as read_obj:
        csv_reader = reader(read_obj)
        header = next(csv_reader)
        # Check that the file was not empty
        if header != None:
            # Iterate over each row after the header in the csv
            for row in csv_reader:
                # row variable is a list that represents a row in csv
                print(row)
            print('Header was: ')
            print(header)
    print('*** Read csv file line by line using csv module DictReader object ***')
    # open file in read mode
    with open('students.csv', 'r') as read_obj:
        # pass the file object to DictReader() to get the DictReader object
        csv_dict_reader = DictReader(read_obj)
        # iterate over each line as an ordered dictionary
        for row in csv_dict_reader:
            # row variable is a dictionary that represents a row in csv
            print(row)
    print('*** select elements by column name while reading csv file line by line ***')
    with open('students.csv', 'r') as read_obj:
        csv_dict_reader = DictReader(read_obj)
        for row in csv_dict_reader:
            print(row['Name'], ' is from ', row['City'], ' and he is studying ', row['Course'])
    print('*** Get column names from header in csv file ***')
    with open('students.csv', 'r') as read_obj:
        csv_dict_reader = DictReader(read_obj)
        # get column names from the csv file
        column_names = csv_dict_reader.fieldnames
        print(column_names)
    print('*** Read specific columns from a csv file while iterating line by line ***')
    print('*** Read specific columns (by column name) in a csv file while iterating row by row ***')
    with open('students.csv', 'r') as read_obj:
        csv_dict_reader = DictReader(read_obj)
        for row in csv_dict_reader:
            print(row['Id'], row['Name'])
    print('*** Read specific columns (by column Number) in a csv file while iterating row by row ***')
    with open('students.csv', 'r') as read_obj:
        csv_reader = reader(read_obj)
        for row in csv_reader:
            print(row[1], row[2])

if __name__ == '__main__':
    main()

I hope you understood this article as well as the code.

Happy reading!

Pandas: Apply a function to single or selected columns or rows in Dataframe

In this article, we will be applying a given function to selected rows and columns of a Dataframe.

For example, we have a dataframe object,

import pandas as pd
import numpy as np

# List of tuples
matrix = [(22, 34, 23),
          (33, 31, 11),
          (44, 16, 21),
          (55, 32, 22),
          (66, 33, 27),
          (77, 35, 11)]
# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef'))
Contents of this dataframe object dfObj are:
Original Dataframe
   x   y   z
a  22  34  23
b  33  31  11
c  44  16  21
d  55  32  22
e  66  33  27
f  77  35  11

Now, what if we want to apply different functions to all the elements of one or more columns or rows? For example:

  • Multiply all the values in column ‘x’ by 2
  • Multiply all the values in row ‘c’ by 10
  • Add 10 in all the values in column ‘y’ & ‘z’

We will use different techniques to see how we can do this.
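Before looking at apply(), here is a self-contained sketch of those three operations using plain element-wise arithmetic (variable names mirror the article's dfObj):

```python
# Sketch: the three bullet operations, done with element-wise arithmetic.
import pandas as pd

matrix = [(22, 34, 23), (33, 31, 11), (44, 16, 21),
          (55, 32, 22), (66, 33, 27), (77, 35, 11)]
dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef'))

dfObj['x'] = dfObj['x'] * 2                  # multiply all values in column 'x' by 2
dfObj.loc['c'] = dfObj.loc['c'] * 10         # multiply all values in row 'c' by 10
dfObj[['y', 'z']] = dfObj[['y', 'z']] + 10   # add 10 to all values in columns 'y' & 'z'
print(dfObj)
```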

Apply a function to a single column in Dataframe

Suppose we want to square all the values in one of the columns, for example x, y, or z.

There are several ways to do this; we will discuss a few of them below.

Method 1: Using Dataframe.apply()

We will apply a lambda function to all the columns using the above method. Inside the lambda function, we check whether the column name is the one we want, say x, y, or z, and square the values if it is. Here we will take the z column.


Code:

modDfObj = dfObj.apply(lambda x: np.square(x) if x.name == 'z' else x)
print("Modified Dataframe : Squared the values in column 'z'", modDfObj, sep='\n')
Output:
Modified Dataframe : Squared the values in column 'z'
    x   y    z
a  22  34  529
b  33  31  121
c  44  16  441
d  55  32  484
e  66  33  729
f  77  35  121

Method 2: Using [] operator

Using the [] operator, we select the column from the dataframe, apply the numpy.square() method to it, and assign the result back to the column.

dfObj['z'] = dfObj['z'].apply(np.square)

It will square all the values in column ‘z’.

Method 3: Using numpy.square()

dfObj['z'] = np.square(dfObj['z'])

This function will also square all the values in ‘z’.

Apply a function to a single row in Dataframe

The same approach works for rows. We will square all the values in row 'b', and we can use different methods for that.

Method 1: Using Dataframe.apply()

We will apply a lambda function to all the rows by passing axis=1. Inside the lambda function, we check the row label and square the row if it matches.


Code:

modDfObj = dfObj.apply(lambda x: np.square(x) if x.name == 'b' else x, axis=1)
print("Modified Dataframe : Squared the values in row 'b'", modDfObj, sep='\n')

Output:

Modified Dataframe : Squared the values in row 'b'
      x    y    z
a    22   34   23
b  1089  961  121
c    44   16   21
d    55   32   22
e    66   33   27
f    77   35   11

Method 2: Using the [] operator

We select the row with the dataframe.loc[] operator, apply the numpy.square() method to it, and assign the result back to the row.

dfObj.loc['b'] = dfObj.loc['b'].apply(np.square)

It will square all the values in the row ‘b’.

Method 3: Using numpy.square()

dfObj.loc['b'] = np.square(dfObj.loc['b'])

This will also square the values in row ‘b’.

Apply a function to certain columns in a Dataframe

We can apply the function to whichever columns we want. For instance, squaring the values in 'x' and 'y'.


Code:

modDfObj = dfObj.apply(lambda x: np.square(x) if x.name in ['x', 'y'] else x)
print("Modified Dataframe : Squared the values in column x & y :", modDfObj, sep='\n')
All we have to do is modify the if condition in the lambda function to check whether the column name is in the list of columns we want, and square the values if it is.

Apply a function to certain rows in a Dataframe

We can apply the function to specified rows as well, for instance rows 'b' and 'c'.


Code:

modDfObj = dfObj.apply(lambda x: np.square(x) if x.name in ['b', 'c'] else x, axis=1)
print("Modified Dataframe : Squared the values in row b & c :", modDfObj, sep='\n')

The complete code is:

import pandas as pd
import numpy as np

def main():
    # List of Tuples
    matrix = [(22, 34, 23),
              (33, 31, 11),
              (44, 16, 21),
              (55, 32, 22),
              (66, 33, 27),
              (77, 35, 11)]
    # Create a DataFrame object
    dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef'))
    print("Original Dataframe", dfObj, sep='\n')
    print('********* Apply a function to a single row or column in DataFrame ********')
    print('*** Apply a function to a single column *** ')
    # Method 1:
    # Apply function numpy.square() to square the values of one column only i.e. with column name 'z'
    modDfObj = dfObj.apply(lambda x: np.square(x) if x.name == 'z' else x)
    print("Modified Dataframe : Squared the values in column 'z'", modDfObj, sep='\n')
    # Method 2:
    # Apply a function to one column and assign it back to the column in dataframe
    dfObj['z'] = dfObj['z'].apply(np.square)
    # Method 3:
    # Apply a function to one column and assign it back to the column in dataframe
    dfObj['z'] = np.square(dfObj['z'])
    print('*** Apply a function to a single row *** ')
    dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef'))
    # Method 1:
    # Apply function numpy.square() to square the values of one row only i.e. row with index name 'b'
    modDfObj = dfObj.apply(lambda x: np.square(x) if x.name == 'b' else x, axis=1)
    print("Modified Dataframe : Squared the values in row 'b'", modDfObj, sep='\n')
    # Method 2:
    # Apply a function to one row and assign it back to the row in dataframe
    dfObj.loc['b'] = dfObj.loc['b'].apply(np.square)
    # Method 3:
    # Apply a function to one row and assign it back to the row in dataframe
    dfObj.loc['b'] = np.square(dfObj.loc['b'])
    print('********* Apply a function to certain rows or columns in DataFrame ********')
    dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef'))
    print('Apply a function to certain columns only')
    # Apply function numpy.square() to square the values of 2 columns only i.e. with column names 'x' and 'y' only
    modDfObj = dfObj.apply(lambda x: np.square(x) if x.name in ['x', 'y'] else x)
    print("Modified Dataframe : Squared the values in column x & y :", modDfObj, sep='\n')
    print('Apply a function to certain rows only')
    # Apply function numpy.square() to square the values of 2 rows only i.e. with row index name 'b' and 'c' only
    modDfObj = dfObj.apply(lambda x: np.square(x) if x.name in ['b', 'c'] else x, axis=1)
    print("Modified Dataframe : Squared the values in row b & c :", modDfObj, sep='\n')

if __name__ == '__main__':
    main()
Output:
Original Dataframe
x y z
a 22 34 23
b 33 31 11
c 44 16 21
d 55 32 22
e 66 33 27
f 77 35 11
********* Apply a function to a single row or column in DataFrame ********
*** Apply a function to a single column *** 
Modified Dataframe : Squared the values in column 'z'
 x y z
a 22 34 529
b 33 31 121
c 44 16 441
d 55 32 484
e 66 33 729
f 77 35 121
*** Apply a function to a single row *** 
Modified Dataframe : Squared the values in row 'b'
 x y z
a 22 34 23
b 1089 961 121
c 44 16 21
d 55 32 22
e 66 33 27
f 77 35 11
********* Apply a function to certain rows or columns in DataFrame ********
Apply a function to certain columns only
Modified Dataframe : Squared the values in column x & y :
 x y z
a 484 1156 23
b 1089 961 11
c 1936 256 21
d 3025 1024 22
e 4356 1089 27
f 5929 1225 11
Apply a function to certain rows only
Modified Dataframe : Squared the values in row b & c :
 x y z
a 22 34 23
b 1089 961 121
c 1936 256 441
d 55 32 22
e 66 33 27
f 77 35 11

I hope you understood this article well.

Want to become an expert in the Python programming language? Our Python Data Analysis using Pandas tutorial takes your knowledge of Python concepts from basic to advanced.

Read more Articles on Python Data Analysis Using Pandas – Modify a Dataframe

How to get Numpy Array Dimensions using numpy.ndarray.shape & numpy.size() in Python

In this article, we will be discussing how to get the number of elements in 1D, 2D, and 3D NumPy arrays. Moreover, we will be discussing how to count the rows and columns of a 2D array and the number of elements per axis in a 3D NumPy array.

Let’s get started!

Get the Dimensions of a Numpy array using ndarray.shape

numpy.ndarray.shape

This attribute returns the current shape of an array. It can also be used to reshape the array in place, by assigning a tuple of array dimensions to it:

ndarray.shape

We will use this attribute for determining the dimensions of 1D and 2D arrays.
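Because shape is an attribute rather than a function, assigning a compatible tuple to it reshapes the array in place. A quick self-contained sketch (the array values here are made up for illustration):

```python
import numpy as np

# Reading .shape returns the current dimensions as a tuple
arr = np.array([1, 2, 3, 4, 5, 6])
print(arr.shape)   # (6,)

# Assigning a compatible tuple to .shape reshapes the array in place
arr.shape = (2, 3)
print(arr.shape)   # (2, 3)
print(arr)
```

Note that the new shape must account for exactly the same number of elements, otherwise NumPy raises an error.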

Get Dimensions of a 2D NumPy array using ndarray.shape:

Let us start with a 2D Numpy array.


Code:
arr2D = np.array([[11, 12, 13, 11], [21, 22, 23, 24], [31, 32, 33, 34]])
print('2D Numpy Array')
print(arr2D)
Output:
2D Numpy Array 
[[11 12 13 11] 
[21 22 23 24] 
[31 32 33 34]]

Get the number of rows in this 2D NumPy array:


Code:

numOfRows = arr2D.shape[0]
print('Number of Rows : ', numOfRows)
Output:
Number of Rows : 3

Get a number of columns in this 2D NumPy array:


Code:

numOfColumns = arr2D.shape[1]
print('Number of Columns : ', numOfColumns)
Output:
Number of Columns: 4

Get the total number of elements in this 2D NumPy array:


Code:

print('Total Number of elements in 2D Numpy array : ', arr2D.shape[0] * arr2D.shape[1])
Output:

Total Number of elements in 2D Numpy array: 12

Get Dimensions of a 1D NumPy array using ndarray.shape

Now, we will work on a 1D NumPy array.


Code:

arr = np.array([4, 5, 6, 7, 8, 9, 10, 11])
print('Shape of 1D numpy array : ', arr.shape)
print('length of 1D numpy array : ', arr.shape[0])
Output:
Shape of 1D numpy array : (8,)
length of 1D numpy array : 8

Get the Dimensions of a Numpy array using numpy.size()

Now, we will look at numpy.size(), a function that returns the number of elements in a NumPy array along a given axis:

numpy.size(arr, axis=None)

We will use this function to get the dimensions of 2D and 1D NumPy arrays.

Get Dimensions of a 2D numpy array using numpy.size()

We will begin with a 2D Numpy array.


Code:

arr2D = np.array([[11 ,12,13,11], [21, 22, 23, 24], [31,32,33,34]])
print('2D Numpy Array')
print(arr2D)

Output:

2D Numpy Array
[[11 12 13 11]
[21 22 23 24]
[31 32 33 34]]

Get a number of rows and columns of this 2D NumPy array:


Code:

numOfRows = np.size(arr2D, 0)
# get number of columns in 2D numpy array
numOfColumns = np.size(arr2D, 1)
print('Number of Rows : ', numOfRows)
print('Number of Columns : ', numOfColumns)
Output:
Number of Rows : 3
Number of Columns: 4

Get a total number of elements in this 2D NumPy array:


Code:

print('Total Number of elements in 2D Numpy array : ', np.size(arr2D))

Output:

Total Number of elements in 2D Numpy array: 12

Get Dimensions of a 3D NumPy array using numpy.size()

Now, we will be working on the 3D Numpy array.


Code:

arr3D = np.array([ [[11, 12, 13, 11], [21, 22, 23, 24], [31, 32, 33, 34]],
[[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]] ])
print(arr3D)
Output:
[[[11 12 13 11]
[21 22 23 24]
[31 32 33 34]]
[[ 1 1 1 1]
[ 2 2 2 2]
[ 3 3 3 3]]]

Get a number of elements per axis in 3D NumPy array:


Code:

print('Axis 0 size : ', np.size(arr3D, 0))
print('Axis 1 size : ', np.size(arr3D, 1))
print('Axis 2 size : ', np.size(arr3D, 2))

Output:

Axis 0 size : 2
Axis 1 size : 3
Axis 2 size : 4

Get the total number of elements in this 3D NumPy array:


Code:

print('Total Number of elements in 3D Numpy array : ', np.size(arr3D))

Output:

Total Number of elements in 3D Numpy array : 24

Get Dimensions of a 1D NumPy array using numpy.size()

Let us create a 1D array.


Code:

arr = np.array([4, 5, 6, 7, 8, 9, 10, 11])
print('Length of 1D numpy array : ', np.size(arr))

Output:

Length of 1D numpy array : 8
A complete example is as follows:
import numpy as np

def main():
    print('**** Get Dimensions of a 2D numpy array using ndarray.shape ****')
    # Create a 2D Numpy array from a list of lists
    arr2D = np.array([[11, 12, 13, 11], [21, 22, 23, 24], [31, 32, 33, 34]])
    print('2D Numpy Array')
    print(arr2D)
    # get number of rows in 2D numpy array
    numOfRows = arr2D.shape[0]
    # get number of columns in 2D numpy array
    numOfColumns = arr2D.shape[1]
    print('Number of Rows : ', numOfRows)
    print('Number of Columns : ', numOfColumns)
    print('Total Number of elements in 2D Numpy array : ', arr2D.shape[0] * arr2D.shape[1])
    print('**** Get Dimensions of a 1D numpy array using ndarray.shape ****')
    # Create a Numpy array from a list of numbers
    arr = np.array([4, 5, 6, 7, 8, 9, 10, 11])
    print('Original Array : ', arr)
    print('Shape of 1D numpy array : ', arr.shape)
    print('length of 1D numpy array : ', arr.shape[0])
    print('**** Get Dimensions of a 2D numpy array using np.size() ****')
    # Create a 2D Numpy array from a list of lists
    arr2D = np.array([[11, 12, 13, 11], [21, 22, 23, 24], [31, 32, 33, 34]])
    print('2D Numpy Array')
    print(arr2D)
    # get number of rows in 2D numpy array
    numOfRows = np.size(arr2D, 0)
    # get number of columns in 2D numpy array
    numOfColumns = np.size(arr2D, 1)
    print('Number of Rows : ', numOfRows)
    print('Number of Columns : ', numOfColumns)
    print('Total Number of elements in 2D Numpy array : ', np.size(arr2D))
    print('**** Get Dimensions of a 3D numpy array using np.size() ****')
    # Create a 3D Numpy array from a list of lists of lists
    arr3D = np.array([[[11, 12, 13, 11], [21, 22, 23, 24], [31, 32, 33, 34]],
                      [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]]])
    print('3D Numpy Array')
    print(arr3D)
    print('Axis 0 size : ', np.size(arr3D, 0))
    print('Axis 1 size : ', np.size(arr3D, 1))
    print('Axis 2 size : ', np.size(arr3D, 2))
    print('Total Number of elements in 3D Numpy array : ', np.size(arr3D))
    print('Dimension by axis : ', arr3D.shape)
    print('**** Get Dimensions of a 1D numpy array using numpy.size() ****')
    # Create a Numpy array from a list of numbers
    arr = np.array([4, 5, 6, 7, 8, 9, 10, 11])
    print('Original Array : ', arr)
    print('Length of 1D numpy array : ', np.size(arr))

if __name__ == '__main__':
    main()
Output:
**** Get Dimensions of a 2D numpy array using ndarray.shape ****
2D Numpy Array
[[11 12 13 11]
[21 22 23 24]
[31 32 33 34]]
Number of Rows : 3
Number of Columns : 4
Total Number of elements in 2D Numpy array : 12
**** Get Dimensions of a 1D numpy array using ndarray.shape ****
Original Array : [ 4 5 6 7 8 9 10 11]
Shape of 1D numpy array : (8,)
length of 1D numpy array : 8
**** Get Dimensions of a 2D numpy array using np.size() ****
2D Numpy Array
[[11 12 13 11]
[21 22 23 24]
[31 32 33 34]]
Number of Rows : 3
Number of Columns : 4
Total Number of elements in 2D Numpy array : 12
**** Get Dimensions of a 3D numpy array using np.size() ****
3D Numpy Array
[[[11 12 13 11]
[21 22 23 24]
[31 32 33 34]]
[[ 1 1 1 1]
[ 2 2 2 2]
[ 3 3 3 3]]]
Axis 0 size : 2
Axis 1 size : 3
Axis 2 size : 4
Total Number of elements in 3D Numpy array : 24
Dimension by axis : (2, 3, 4)
**** Get Dimensions of a 1D numpy array using numpy.size() ****
Original Array : [ 4 5 6 7 8 9 10 11]
Length of 1D numpy array : 8

I hope you understood this article well.

Web crawling and scraping in Python

In this article, we will be covering a few things:

  • Basic crawling setup In Python
  • Basic crawling with AsyncIO
  • Scraper Util service
  • Python scraping via Scrapy framework

Web Crawler

A web crawler is an automatic bot that extracts useful information by systematically browsing the world wide web.

The web crawler is also known as a spider or spider bot. Some websites use web crawling to keep their own content up to date. Other websites do not allow crawling for security reasons; on those sites, a well-behaved crawler either asks for permission or leaves the website.
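The "permission" a polite crawler asks for is usually spelled out in a site's robots.txt file. As a minimal sketch, Python's standard urllib.robotparser can evaluate such a policy (here parsed from made-up inline rules, so no network access is needed):

```python
from urllib import robotparser

# Example robots.txt rules, supplied inline for illustration:
# every crawler is disallowed from the /private/ section of the site.
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

# A well-behaved crawler checks can_fetch() before requesting a URL
print(rp.can_fetch("MyCrawler", "https://example.com/index.html"))      # True
print(rp.can_fetch("MyCrawler", "https://example.com/private/x.html"))  # False
```

In practice you would point set_url() at the real site's /robots.txt and call read() instead of parsing inline rules.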


Web Scraping

Extracting data from websites is known as web scraping. Web scraping requires two parts: a crawler and a scraper.

The crawler is the component that browses the web, discovering the links we want to crawl across the internet.

The scraper is the tool that extracts the information of interest from the pages the crawler finds.

With web scraping, we can obtain large amounts of unstructured data in HTML format and convert it into structured data.
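As a minimal illustration of turning unstructured HTML into structured data, here is a sketch using only Python's standard html.parser (the HTML snippet and paths are made up for the example):

```python
from html.parser import HTMLParser

# Collect every link (href) and image (src) from raw HTML
class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif tag == "img" and "src" in attrs:
            self.images.append(attrs["src"])

html = '<a href="/about">About</a> <img src="/logo.png"> <a href="/blog">Blog</a>'
collector = LinkCollector()
collector.feed(html)
print(collector.links)    # ['/about', '/blog']
print(collector.images)   # ['/logo.png']
```

The demos below do the same job with the parsel package, whose XPath selectors are far more convenient than hand-rolling a parser.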


Crawler Demo

Mainly, we will be using two tools: the requests package and the parsel package.

Task I

Scrape the Recurship website and extract all the links and images present on the page.

Demo Code:

import requests
from parsel import Selector
import time

start = time.time()
response = requests.get('http://recurship.com/')
selector = Selector(response.text)
href_links = selector.xpath('//a/@href').getall()
image_links = selector.xpath('//img/@src').getall()
print("********************href_links****************")
print(href_links)
print("******************image_links****************")
print(image_links)
end = time.time()
print("Time taken in seconds:", (end - start))

 

Task II

Scrape the Recurship site and extract the links; then navigate to each link one by one and extract information about the images.

Demo code:

import requests
from parsel import Selector
import time

start = time.time()

all_images = {}
response = requests.get('http://recurship.com/')
selector = Selector(response.text)
href_links = selector.xpath('//a/@href').getall()
image_links = selector.xpath('//img/@src').getall()

for link in href_links:
    try:
        response = requests.get(link)
        if response.status_code == 200:
            selector = Selector(response.text)  # re-parse the newly fetched page
            image_links = selector.xpath('//img/@src').getall()
            all_images[link] = image_links
    except Exception as exp:
        print('Error navigating to link : ', link)

print(all_images)
end = time.time()
print("Time taken in seconds : ", (end - start))

 

Task II takes about 22 seconds to complete. Throughout, we are using the Python "parsel" and "requests" packages.

Let’s see some features these packages use.

Requests package: provides sessions, custom headers, and cookie handling for making HTTP requests.

Parsel package: provides XPath and CSS selectors for extracting data from HTML and XML responses.

 

Crawler service using Request and Parsel

The code:

import requests
import time
import random
from urllib.parse import urlparse
import logging

logger = logging.getLogger(__name__)

LOG_PREFIX = 'RequestManager:'


class RequestManager:
    def __init__(self):
        self.set_user_agents()  # This is to keep the user-agent the same throughout one request

    crawler_name = None
    session = requests.session()
    # This is for agent spoofing...
    user_agents = [
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246',
        'Mozilla/4.0 (X11; Linux x86_64) AppleWebKit/567.36 (KHTML, like Gecko) Chrome/62.0.3239.108 Safari/537.36',
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9',
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko)'
    ]

    headers = {}

    cookie = None
    debug = True

    def file_name(self, context: dict, response, request_type: str = 'GET'):
        url = urlparse(response.url).path.replace("/", "|")
        return f'{time.time()}_{context.get("key")}_{context.get("category")}_{request_type}_{response.status_code}_{url}'

    # write a file, safely
    def write(self, name, text):
        if self.debug:
            file = open(f'logs/{name}.html', 'w')
            file.write(text)
            file.close()

    def set_user_agents(self):
        self.headers.update({
            'user-agent': random.choice(self.user_agents)
        })

    def set_headers(self, headers):
        logger.info(f'{LOG_PREFIX}:SETHEADER set headers {self.headers}')
        self.session.headers.update(headers)

    def get(self, url: str, withCookie: bool = False, context: dict = {}):
        logger.info(f'{LOG_PREFIX}-{self.crawler_name}:GET making get request {url} {context} {withCookie}')
        cookies = self.cookie if withCookie else None
        response = self.session.get(url=url, cookies=cookies, headers=self.headers)
        self.write(self.file_name(context, response), response.text)
        return response

    def post(self, url: str, data, withCookie: bool = False, allow_redirects=True, context: dict = {}):
        logger.info(f'{LOG_PREFIX}:POST making post request {url} {data} {context} {withCookie}')
        cookies = self.cookie if withCookie else None
        response = self.session.post(url=url, data=data, cookies=cookies, allow_redirects=allow_redirects)
        self.write(self.file_name(context, response, request_type='POST'), response.text)
        return response

    def set_cookie(self, cookie):
        self.cookie = cookie
        logger.info(f'{LOG_PREFIX}-{self.crawler_name}:SET_COOKIE set cookie {self.cookie}')


Request = RequestManager()

context = {
    "key": "demo",
    "category": "history"
}
START_URI = "DUMMY_URL"  # URL OF SIGNUP PORTAL
LOGIN_API = "DUMMY_LOGIN_API"
response = Request.get(url=START_URI, context=context)

Request.set_cookie('SOME_DUMMY_COOKIE')
Request.set_headers({'DUMMY_HEADER': 'DUMMY_VALUE'})

response = Request.post(url=LOGIN_API, data={'username': '', 'passphrase': ''}, context=context)

 

The RequestManager class offers a few pieces of functionality: a shared session, user-agent spoofing, custom headers and cookies, and logging of every fetched response to a local HTML file for debugging.

Scraping with AsyncIO

All we have to do is scrape the Recurship site and extract all the links; later, we navigate to each link asynchronously and extract information about the images.

Demo code

import requests
import aiohttp
import asyncio
from parsel import Selector
import time

start = time.time()
all_images = {}  # website links as "keys" and image links as "values"

async def fetch(session, url):
    try:
        async with session.get(url) as response:
            return await response.text()
    except Exception as exp:
        return '<html></html>'  # empty html for the invalid-uri case

async def main(urls):
    tasks = []
    async with aiohttp.ClientSession() as session:
        for url in urls:
            tasks.append(fetch(session, url))
        htmls = await asyncio.gather(*tasks)
        for index, html in enumerate(htmls):
            selector = Selector(html)
            image_links = selector.xpath('//img/@src').getall()
            all_images[urls[index]] = image_links
    print('*** all images : ', all_images)


response = requests.get('http://recurship.com/')
selector = Selector(response.text)
href_links = selector.xpath('//a/@href').getall()
loop = asyncio.get_event_loop()
loop.run_until_complete(main(href_links))


print("All done !")
end = time.time()
print("Time taken in seconds : ", (end - start))

With AsyncIO, scraping took almost 21 seconds. We can achieve even better performance on this task.

Open-Source Python Frameworks for spiders

Python has multiple frameworks that take care of this optimization for us and give us different patterns to work with. There are three popular frameworks, namely:

  1. Scrapy
  2. PySpider
  3. MechanicalSoup

Let’s use Scrapy for further demo.

Scrapy

Scrapy is a framework used for scraping and is supported by an active community. We can build our own scraping tools.

Scrapy provides features such as built-in selectors, asynchronous request scheduling, and item pipelines for exporting data.

Now, all we have to do is scrape the Recurship site and extract all the links; later, we navigate to each link asynchronously and extract information about the images.

Demo Code

import scrapy


class AuthorSpider(scrapy.Spider):
    name = 'Links'

    start_urls = ['http://recurship.com/']
    images_data = {}

    def parse(self, response):
        # follow links to author pages
        for img in response.css('a::attr(href)'):
            yield response.follow(img, self.parse_images)

        # Below commented portion is for following all pages
        # follow pagination links
        # for href in response.css('a::attr(href)'):
        #     yield response.follow(href, self.parse)

    def parse_images(self, response):
        # print("URL: " + response.request.url)
        def extract_with_css(query):
            return response.css(query).extract()
        yield {
            'URL': response.request.url,
            'image_link': extract_with_css('img::attr(src)')
        }

Commands

scrapy runspider spider.py -o output.json

The JSON file was exported in about 1 second.

Conclusion

We can see that Scrapy performed an excellent job. For simple crawling, Scrapy gives the best results.

Enjoy scraping!!
Automation of WhatsApp Messages to Unknown Users using Selenium and Python

Nowadays, we have to send messages to multiple users, whether for personal, commercial, or business reasons.

How amazing would it be if we didn't have to type the same message again and again for more than 100 contacts? Doing that by hand is hectic and boring.
In this article, we will be making a WhatsApp bot that automates the messaging and sends messages to multiple users at the same time, without you typing them again and again.

This Bot will make your work easy and will be less time-consuming.

Let’s get ready to make an automation bot!

Importing the modules:

To make this automation Bot, we have to import some modules.

Firstly, we have to install Selenium as a basic step.

To install Selenium, type the following command in your terminal:

python -m pip install selenium

Now we have to install a WebDriver.

For Firefox, go to the geckodriver releases page and find the latest version suitable for your desktop.

Extract the file, copy the path, and use it in your code.

Let’s get started with the code!

Working on the code:

The first and foremost task is to import the selenium modules which will be used in the code.

from selenium import webdriver
from csv import reader
import time

Pandas: Convert Dataframe index into a column using dataframe.reset_index() in Python

In this article, we will be exploring ways to convert the indexes of a data frame or a multi-index data frame into columns.

There is a function provided in the Pandas Data frame class to reset the indexes of the data frame.

Dataframe.reset_index()

DataFrame.reset_index(self, level=None, drop=False, inplace=False, col_level=0, col_fill='')

It returns a data frame with the new index after resetting the indexes of the data frame.

  • level: By default, reset_index() resets all the indexes of the data frame. In the case of a multi-index dataframe, if we want to reset only some specific indexes, we can specify them as an int, str, or list of str, i.e., index names.
  • drop: If False, converts the index to a column; otherwise removes the index from the dataframe.
  • inplace: If True, modifies the data frame in place.

Let’s use this function to convert the indexes of dataframe to columns.

The first and foremost thing we will do is create a dataframe and initialize its index.

Code:
import pandas as pd

empoyees = [(11, 'jack', 34, 'Sydney', 70000),
            (12, 'Riti', 31, 'Delhi', 77000),
            (13, 'Aadi', 16, 'Mumbai', 81000),
            (14, 'Mohit', 31, 'Delhi', 90000),
            (15, 'Veena', 12, 'Delhi', 91000),
            (16, 'Shaunak', 35, 'Mumbai', 75000),
            (17, 'Shaun', 35, 'Colombo', 63000)]
# Create a DataFrame object
empDfObj = pd.DataFrame(empoyees, columns=['ID', 'Name', 'Age', 'City', 'Salary'])
# Set 'ID' as the index of the dataframe
empDfObj.set_index('ID', inplace=True)
print(empDfObj)


Now, we will try different things with this dataframe.

Convert index of a Dataframe into a column of dataframe

To convert the index 'ID' of the dataframe empDfObj into a column, call the reset_index() function on that dataframe.

Code:
modified = empDfObj.reset_index()
print("Modified Dataframe : ")
print(modified)


Since we haven't provided the inplace argument, by default it returned a modified copy of the dataframe, in which the index 'ID' is converted into a column named 'ID' and a new default index is assigned to the dataframe.

Now, we will pass the inplace argument as True to proceed with the process.

Code:
empDfObj.reset_index(inplace=True)
print(empDfObj)


Now, we will set the column 'ID' as the index of the dataframe.

Code:
empDfObj.set_index('ID', inplace=True)

Remove index of dataframe instead of converting into column

Previously, what we have done is convert the index of the dataframe into a column of the dataframe, but now we want to just remove it. We can do that by passing the drop argument as True in the reset_index() function.

Code:
modified = empDfObj.reset_index(drop=True)
print("Modified Dataframe : ")
print(modified)


We can see that it removed the dataframe index.

Resetting indexes of a Multi-Index Dataframe

Let's convert the dataframe object empDfObj into a multi-index dataframe with two indexes, i.e. ID & Name.

Code:
empDfObj = pd.DataFrame(empoyees, columns=['ID', 'Name', 'Age', 'City', 'Salary'])
# set multiple columns as the index of the dataframe to
# make it a multi-index dataframe.
empDfObj.set_index(['ID', 'Name'], inplace=True)
print(empDfObj)


Convert all the indexes of Multi-index Dataframe to the columns of Dataframe

In the previous section, we made a dataframe with a multi-index; now we will convert the indexes of the multi-index dataframe into columns of the dataframe.

To do this, all we have to do is just call the reset_index() on the dataframe object.

Code:
modified = empDfObj.reset_index()
print(modified)


It converted the indexes ID and Name to columns of the same names.

Suppose we want to convert only one index out of the multiple indexes. We can do that by passing a single value in the level argument.

Code:
modified = empDfObj.reset_index(level='ID')
print("Modified Dataframe: ")
print(modified)


It converted the index 'ID' to a column with the same name. Similarly, we can follow the same procedure to convert the Name index to a column.

You should try modifying the code to convert the Name index to a column yourself.
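If you want to check your attempt, here is a self-contained sketch (using a shortened version of the employee data from this article):

```python
import pandas as pd

# A shortened version of the employee data used in this article
employees = [(11, 'jack', 34, 'Sydney', 70000),
             (12, 'Riti', 31, 'Delhi', 77000)]
empDfObj = pd.DataFrame(employees, columns=['ID', 'Name', 'Age', 'City', 'Salary'])
empDfObj.set_index(['ID', 'Name'], inplace=True)

# Convert only the 'Name' index to a column; 'ID' remains the index
modified = empDfObj.reset_index(level='Name')
print(modified.index.name)          # ID
print('Name' in modified.columns)   # True
```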

We can convert both indexes into columns by passing multiple values in the level parameter.

Code:
modified = empDfObj.reset_index(level=['ID', 'Name'])
print("Modified Dataframe: ")
print(modified)


The complete code:

import pandas as pd

def main():
    # List of Tuples
    empoyees = [(11, 'jack', 34, 'Sydney', 70000),
                (12, 'Riti', 31, 'Delhi', 77000),
                (13, 'Aadi', 16, 'Mumbai', 81000),
                (14, 'Mohit', 31, 'Delhi', 90000),
                (15, 'Veena', 12, 'Delhi', 91000),
                (16, 'Shaunak', 35, 'Mumbai', 75000),
                (17, 'Shaun', 35, 'Colombo', 63000)]
    # Create a DataFrame object
    empDfObj = pd.DataFrame(empoyees, columns=['ID', 'Name', 'Age', 'City', 'Salary'])
    # Set 'ID' as the index of the dataframe
    empDfObj.set_index('ID', inplace=True)
    print("Contents of the Dataframe : ")
    print(empDfObj)
    print('Convert the index of Dataframe to the column')
    # Reset the index of dataframe
    modified = empDfObj.reset_index()
    print("Modified Dataframe : ")
    print(modified)
    print('Convert the index of Dataframe to the column - in place ')
    empDfObj.reset_index(inplace=True)
    print("Contents of the Dataframe : ")
    print(empDfObj)
    # Set 'ID' as the index of the dataframe
    empDfObj.set_index('ID', inplace=True)
    print('Remove the index of Dataframe instead of converting to a column')
    # Remove index ID instead of converting into a column
    modified = empDfObj.reset_index(drop=True)
    print("Modified Dataframe : ")
    print(modified)
    print('Resetting indexes of a Multi-Index Dataframe')
    # Create a DataFrame object
    empDfObj = pd.DataFrame(empoyees, columns=['ID', 'Name', 'Age', 'City', 'Salary'])
    # set multiple columns as the index of the dataframe to
    # make it a multi-index dataframe.
    empDfObj.set_index(['ID', 'Name'], inplace=True)
    print("Contents of the Multi-Index Dataframe : ")
    print(empDfObj)
    print('Convert all the indexes of Multi-index Dataframe to the columns of Dataframe')
    # Reset all indexes of a multi-index dataframe
    modified = empDfObj.reset_index()
    print("Modified Multi-Index Dataframe : ")
    print(modified)
    print("Contents of the original Multi-Index Dataframe : ")
    print(empDfObj)
    modified = empDfObj.reset_index(level='ID')
    print("Modified Dataframe: ")
    print(modified)
    modified = empDfObj.reset_index(level='Name')
    print("Modified Dataframe: ")
    print(modified)
    modified = empDfObj.reset_index(level=['ID', 'Name'])
    print("Modified Dataframe: ")
    print(modified)

if __name__ == '__main__':
    main()

Hope this article was useful for you.


Python: How to find keys by value in a dictionary?

In this article, we will see how to find all the keys associated with a single value or multiple values.

For instance, we have a dictionary of words.

dictOfWords = {"hello": 56,
               "at": 23,
               "test": 43,
               "this": 97,
               "here": 43,
               "now": 97
               }

Now, we want all the keys in the dictionary having value 43, which means “here” and “test”.

Let’s see how to get it.

Find keys by value in the dictionary

dict.items() is a method that returns all the key-value pairs of a dictionary. What we will do is iterate over this sequence and check whether the condition is satisfied; if the value matches, we add the key to a separate list.

def getKeysByValue(dictOfElements, valueToFind):
    listOfKeys = list()
    listOfItems = dictOfElements.items()
    for item in listOfItems:
        if item[1] == valueToFind:
            listOfKeys.append(item[0])
    return listOfKeys

We will use this function to get the keys by value 43.

listOfKeys = getKeysByValue(dictOfWords, 43)
print("Keys with value equal to 43")
# Iterate over the list of keys
for key in listOfKeys:
    print(key)


We can achieve the same thing with a list comprehension.

listOfKeys = [key for (key, value) in dictOfWords.items() if value == 43]
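Both approaches scan the whole dictionary on every lookup. When many such lookups are needed, it can be cheaper to invert the dictionary once; a sketch using the standard library's collections.defaultdict:

```python
from collections import defaultdict

dictOfWords = {"hello": 56, "at": 23, "test": 43,
               "this": 97, "here": 43, "now": 97}

# Build a value -> list-of-keys index in a single pass
keysByValue = defaultdict(list)
for key, value in dictOfWords.items():
    keysByValue[value].append(key)

print(keysByValue[43])   # ['test', 'here']
print(keysByValue[97])   # ['this', 'now']
```

After the one-time pass, every lookup is a plain dictionary access instead of a full scan.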

Find keys in the dictionary by value list

Now, we want to find the keys in the dictionary whose values match any value in a given list, for example:

[43, 97]

We will do the same thing as we have done above, but this time we will iterate over the sequence and check whether the value is in the given list of values.

def getKeysByValues(dictOfElements, listOfValues):
    listOfKeys = list()
    listOfItems = dictOfElements.items()
    for item in listOfItems:
        if item[1] in listOfValues:
            listOfKeys.append(item[0])
    return listOfKeys

We will use the above function:

listOfKeys = getKeysByValues(dictOfWords, [43, 97])
# Iterate over the list of keys
for key in listOfKeys:
    print(key)


Complete Code:

'''Get a list of keys from dictionary which has the given value'''
def getKeysByValue(dictOfElements, valueToFind):
    listOfKeys = list()
    listOfItems = dictOfElements.items()
    for item in listOfItems:
        if item[1] == valueToFind:
            listOfKeys.append(item[0])
    return listOfKeys

'''
Get a list of keys from dictionary which has value that matches with any value in given list of values
'''
def getKeysByValues(dictOfElements, listOfValues):
    listOfKeys = list()
    listOfItems = dictOfElements.items()
    for item in listOfItems:
        if item[1] in listOfValues:
            listOfKeys.append(item[0])
    return listOfKeys

def main():
    # Dictionary of strings and int
    dictOfWords = {
        "hello": 56,
        "at": 23,
        "test": 43,
        "this": 97,
        "here": 43,
        "now": 97}
    print("Original Dictionary")
    print(dictOfWords)
    '''
    Get list of keys with value 43
    '''
    listOfKeys = getKeysByValue(dictOfWords, 43)
    print("Keys with value equal to 43")
    # Iterate over the list of keys
    for key in listOfKeys:
        print(key)
    print("Keys with value equal to 43")
    '''
    Get list of keys with value 43 using list comprehension
    '''
    listOfKeys = [key for (key, value) in dictOfWords.items() if value == 43]
    # Iterate over the list of keys
    for key in listOfKeys:
        print(key)
    print("Keys with value equal to any one from the list [43, 97] ")
    '''
    Get list of keys with any of the given values
    '''
    listOfKeys = getKeysByValues(dictOfWords, [43, 97])
    # Iterate over the list of keys
    for key in listOfKeys:
        print(key)

if __name__ == '__main__':
    main()

 

Hope this article was useful for you.

Enjoy Reading!