Pandas Archives - BTech Geeks

Different methods to read CSV files with custom delimiter in python

Pandas read csv delimeter: In this article, we will see what are CSV files, how to use them in pandas, and then we see how and why to use custom delimiter with CSV files in pandas.

CSV file

Pandas read csv separator: A simple way to store big data sets is to use CSV files (comma-separated files).CSV files contain plain text and is a well know format that can be read by everyone including Pandas. Generally, CSV files contain columns separated by commas, but they can also contain content separated by a tab, or underscore or hyphen, etc. Generally, CSV files look like this:-

total_bill,tip,sex,smoker,day,time,size
16.99,1.01,Female,No,Sun,Dinner,2
10.34,1.66,Male,No,Sun,Dinner,3
21.01,3.5,Male,No,Sun,Dinner,3
23.68,3.31,Male,No,Sun,Dinner,2
24.59,3.61,Female,No,Sun,Dinner,4

Here we see different columns and their values are separated by commas.

Use CSV file in pandas

Read csv separator: read_csv() method is used to import and read CSV files in pandas. After this step, a CSV file act as a normal dataframe and we can use operation in CSV file as we use in dataframe.

syntax: pandas.read_csv(filepath_or_buffer, sep=‘, ‘, delimiter=None, header=‘infer’, names=None, index_col=None, ….)

',' is default separator in read_csv() method.

Let see this with an example

import pandas as pd
data=pd.read_csv('example1.csv')
data.head()

Output

	total_bill	tip	sex	smoker	day	time	size
0	16.99	1.01	Female	No	Sun	Dinner	2
1	10.34	1.66	Male	No	Sun	Dinner	3
2	21.01	3.50	Male	No	Sun	Dinner	3
3	23.68	3.31	Male	No	Sun	Dinner	2
4	24.59	3.61	Female	No	Sun	Dinner	4

Why use separator or delimiter with read_csv() method

Read_csv separator: Till now we understand that generally, CSV files contain data separated data that is separated by comma but sometimes it can contain data separated by tab or hyphen, etc. So to handle this we use a seperator. Let understand this with the help of an example. Suppose we have a CSV file separated by an underscore and we try to read that CSV file without using a separator or with using default separator i.e. comma. So let see what happens in this case.

"total_bill"_tip_sex_smoker_day_time_size
16.99_1.01_Female_No_Sun_Dinner_2
10.34_1.66_Male_No_Sun_Dinner_3
21.01_3.5_Male_No_Sun_Dinner_3
23.68_3.31_Male_No_Sun_Dinner_2
24.59_3.61_Female_No_Sun_Dinner_4
25.29_4.71_Male_No_Sun_Dinner_4
8.77_2_Male_No_Sun_Dinner_2

Suppose this is our CSV file separated by an underscore.

	total_bill_tip_sex_smoker_day_time_size
0	16.99_1.01_Female_No_Sun_Dinner_2
1	10.34_1.66_Male_No_Sun_Dinner_3
2	21.01_3.5_Male_No_Sun_Dinner_3
3	23.68_3.31_Male_No_Sun_Dinner_2
4	24.59_3.61_Female_No_Sun_Dinner_4

Now see when we didn’t use a default separator here how unordered our data look like. So to solve this issue we use Separator. Now we will see when we use a separator to underscore how we get the same data in an ordered manner.

import pandas as pd 
data=pd.read_csv('example2.csv',sep = '_',engine = 'python') 
data.head()

Output

	total_bill	tip	sex	smoker	day	time	size
0	16.99	1.01	Female	No	Sun	Dinner	2
1	10.34	1.66	Male	No	Sun	Dinner	3
2	21.01	3.50	Male	No	Sun	Dinner	3
3	23.68	3.31	Male	No	Sun	Dinner	2
4	24.59	3.61	Female	No	Sun	Dinner	4

So this example is sufficient to understand why there is a need of using a separator of delimiter in pandas while working on a CSV file.

Now suppose there is a CSV file in while data is separated by multiple separators. For example:-

totalbill_tip,sex:smoker,day_time,size
16.99,1.01:Female|No,Sun,Dinner,2
10.34,1.66,Male,No|Sun:Dinner,3
21.01:3.5_Male,No:Sun,Dinner,3
23.68,3.31,Male|No,Sun_Dinner,2
24.59:3.61,Female_No,Sun,Dinner,4
25.29,4.71|Male,No:Sun,Dinner,4

Here we see there are multiple seperator used. So here we can not use any custom delimiter. To solve this problem regex or regular expression is used. Let see with the help of an example.

import pandas as pd 
data=pd.read_csv('example4.csv',sep = '[:, |_]') 
data.head()

Output

	totalbill	tip	sex	smoker	day	time	size
0	16.99	1.01	Female	No	Sun	Dinner	2
1	10.34	1.66	Male	No	Sun	Dinner	3
2	21.01	3.50	Male	No	Sun	Dinner	3
3	23.68	3.31	Male	No	Sun	Dinner	2
4	24.59	3.61	Female	No	Sun	Dinner	4

When we notice we pass a list of separators in the sep parameter that is contained in our CSV file.

Want to expert in the python programming language? Exploring Python Data Analysis using Pandas tutorial changes your knowledge from basic to advance level in python concepts.

Methods to delete the first column of a dataframe using Python

Remove first column pandas: In this article, we discuss different ways to delete the first column of a dataframe in pandas using python.

Method 1-Using drop() method

Drop first column pandas: This is one of the methods to delete the first columns of a dataframe in pandas using pandas.drop() method is used to delete or drop specified labels from rows or columns. Let see how the drop method works.

Syntax: DataFrame.drop(self, labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors=’raise’).

As our main task is to delete columns using this method so we have to remember some points. The first point is that the column we want to delete is to be given in the columns parameter of the drop() method. The second point is that we have to assign axis value 1 if we want to work with columns. Inplace is used if we want to make changes in the existing dataframe. If inplace is true changes will be made in the existing dataframe otherwise we have to store the dataframe in another variable. Let see this with an example.

import pandas as pd
import numpy as np
students = [('Raj', 24, 95) ,
            ('Rahul', 22,97) ,
            ('Aadi', 22,81) ,
            ('Abhay', 21,87) ,
            ('Ajjet', 21,74),
            ('Amar',22,76),
            ('Aman',20,76)]
# Create a DataFrame object
df = pd.DataFrame(  students, 
                    columns=['Name', 'Age','Marks'])
print("Original Dataframe\n")
print(df,'\n')
#Drop first column
df.drop(columns=df.columns[0], 
        axis=1, 
        inplace=True)
print("New Dataframe\n")
print(df)

Output

Original Dataframe

    Name  Age  Marks
0    Raj   24     95
1  Rahul   22     97
2   Aadi   22     81
3  Abhay   21     87
4  Ajjet   21     74
5   Amar   22     76
6   Aman   20     76 

New Dataframe

   Age  Marks
0   24     95
1   22     97
2   22     81
3   21     87
4   21     74
5   22     76
6   20     76

Here we see that we pass our first column by index in the drop method and the first column is successfully deleted. As we give inplace=True that’s why changes are made in the original dataframe.

Method 2- Using del keyword

Pandas remove first column: del keyword in python is used to delete objects. Objects can be variables, lists, etc. Here we normally use del df[df.columns[0]] to delete first column in dataframe. df. columns[0] give the name of the column at index 0 which is our column 1.As we get our column name so it is very easy to delete it using the del keyword. Here point to remember that df is our dataframe name. It is not compulsory to use df as a dataframe name. We can name the dataframe as per our wish Let see this with the help of an example.

import pandas as pd
import numpy as np
students = [('Raj', 24, 95) ,
            ('Rahul', 22,97) ,
            ('Aadi', 22,81) ,
            ('Abhay', 21,87) ,
            ('Ajjet', 21,74),
            ('Amar',22,76),
            ('Aman',20,76)]
# Create a DataFrame object
df = pd.DataFrame(  students, 
                    columns=['Name', 'Age','Marks'])
print("Original Dataframe\n")
print(df,'\n')
#Drop first column
del df[df.columns[0]]
print("New Dataframe\n")
print(df)

Output

Original Dataframe

    Name  Age  Marks
0    Raj   24     95
1  Rahul   22     97
2   Aadi   22     81
3  Abhay   21     87
4  Ajjet   21     74
5   Amar   22     76
6   Aman   20     76 

New Dataframe

   Age  Marks
0   24     95
1   22     97
2   22     81
3   21     87
4   21     74
5   22     76
6   20     76

Method 3-Using pop() method

Pandas drop first column: In Pandas, the dataframe provides a function pop(column_name). It expects a column name as an argument and deletes that column from the calling dataframe object. It also returns the deleted column as a series. Let’s use this to delete the first column of the dataframe. Let see this with the help of an example.

import pandas as pd
import numpy as np
students = [('Raj', 24, 95) ,
            ('Rahul', 22,97) ,
            ('Aadi', 22,81) ,
            ('Abhay', 21,87) ,
            ('Ajjet', 21,74),
            ('Amar',22,76),
            ('Aman',20,76)]
# Create a DataFrame object
df = pd.DataFrame(  students, 
                    columns=['Name', 'Age','Marks'])
print("Original Dataframe\n")
print(df,'\n')
#Drop first column
df.pop(df.columns[0])
print("New Dataframe\n")
print(df)

Output

Original Dataframe

    Name  Age  Marks
0    Raj   24     95
1  Rahul   22     97
2   Aadi   22     81
3  Abhay   21     87
4  Ajjet   21     74
5   Amar   22     76
6   Aman   20     76 

New Dataframe

   Age  Marks
0   24     95
1   22     97
2   22     81
3   21     87
4   21     74
5   22     76
6   20     76

So these are the methods to delete the first column of the dataframe in pandas using python. These methods can be used to remove some other columns also.

Pandas read csv delimeter – Read Csv File to Dataframe With Custom Delimiter in Python

Different methods to read CSV files with custom delimiter in python

CSV file

Use CSV file in pandas

Why use separator or delimiter with read_csv() method

Remove first column pandas – Pandas: Delete First Column of Dataframe in Python

Methods to delete the first column of a dataframe using Python

Method 1-Using drop() method

Method 2- Using del keyword

Method 3-Using pop() method