Different methods to read CSV files with custom delimiter in python
Pandas read csv delimeter: In this article, we will see what are CSV files, how to use them in pandas, and then we see how and why to use custom delimiter with CSV files in pandas.
CSV file
Pandas read csv separator: A simple way to store big data sets is to use CSV files (comma-separated files).CSV files contain plain text and is a well know format that can be read by everyone including Pandas. Generally, CSV files contain columns separated by commas, but they can also contain content separated by a tab, or underscore or hyphen, etc. Generally, CSV files look like this:-
total_bill,tip,sex,smoker,day,time,size 16.99,1.01,Female,No,Sun,Dinner,2 10.34,1.66,Male,No,Sun,Dinner,3 21.01,3.5,Male,No,Sun,Dinner,3 23.68,3.31,Male,No,Sun,Dinner,2 24.59,3.61,Female,No,Sun,Dinner,4
Here we see different columns and their values are separated by commas.
- Pandas: skip rows while reading csv file to a Dataframe using read_csv() in Python
- Python: Add a Column to an Existing CSV File
- How to save Numpy Array to a CSV File using numpy.savetxt() in Python? | Savetxt Function’s Working in Numpy with Examples
Use CSV file in pandas
Read csv separator: read_csv() method is used to import and read CSV files in pandas. After this step, a CSV file act as a normal dataframe and we can use operation in CSV file as we use in dataframe.
syntax: pandas.read_csv(filepath_or_buffer, sep=‘, ‘, delimiter=None, header=‘infer’, names=None, index_col=None, ….)
','
is default separator in read_csv() method.
Let see this with an example
import pandas as pd data=pd.read_csv('example1.csv') data.head()
Output
total_bill | tip | sex | smoker | day | time | size | |
---|---|---|---|---|---|---|---|
0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
Why use separator or delimiter with read_csv() method
Read_csv separator: Till now we understand that generally, CSV files contain data separated data that is separated by comma but sometimes it can contain data separated by tab or hyphen, etc. So to handle this we use a seperator. Let understand this with the help of an example. Suppose we have a CSV file separated by an underscore and we try to read that CSV file without using a separator or with using default separator i.e. comma. So let see what happens in this case.
"total_bill"_tip_sex_smoker_day_time_size 16.99_1.01_Female_No_Sun_Dinner_2 10.34_1.66_Male_No_Sun_Dinner_3 21.01_3.5_Male_No_Sun_Dinner_3 23.68_3.31_Male_No_Sun_Dinner_2 24.59_3.61_Female_No_Sun_Dinner_4 25.29_4.71_Male_No_Sun_Dinner_4 8.77_2_Male_No_Sun_Dinner_2
Suppose this is our CSV file separated by an underscore.
total_bill_tip_sex_smoker_day_time_size | |
---|---|
0 | 16.99_1.01_Female_No_Sun_Dinner_2 |
1 | 10.34_1.66_Male_No_Sun_Dinner_3 |
2 | 21.01_3.5_Male_No_Sun_Dinner_3 |
3 | 23.68_3.31_Male_No_Sun_Dinner_2 |
4 | 24.59_3.61_Female_No_Sun_Dinner_4 |
Now see when we didn’t use a default separator here how unordered our data look like. So to solve this issue we use Separator. Now we will see when we use a separator to underscore how we get the same data in an ordered manner.
import pandas as pd data=pd.read_csv('example2.csv',sep = '_',engine = 'python') data.head()
Output
total_bill | tip | sex | smoker | day | time | size | |
---|---|---|---|---|---|---|---|
0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
So this example is sufficient to understand why there is a need of using a separator of delimiter in pandas while working on a CSV file.
Now suppose there is a CSV file in while data is separated by multiple separators. For example:-
totalbill_tip,sex:smoker,day_time,size 16.99,1.01:Female|No,Sun,Dinner,2 10.34,1.66,Male,No|Sun:Dinner,3 21.01:3.5_Male,No:Sun,Dinner,3 23.68,3.31,Male|No,Sun_Dinner,2 24.59:3.61,Female_No,Sun,Dinner,4 25.29,4.71|Male,No:Sun,Dinner,4
Here we see there are multiple seperator used. So here we can not use any custom delimiter. To solve this problem regex or regular expression is used. Let see with the help of an example.
import pandas as pd data=pd.read_csv('example4.csv',sep = '[:, |_]') data.head()
Output
totalbill | tip | sex | smoker | day | time | size | |
---|---|---|---|---|---|---|---|
0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
When we notice we pass a list of separators in the sep parameter that is contained in our CSV file.
Want to expert in the python programming language? Exploring Python Data Analysis using Pandas tutorial changes your knowledge from basic to advance level in python concepts.