Python Program to Remove Duplicate Lines from Text File

Files In Python:

A file is a piece of data or information stored on a computer’s hard drive. You’re already familiar with a variety of file kinds, including music, video, and text files. Manipulation of these files is trivial with Python. Text files and binary files are the two types of files that are commonly used. Binary files contain binary data that can only be read by a computer, whereas text files include plain text.

For programmers and automation testers, Python file handling (also known as File I/O) is a crucial topic. Working with files is required in order to write to or read data from them.

In addition, if you didn’t know, I/O activities are the most expensive techniques via which software might fail. As a result, when implementing file processing for reporting or any other reason, you should proceed with caution. The construction of a high-performance application or a robust solution for automated software testing can benefit from optimizing a single file activity.

Given a file that contains duplicate lines the task is to remove the duplicate lines of the file and store them in another File using Python.

Program to Remove Duplicate Lines from Text File in Python

Approach:

  • Make a single variable to store the path of the file. This is a constant value. This value must be replaced with the file path from your own system in the example below.
  • Open the file in read-only mode. In this case, we’re simply reading the contents of the file.
  • Make another variable to store the path of the file. This is a constant value. This value must be replaced with the file path from your own system.
  • Open another file in write mode. In this case, we’re simply writing the contents of the file.
  • Take an empty list to store the unique lines of the file.
  • Iterate through the lines of the file using the For loop.
  • Check if the line is not in the list using the if, not, in operators.
  • If the condition is true then write the line in the second file using the write function.
  • Add the line to the above list using the append() function.
  • Close the file using the close function.
  • The Exit of the Program.

Below is the Implementation:

# Make a single variable to store the path of the file. This is a constant value.
# This value must be replaced with the file path from your own system in the example below.
givenFilename = "samplefile.txt"
readFile = open(givenFilename, 'r')
# Make another variable to store the path of the file. This is a constant value.
# This value must be replaced with the file path from your own system.
writeFileName = "samplewritefile.txt"
# Open another file in write mode. In this case, we're simply writing the contents of the file.
writingFile = open("writeFileName", 'w')
# Take an empty list to store the unique lines of the file.
lineslist = []
# Open the file in read-only mode. In this case, we're simply reading the contents of the file.
for line in readFile:
    # Iterate through the lines of the file using the For loop.
    # Check if the line is not in the list using the if, not, in operators.
    if line not in lineslist:
        # If the condition is true then write the line in the second file using the write function.
        writingFile.write(line)
        # Add the line to the above list using the append() function.
        lineslist.append(line)

Input File(samplefile.txt):

Hello this is Btechgeeks
Good Morning
Hello this is Btechgeeks
python
coding Articles

Output(samplewritefile.txt):

Hello this is Btechgeeks
Good Morning
python
coding Articles

Sample Implementation in google colab:

Explanation:

  • The file path is stored in the variable ‘file name.’ Change the value of this variable to the path of your own file.
  • Dragging and dropping a file onto the terminal will show its path. The code will not run unless you change the value of this variable.
  • The file will be opened in reading mode. Use the open() function to open a file. The path to the file is the method’s first parameter, and the mode to open the file is the method’s second parameter.
  • When we open the file, we use the character ‘r’ to signify read-mode.
  • The file will be opened in writing mode. Use the open() function to open a file. The path to the file is the method’s first parameter, and the mode to open the file is the method’s second parameter.
  • When we open the file, we use the character ‘w’ to signify write-mode.
  • Write the unique lines using the write() function

Leave a Comment