More Complex File Manipulation with Python

Python is a very convenient language that’s frequently used for data science, scripting and web development.

In this article, we will see how to get different kinds of file information.

The os module gives you access to information about the files and directories on your system.

Getting the different kinds of file information

The os module comes with a large number of tools for dealing with filenames, directories and paths.

To get a list of all files and subdirectories in a particular directory, we use os.listdir().

import os
entries = os.listdir("C:\\New folder\\Python project(APT)\\")

os.listdir() returns a list holding the names of the files and subdirectories in the given folder.

Output:

['articles', 'django folder', 'FilePath.py', 'hello.py', 'imagedownload', 'images', 'lcm', 'lcm2', 'newtons-second-law', 'RssScrapy.py', 'Scrapy1', 'scrapy2', 'scrapy3', 'speedofsound', 'studyrank', 'twosum.py']

A directory listing like that is hard to read, so we use a loop to print one entry per line:

import os
entries  = os.listdir("C:\\New folder\\Python project(APT)\\")
for entry in entries:
    print(entry)

Output:

articles
django folder
FilePath.py
hello.py
imagedownload
images
lcm
lcm2
newtons-second-law
RssScrapy.py
Scrapy1
scrapy2
scrapy3
speedofsound
studyrank
twosum.py

As you can see, printing the entries in a loop makes the listing easier to read.
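Note that os.listdir() returns the entries in arbitrary order, so the listing will not always come out alphabetically. Here is a minimal sketch that sorts the names before printing them. The helper name list_sorted is hypothetical and only used for illustration; it is not part of the os module.

```python
import os

def list_sorted(path):
    """Return the entry names in `path`, sorted case-insensitively."""
    # os.listdir() makes no promise about order, so sort explicitly.
    return sorted(os.listdir(path), key=str.lower)

if __name__ == "__main__":
    # "." is the current working directory; replace it with your own path.
    for name in list_sorted("."):
        print(name)
```

Sorting with key=str.lower keeps names such as FilePath.py and hello.py in the order a reader would expect, regardless of capitalisation.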

Directory Listing in Modern Python Versions:

In modern versions of Python, the alternatives to os.listdir() are os.scandir() and pathlib.Path().

os.scandir() was introduced in Python 3.5. When called, it returns an iterator rather than a list.

import os

entries = os.scandir("C:\\New folder\\Python project(APT)\\")

print(entries)

Output:

RESTART: C:/Users/HP/Desktop/article3.py
<nt.ScandirIterator object at 0x0371F9F8>

The ScandirIterator points to all the entries in the given directory. Printing the iterator itself only shows the object’s representation, as in the output above. To see the filenames, loop over the entries of the iterator and print the name of each one.
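Looping over the iterator could look like the sketch below. The helper name entry_names is hypothetical, and the with-statement form assumes Python 3.6 or later, where os.scandir() can be used as a context manager.

```python
import os

def entry_names(path):
    """Return the names of all entries in `path` using os.scandir()."""
    # The with-statement closes the iterator and frees its resources
    # as soon as the block is left.
    with os.scandir(path) as entries:
        # Each entry is an os.DirEntry object; .name is the bare filename.
        return [entry.name for entry in entries]

if __name__ == "__main__":
    # "." is the current working directory; replace it with your own path.
    for name in entry_names("."):
        print(name)
```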

Another method to get a directory listing is to use the pathlib module:

from pathlib import Path

entries = Path("C:\\New folder\\Python project(APT)\\")
for entry in entries.iterdir():
    print(entry.name)

Output:

RESTART: C:/Users/HP/Desktop/article3.py
articles
django folder
FilePath.py
hello.py
imagedownload
images
lcm
lcm2
newtons-second-law
RssScrapy.py
Scrapy1
scrapy2
scrapy3
speedofsound
studyrank
twosum.py

So far you have seen three ways to list the contents of a directory: os.listdir(), os.scandir() and pathlib.Path().

List out All Files in a Directory:

To filter out folders and list only the files from a directory listing produced by os.listdir(), use os.path.isfile():

import os
# List all files in a directory using os.listdir
basepath = "C:\\New folder\\Python project(APT)\\"
for entry in os.listdir(basepath):
    if os.path.isfile(os.path.join(basepath, entry)):
        print(entry)

Output:

RESTART: C:/Users/HP/Desktop/article3.py
FilePath.py
hello.py
RssScrapy.py
twosum.py

Here, os.listdir() returns a list of everything in the specified path, and that list is then filtered with os.path.isfile() so that only files are printed, not directories.

An easier way to list the files in a directory is to use os.scandir() or pathlib.Path():

import os

# List all files in a directory using scandir()
basepath = "C:\\New folder\\Python project(APT)\\"
with os.scandir(basepath) as entries:
    for entry in entries:
        if entry.is_file():
            print(entry.name)

Using os.scandir() is cleaner and easier to read than os.listdir(), even though it is one line of code longer. Calling entry.is_file() on each item in the ScandirIterator returns True if the entry is a file.

Output:

RESTART: C:/Users/HP/Desktop/article3.py 
FilePath.py 
hello.py 
RssScrapy.py 
twosum.py

Here’s how to list the files in a directory using pathlib.Path():

from pathlib import Path

basepath = Path("C:\\New folder\\Python project(APT)\\")
files_in_basepath = basepath.iterdir()
for item in files_in_basepath:
    if item.is_file():
        print(item.name)

Output:

RESTART: C:/Users/HP/Desktop/article3.py

FilePath.py
hello.py
RssScrapy.py
twosum.py

List out Subdirectories:

To list subdirectories instead of files, use one of the methods below.

import os

# List all subdirectories using os.listdir
basepath = "C:\\New folder\\Python project(APT)\\"
for entry in os.listdir(basepath):
    if os.path.isdir(os.path.join(basepath, entry)):
        print(entry)

The example above uses os.listdir() together with os.path.isdir() to keep only the directories.

Output:

articles
django folder
imagedownload
images
lcm
lcm2
newtons-second-law
Scrapy1
scrapy2
scrapy3
speedofsound
studyrank
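The same listing can also be produced with os.scandir() or pathlib.Path(), by calling is_dir() where the earlier file examples called is_file(). This is a rough sketch; the helper names subdirs_scandir and subdirs_pathlib are hypothetical and chosen only for this example.

```python
import os
from pathlib import Path

def subdirs_scandir(path):
    """Return the sorted names of subdirectories of `path` via os.scandir()."""
    with os.scandir(path) as entries:
        # DirEntry.is_dir() is True for directories, False for regular files.
        return sorted(entry.name for entry in entries if entry.is_dir())

def subdirs_pathlib(path):
    """Return the sorted names of subdirectories of `path` via pathlib."""
    # Path.iterdir() yields a Path object for every entry in the directory.
    return sorted(item.name for item in Path(path).iterdir() if item.is_dir())

if __name__ == "__main__":
    # "." is the current working directory; replace it with your own path.
    for name in subdirs_scandir("."):
        print(name)
```

Both helpers should return the same names for the same folder, so the choice between them mostly comes down to whether the rest of your code works with os or with pathlib.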

Getting File Attributes

The code below first gets a list of the files in the directory along with their attributes, then calls convert_date() to convert each file’s last modified time into a human-readable form. convert_date() uses .strftime() to convert the time in seconds into a string.

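from datetime import datetime, timezone
from os import scandir

def convert_date(timestamp):
    # Interpret the timestamp (seconds since the epoch) as UTC
    d = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    formatted_date = d.strftime('%d %b %Y')
    return formatted_date

def get_files():
    # The with-statement closes the directory iterator when the loop is done
    with scandir("C:\\New folder\\Python project(APT)\\") as dir_entries:
        for entry in dir_entries:
            if entry.is_file():
                info = entry.stat()
                print(f'{entry.name}\t Last Modified: {convert_date(info.st_mtime)}')

# get_files() prints its results itself and returns None, so call it directly
get_files()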

Output:

FilePath.py        Last Modified: 19 Apr 2021
hello.py             Last Modified: 17 Apr 2021
RssScrapy.py     Last Modified: 17 Apr 2021
twosum.py        Last Modified: 17 Apr 2021

With this method we can find out when each file in the directory was last modified.
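The object returned by entry.stat() carries more attributes than the modification time. As a small sketch, the hypothetical helper file_details below also collects st_size, the file size in bytes, and converts st_mtime to local time instead of UTC.

```python
import os
from datetime import datetime

def file_details(path):
    """Return (name, size_in_bytes, modified) tuples for the files in `path`."""
    details = []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file():
                info = entry.stat()
                # Without a tz argument, fromtimestamp() uses local time.
                modified = datetime.fromtimestamp(info.st_mtime)
                details.append((entry.name, info.st_size, modified))
    return details

if __name__ == "__main__":
    # "." is the current working directory; replace it with your own path.
    for name, size, modified in file_details("."):
        print(f"{name}\t{size} bytes\tLast Modified: {modified:%d %b %Y}")
```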

Conclusion:

We have seen how to list the contents of a directory using three methods: os.listdir(), os.scandir() and pathlib.Path(). We have also seen how to list only the files or only the subdirectories in a folder, and how to read file attributes such as the last modified time.