How to Delete Pages from a PDF File in Python?

This article shows how to delete pages from a PDF file in Python. When working with PDF files, we often need to remove pages that are no longer needed, which usually also reduces the file size.

To remove pages from the PDF, we will utilize the PyMuPDF library.

What is PDF?

PDFs are a popular format for distributing text. PDF is an abbreviation for Portable Document Format, and it utilizes the .pdf file extension. Adobe Systems designed it in the early 1990s.

Reading and editing PDF documents in Python lets you automate a wide range of document-handling tasks.

PyMuPDF Module:

The PyMuPDF module makes it simple to delete pages from any PDF file. We can remove a single page or several pages at once.

We can also pass a list of page numbers to choose which pages stay in the PDF and which are deleted.

PyMuPDF is a Python binding for MuPDF, which is a lightweight PDF, XPS, and E-book viewer, renderer, and toolkit developed and maintained by Artifex Software, Inc.

MuPDF supports PDF, XPS, OpenXPS, CBZ, EPUB, and FB2 (e-books) files and is well-known for its great performance and rendering quality.

Before using the PyMuPDF module, we must first install it.

Installation:

pip install PyMuPDF

Deleting Pages from a PDF File in Python

Deleting multiple pages using list:

Approach:

  • Import the fitz module (the import name of PyMuPDF) using the import keyword.
  • Give the input PDF filename as static input and store it in a variable.
  • Give the output PDF filename as static input and store it in another variable.
  • Open the input PDF by passing its filename to the fitz.open() function.
  • Give the list of page numbers to keep from the input PDF and store it in a variable.
  • All pages that are not in this list are deleted.
  • NOTE: page numbers are indexed starting from 0.
  • Keep only the listed pages by passing the list to the select() function.
  • Save the result to the output PDF file using the save() function.

Below is the implementation:

# Import the fitz module (the import name of PyMuPDF)
import fitz
# Input PDF filename, given as static input
gvn_pdf = "btechgeeks.pdf"
# Output PDF filename, given as static input
output_pdf = "outputPDF.pdf"
# Open the input PDF with fitz.open()
pdf_file = fitz.open(gvn_pdf)
# List of the page numbers to keep from the input PDF.
# All pages that are not in this list are deleted.
# NOTE: page numbers are indexed starting from 0.
saved_pages = [0, 3, 5, 7]
# Keep only the listed pages using the select() function
pdf_file.select(saved_pages)
# Save the remaining pages to the output PDF file
pdf_file.save(output_pdf)

Output:

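Here outputPDF.pdf contains only pages 1, 4, 6, and 8 of the given PDF (counting from 1), and all other pages are deleted. The list holds 0, 3, 5, 7 because page numbers are indexed starting from 0.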
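
Because select() takes the pages to keep rather than the pages to delete, it can help to compute the keep list from a delete list. The helper below is a sketch in plain Python. The name pages_to_keep and its 1-based input convention are assumptions made for illustration and are not part of PyMuPDF.

```python
def pages_to_keep(page_count, pages_to_delete):
    """Return the 0-based indices to keep, given 1-based page numbers to delete."""
    invalid = [p for p in pages_to_delete if not 1 <= p <= page_count]
    if invalid:
        raise ValueError(f"page numbers out of range: {invalid}")
    # Convert the 1-based page numbers to 0-based indices.
    drop = {p - 1 for p in pages_to_delete}
    # Keep every index that is not marked for deletion, in document order.
    return [i for i in range(page_count) if i not in drop]


# For an 8-page document, deleting pages 2, 3, 5 and 7 leaves the
# indices 0, 3, 5 and 7, the same list used in the example above.
keep = pages_to_keep(8, [2, 3, 5, 7])
print(keep)  # [0, 3, 5, 7]
```

With an open document, the result could be passed straight to select(), for example pdf_file.select(pages_to_keep(pdf_file.page_count, [2, 3, 5, 7])), where page_count is the number of pages in the document.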