Data Persistence – Python and Excel

Python Data Persistence – Python and Excel

Interactive spreadsheet applications have been in use for quite a along. Over the years, Microsoft Excel has emerged as the market leader in this software category. Early versions of MS Excel used a proprietary data representation format. After 2003, Microsoft has adopted Office Open XML (OOXML) file format for its later versions of MS Excel.

OOXML is an ECMA standard file format. This has led to the development of programming interfaces for Excel in many programming environments including Python. Out of many Python packages, openpyxl is the most popular and can read/write Excel files with the .xlsx extension. Pandas is a very popular Python library. It has various tools for performing the analysis of data stored in different file formats. Data from Excel sheet, CSV file, or SQL table can be imported into Pandas data frame object for processing.

In this chapter, first, we shall learn to use openpyxl to programmatically perform various operations on an Excel file such as copying a range, define, and copy formulas, insert images, create charts, and so on.

In the second part of this chapter, we get acquainted with Pandas library – specifically, how to read/write data from/to an Excel worksheet.
Before proceeding further, here is a brief recap of important terms used in the Excel spreadsheet. An Excel document is called a workbook and is saved as a .xlsx file. A workbook may have multiple worksheets. Each worksheet is a grid of a large number of cells, each of which can store one piece of data – either value or formula. Each cell in the grid is identified by its row and column number. Columns are identified by alphabets, A, B, C,…., Z, AA, AB, and so on. Rows are numbered starting from 1. (figure 10.1)

Python Data Presistence - Python and Excel chapter 10 img 1

Excel With openpyxml

The openpyxl is an open-source package. Its installation is straightforward by using pip utility. It is recommended that you set a virtual environment first and then install openpyxl in it using the following command:

Example

E:\>cd excelenv 
E:\excelenv>scripts\activate
(excelenv) E:\excelenv>scripts\pip3 install openpyxl Collecting openpyxl
Using cached https://files.pythonhosted.org/packag- es/Sf/f8/a5d3a4ab669f99154f87ab531192dd84ac79aae62e- 
fab6G2bd2d82a72194/openpyxl-2.6.1.tar.gz Collecting jdcal (from openpyxl)
Using cached https://files.pythonhosted.org/packag- es/aO/38/dcf8353248Of25284f3ef13f8ed63e03c58a65c9d- 
3ba2a6a894ed94972 07/j deal-1.4-py2.py3-none-any.whl Collecting et_xmlfile (from openpyxl)
Using cached https://files.pythonhosted.org/ packages/22/28/a99c42aea746el8382ad9fb36f64c- 
lclf04216f41797f2fOf a567dall3 8 8/et_xmlfile -1.0.1.tar. gz
Installing collected packages: jdcal, et-xmlfile, openpyxl
Running setup.py install for et-xmlfile ... done Running setup.py install for openpyxl ... done 
Successfully installed et-xmlfile-1.0.1 jdcal-1.4 openpyxl-2.6.1

Read Cell Range to List

First of all, let us read back the data from A2:C7 in a Python list object by traversing the range with two nested loops. Each row is collected in a tuple and appended to a list.

Example

sheet1 = wb['PriceList'] 
pricelist= [ ] 
for row in range(1,7): prod= [ ]
for col in range(1,4):
val=sheetl.cell(column=col,
row=2+row).value
prod.append(val) 
pricelist.append(tuple(prod)) 
print (pricelist)

The result will be as shown below:

[(1, 'Laptop', 25000) , (2, 'TV' , 40000) , (3, 'Router', 2000) , (4, 'Printer', 9000)] 'Scanner', 5000) , (5, 'printer' , 9000)]

Merge and Center

To merge Al-Cl cells use the merge_cells ( ) method.

sheet1.merge_cells('A1:C1')

The openpyx1. styles module defines Alignment and Font classes. Apply ‘Center’ alignment to text in ‘AT

cell=sheetl[1A1']
cell.alignment=Alignment('horizontal center')

The Font object can be configured by defining attributes like name, size, color, and so on. Font constructor also accepts bold, italic, underline, and strike, as Boolean attributes (True/False).

cell.font=Font(name='Calibri',size=20,bold=True)

Insert Image

First, install the pillow library in your current Python environment. It is an open-source Python image library that provides support for opening, saving, and manipulating image files.

pip3 install pillow

Let us create a new worksheet in ‘example.xlsx’ in which we shall insert an image.

wb = load_workbook('example.xlsx') 
sheet2=wb.create_sheet(title="image")

Now import Image class from openpyxl. drawing. image module. Obtain the Image object from the image file to be inserted. Finally call add_image () method of the worksheet. This method needs image object and location, (figure 10.9)

Example

from openpyxl.drawing.image import Image
img = Image('openpyxl.jpg')
sheet2.add_image(img,'Al')
wb . save (filename= ' example . xlsx ' )

Define Formula

It is very easy to define a formula in a cell. Just assign string representation of cell formula as would appear in the formula bar of Excel. For example, if you want to set cell 8 to sum of cells between C3 to 7, assign it to ‘=SUM(C3:C7)’

Example

sheet1 ['B8'] ='SUM'
sheet1['C8']='SUM(C3:C7)'
sheet1['C9'] ='=AVERAGE(C3:C7)'

All above actions are collected in the following script. Run it from the command prompt and then open the workbook, (figure 10.4)

Example

#readworkbook.py
from openpyxl import load_workbook 
wb = load_workbook('test.xlsx') 
sheet1 = wb['PriceList']
#copy range to list 
pricelist= [ ] 
for row in range(1,7): 
prod= [ ]
for col in range(1,4):
val=sheetl.cell(column=col,
row=2+row).value
prod.append(val) 
pricelist.append(tuple(prod)) 
print (pricelist)
#merge and center
sheetl.merge_cells('A1:C1')#merge
cell=sheetl['A1']
cell.alignment=Alignment(horizontal'center1) #apply font
cell.font=Font(name='Calibri',size=20,bold=True)
#define formula
sheetl['b8']='SUM'
sheetl['C8']='=SUM(C3:C7)'
sheetl['C9']='=AVERAGE(C3:C7)'

Python Data Presistence - Define Formula chapter 10 img 1

Copy Formula

One of the important features of Excel software is the ability to copy a cell formula either with the relative or absolute address to other cells or ranges. In the above worksheet, we calculate the difference of each price and average and store it in Column D. Accordingly formula for D4 should be C4-C9. It is to be copied for range D5:D9

The openpyxml. formula module defines Translator class having translate_formula ( ) function that copies formula at original cell (D4) to the required range.

Example

from openpyxl.formula.translate import 
Translator#copy formula 
sheet1['D2']='DIFF' 
sheet1['D3']= ' =C$9-C3' 
sheet1['D4'] = Translator("=C$9-C3", origin="D3") . translate_formula("D4") 
for row in range(4,8): 
coor=sheetl.cell(column=4, row=row). 
coordinate#copy formula to range 
sheet1.cell(column=4, row=row). 
value=Translator("=C$9-C3", origin="D3"). \ 

translate formula(coor)

Using the translate_formula, we can also copy a range of cells to other location. Following snippet copies range A2:D2 which is the table’s heading row to A10:D10 cell range.

Example

sheet1['A10'] = '=A2' 
for col in range(1,4): 
coor=sheet1.cell(column=col,row=3).coordinate 
coor1=sheet1.cell(column=col, row=10).coordinate 
print (coor,coor1) 
sheet1.cell(column=col, row=10). 
value=Translator("=A2", origin="A10"). \ 

translate formula(coor1)

After copying the formula and range, (figure 10.5) the worksheet appears as follows:

Python Data Presistence - Copy Formula chapter 10 img 1

Leave a Comment