Python Data Persistence – Object Serialization
Python’s built-in File object and its methods for performing read/write operations are undoubtedly invaluable, as the ability to store data in a persistent medium is as important as processing it. However, the File object returned by Python’s built-in open() function has one important shortcoming, as you may have noticed in the previous chapter.
When a file is opened in ‘w’ mode, its write() method accepts only a string object. That means data in any non-string form, whether an object of a built-in class (numbers, dictionaries, lists, or tuples) or of a user-defined class, cannot be written to the file directly.
Example
>>> numbers=[10,20,30,40]
>>> file=open('numbers.txt','w')
>>> file.write(numbers)
Traceback (most recent call last):
  File "<pyshell#10>", line 1, in <module>
    file.write(numbers)
TypeError: write() argument must be str, not list
>>> class person:
        def __init__(self):
            self.name='Anil'
>>> p1=person()
>>> file=open('persons.txt','w')
>>> file.write(p1)
Traceback (most recent call last):
  File "<pyshell#20>", line 1, in <module>
    file.write(p1)
TypeError: write() argument must be str, not person
Before writing, you need to convert the object to its string representation.
Example
>>> numbers=[10,20,30,40]
>>> file=open('numbers.txt','w')
>>> file.write(str(numbers))
>>> file.close()
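In the case of a user-defined class, give it a __str__() method that returns the data to be stored, and write the resulting string:
Example
>>> class person:
        def __init__(self):
            self.name='Anil'
        def __str__(self):
            return self.name
>>> p1=person()
>>> file=open('persons.txt','w')
>>> file.write(str(p1))
>>> file.close()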
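To read the data back from the file in its original data type, the reverse conversion has to be done. Calling list() on the string read from the file would only split it into individual characters, so the text must be parsed instead, for example with ast.literal_eval() from the standard library:
Example
>>> file=open('numbers.txt','r')
>>> data=file.read()
>>> file.close()
>>> data
'[10, 20, 30, 40]'
>>> import ast
>>> ast.literal_eval(data)
[10, 20, 30, 40]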
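A user-defined class needs the same reverse conversion, but there is no ready-made parser for it, so the class has to supply its own rule in each direction. A minimal sketch, assuming a `person` class whose string form is simply its name and a hypothetical `from_text()` classmethod (not part of any library) that rebuilds an instance from that text; a temporary directory stands in for the current one:

```python
import os
import tempfile


class person:
    def __init__(self, name='Anil'):
        self.name = name

    def __str__(self):
        # String form written to the file: just the name
        return self.name

    @classmethod
    def from_text(cls, text):
        # Hypothetical reverse conversion: rebuild an object from its string form
        return cls(name=text)


path = os.path.join(tempfile.mkdtemp(), 'persons.txt')

# Write the string representation of the object
with open(path, 'w') as file:
    file.write(str(person()))

# Read the text back and convert it into a person object again
with open(path, 'r') as file:
    p2 = person.from_text(file.read())

print(p2.name)  # Anil
```

Every class that needs to be stored this way requires its own pair of conversion rules.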
A file object opened in ‘wb’ mode requires a bytes-like object as the argument to its write() method. In the above case, the list of integers can be converted to bytes with the bytearray() function (this works only because every element is an integer from 0 to 255) and then written to the file as follows:
Example
>>> numbers=[10,20,30,40]
>>> data=bytearray(numbers)
>>> file=open('numbers.txt','wb')
>>> file.write(data)
>>> file.close()
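Reading the data back reverses the conversion: a file opened in ‘rb’ mode returns a bytes object, and calling list() on it yields the integer value of each byte. A minimal sketch, writing to a temporary directory instead of the current one:

```python
import os
import tempfile

numbers = [10, 20, 30, 40]
path = os.path.join(tempfile.mkdtemp(), 'numbers.txt')

# bytearray() accepts only integers in the range 0-255
with open(path, 'wb') as file:
    file.write(bytearray(numbers))

with open(path, 'rb') as file:
    data = file.read()       # b'\n\x14\x1e(' : one byte per number

restored = list(data)        # iterating over bytes yields ints
print(restored)              # [10, 20, 30, 40]
```

Because each integer must fit in a single byte, a value such as 300 makes bytearray() raise ValueError, so this manual approach does not generalize.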
In the case of a user-defined class, the attributes of its objects have to be converted to bytes objects before they are written to a disk file:
Example
>>> file=open('persons.txt','wb')
>>> file.write(p1.name.encode())
>>> file.close()
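Reading such a file back means decoding the bytes and rebuilding the object by hand. A minimal sketch, assuming a `person` class whose only attribute is `name` and UTF-8 encoding (the default for str.encode() and bytes.decode()):

```python
import os
import tempfile


class person:
    def __init__(self, name='Anil'):
        self.name = name


p1 = person()
path = os.path.join(tempfile.mkdtemp(), 'persons.txt')

# Write the attribute as bytes
with open(path, 'wb') as file:
    file.write(p1.name.encode())       # b'Anil'

# Read the bytes back, decode them, and rebuild a person object
with open(path, 'rb') as file:
    p2 = person(file.read().decode())

print(p2.name)  # Anil
```

With more than one attribute, the code would also need a separator or length prefix to tell the fields apart, and the effort grows quickly.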
Converting objects manually to string or byte format (and back again) like this is cumbersome and clunky. Python has better solutions: several built-in modules can store and retrieve a Python object directly to or from a file or byte string. A Python object is said to be serialized when it is translated into a format from which it can be reconstructed later when required. The serialized form can be stored in a disk file or a byte string, or transmitted over network sockets. When serialized data is brought back into a form identical to the original, the mechanism is called de-serialization.
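As an illustration of serialization to a byte string and back, here is a minimal sketch using the built-in pickle module: `pickle.dumps()` returns the serialized bytes and `pickle.loads()` reconstructs an equivalent object. The `person` class is a toy example; it must be defined at module level so pickle can locate it during de-serialization.

```python
import pickle


class person:
    def __init__(self, name='Anil'):
        self.name = name


numbers = [10, 20, 30, 40]
p1 = person()

# Serialization: objects -> byte strings
numbers_bytes = pickle.dumps(numbers)
person_bytes = pickle.dumps(p1)
print(type(numbers_bytes))          # <class 'bytes'>

# De-serialization: byte strings -> objects equal in value to the originals
numbers_copy = pickle.loads(numbers_bytes)
p2 = pickle.loads(person_bytes)
print(numbers_copy)                 # [10, 20, 30, 40]
print(p2.name)                      # Anil
```

No str(), encode(), or parsing step is written by hand; the module handles both directions.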
Some built-in modules use Python-specific serialization formats, whereas others use standard serialization protocols such as JSON and XML. The Python documentation commonly calls serialization pickling and de-serialization unpickling. Python-specific serialization and de-serialization are provided by the built-in pickle and shelve modules. Although Python’s marshal module offers similar functionality, it is primarily meant for internal use in reading and writing the pseudo-compiled versions of Python modules (files with the .pyc extension) and is not recommended as a general persistence tool.
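The difference between a standard format and a Python-specific one shows up quickly in practice. A minimal sketch using the built-in json module: a list becomes portable JSON text, but an instance of a user-defined class is rejected with TypeError, while pickle's Python-specific format accepts the same object. The `person` class is a toy example.

```python
import json
import pickle


class person:
    def __init__(self, name='Anil'):
        self.name = name


numbers = [10, 20, 30, 40]

# JSON: a language-independent text format
text = json.dumps(numbers)
print(text)                     # [10, 20, 30, 40]
print(json.loads(text))         # [10, 20, 30, 40]

# json has no built-in rule for arbitrary Python classes
try:
    json.dumps(person())
    json_ok = True
except TypeError:
    json_ok = False
print(json_ok)                  # False

# pickle's Python-specific format handles the same object
restored = pickle.loads(pickle.dumps(person()))
print(restored.name)            # Anil
```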
The serialized byte stream can optionally be written to a disk file; this is called object persistence. The File API discussed in the previous chapter also stores data persistently, but not in a serialized format. The Python serialization libraries explored in this chapter are useful for storing serialized object data in disk files.
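Writing the serialized stream to a disk file is what turns serialization into persistence. A minimal sketch with pickle.dump() and pickle.load(), which work directly with a file object opened in binary mode; the temporary file path and the toy `person` class are assumptions made so the example runs anywhere:

```python
import os
import pickle
import tempfile


class person:
    def __init__(self, name='Anil'):
        self.name = name


path = os.path.join(tempfile.mkdtemp(), 'persons.dat')
data = {'numbers': [10, 20, 30, 40], 'person': person()}

# Serialize straight into the file: no manual str()/encode() step
with open(path, 'wb') as file:
    pickle.dump(data, file)

# Reconstruct the objects from the file (person must be defined when loading)
with open(path, 'rb') as file:
    loaded = pickle.load(file)

print(loaded['numbers'])          # [10, 20, 30, 40]
print(loaded['person'].name)      # Anil
```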