🐍 Python - pickle
Updated at 2018-06-10 22:06
The Python module pickle is for serializing and de-serializing Python objects.
- Pickling means converting an object to a byte stream
- Unpickling means converting a byte stream to an object
Pickling is useful when you need to send Python objects over a network or store them in a database.
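As a minimal sketch of the round trip described above (the record dictionary is made up for illustration), pickle.dumps() produces the byte stream and pickle.loads() rebuilds an equal object from it:

```python
import pickle

# A nested structure made only of built-in types (illustrative data).
record = {"id": 7, "tags": ["a", "b"], "ratio": 0.5}

# Pickling: object -> bytes.
payload = pickle.dumps(record)

# Unpickling: bytes -> a new object that compares equal to the original.
restored = pickle.loads(payload)

assert isinstance(payload, bytes)
assert restored == record
assert restored is not record  # a copy, not the same object
```

The payload is what you would write to a socket, a file or a binary database column.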
Compared to the older, more primitive marshal module:
- Pickle keeps track of objects it has already serialized, so shared and recursive references are serialized only once.
- Pickle can serialize user-defined classes and instances.
- Pickle is backwards compatible across Python releases, provided a compatible protocol is chosen.
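The object tracking mentioned in the list above can be observed directly. A small sketch, with illustrative variable names: an object referenced twice comes back as one object, and a self-referencing list survives the round trip.

```python
import pickle

# One list referenced from two places.
shared = [1, 2, 3]
container = {"first": shared, "second": shared}

restored = pickle.loads(pickle.dumps(container))
# The shared reference is preserved: both keys point at the same list.
assert restored["first"] is restored["second"]

# A list that contains itself.
node = []
node.append(node)

loop = pickle.loads(pickle.dumps(node))
# The recursive reference is preserved as well.
assert loop[0] is loop
```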
Compared to the json module:
- JSON is human-readable.
- JSON is not meant to serialize behavior, only data.
- JSON can be de-serialized by languages other than Python, while pickle is Python-specific.
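A short sketch of the practical difference, using made-up data: JSON yields readable text but only knows a handful of types, whereas pickle yields bytes and preserves Python-specific types such as tuples and sets.

```python
import json
import pickle

point = {"name": "origin", "coords": (0, 0)}

# JSON output is text; the tuple comes back as a list.
as_json = json.dumps(point)
from_json = json.loads(as_json)
assert isinstance(as_json, str)
assert from_json["coords"] == [0, 0]

# Pickle output is bytes; the tuple comes back as a tuple.
as_pickle = pickle.dumps(point)
assert isinstance(as_pickle, bytes)
assert pickle.loads(as_pickle) == point

# Sets are rejected by json but handled by pickle.
tags = {"tags": {"a", "b"}}
try:
    json.dumps(tags)
except TypeError:
    json_rejects_sets = True
else:
    json_rejects_sets = False

assert json_rejects_sets
assert pickle.loads(pickle.dumps(tags)) == tags
```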
You can use 5 different protocols for pickling:
- v0: human-readable format that works with earlier Python versions
- v1: old binary format that works with earlier Python versions
- v2: more efficient binary format that works with Python 2.3+
- v3: adds support for bytes objects and works with Python 3+
- v4: adds support for large objects and works with Python 3.4+
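The protocol is selected with the protocol argument of pickle.dumps(). The sketch below, using an arbitrary list of integers, round-trips the same data through the five protocols listed above and compares the text-based v0 output with the binary v2 output:

```python
import pickle

data = list(range(100))

# Every protocol from the list above round-trips the same data.
for protocol in range(5):
    payload = pickle.dumps(data, protocol=protocol)
    assert pickle.loads(payload) == data

text_payload = pickle.dumps(data, protocol=0)
binary_payload = pickle.dumps(data, protocol=2)

# For this data, protocol 0 emits only ASCII characters.
assert all(byte < 128 for byte in text_payload)
# Protocol 2 and later start with a PROTO opcode and the version number.
assert binary_payload[:2] == b"\x80\x02"
# The binary encoding of this list is more compact.
assert len(binary_payload) < len(text_payload)
```

pickle.loads() detects the protocol on its own, so it never needs to be told which one was used. When no protocol is passed, pickle.DEFAULT_PROTOCOL applies, and its value depends on the Python version.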
In a nutshell, you can pickle:
- All basic types such as booleans, numbers, strings, lists, etc.
- Functions and classes defined at the top level of a module.
- Instances of classes whose __dict__, or the result of calling __getstate__(), is picklable.
You cannot pickle lambdas.
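A small sketch of the rules above. It uses json.dumps only as a convenient example of a function defined at the top level of a module. Functions are pickled by reference (module and name), which is why a lambda, having no importable name, is rejected.

```python
import json
import pickle

# A top-level function is stored as "module + name" and looked up again on load.
restored = pickle.loads(pickle.dumps(json.dumps))
assert restored is json.dumps

# A lambda has no name that could be looked up, so pickling it fails.
# Depending on the Python version and where the lambda is defined,
# the failure is reported as PicklingError or AttributeError.
try:
    pickle.dumps(lambda value: value + 1)
except (pickle.PicklingError, AttributeError):
    lambda_rejected = True
else:
    lambda_rejected = False

assert lambda_rejected
```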
Never unpickle data received from an untrusted source. This should go without saying. If you need to unpickle data from an untrusted source, be extra careful and look into Unpickler.find_class(), which lets you restrict what can be loaded.
import pickle
pickle.loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
# => hello world
# so an attacker can run arbitrary commands if untrusted bytes are unpickled
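One way to use Unpickler.find_class() is an allow-list, in the spirit of the example in the Python documentation. The names RestrictedUnpickler, restricted_loads and SAFE_BUILTINS below are illustrative, and the allow-list itself is an assumption to adapt. Treat this as a sketch of the idea rather than a complete defence.

```python
import builtins
import io
import pickle

# Illustrative allow-list: the only globals a pickle may reference.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle asks for a global such as os.system.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Like pickle.loads(), but refuses globals outside the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Plain data and allow-listed types still load.
assert restricted_loads(pickle.dumps([1, 2, range(15)])) == [1, 2, range(15)]

# The malicious payload is rejected before os.system is ever resolved.
try:
    restricted_loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
except pickle.UnpicklingError:
    blocked = True
else:
    blocked = False

assert blocked
```

Even with such a filter, a format like JSON is usually the safer choice for data that crosses a trust boundary.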
Look into __getstate__ and __setstate__ if you need to pickle stateful objects.
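A minimal sketch of those two hooks, using a hypothetical Tally class whose cache attribute should not be pickled: __getstate__ decides what goes into the pickle, and __setstate__ rebuilds the instance from it.

```python
import pickle


class Tally:
    """Hypothetical stateful object with one attribute that is not worth pickling."""

    def __init__(self, start):
        self.count = start
        self.cache = {}  # transient data, cheap to rebuild

    def __getstate__(self):
        # Pickle a copy of the instance dict without the transient cache.
        state = self.__dict__.copy()
        del state["cache"]
        return state

    def __setstate__(self, state):
        # Restore the pickled attributes, then recreate the cache empty.
        self.__dict__.update(state)
        self.cache = {}


tally = Tally(3)
tally.cache["expensive"] = 42

clone = pickle.loads(pickle.dumps(tally))
assert clone.count == 3
assert clone.cache == {}
```

Note that __init__ is not called during unpickling, so anything it normally sets up has to be recreated in __setstate__.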
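import pickle
def greet(name: str) -> str:
    return f'Hello {name}!'
# Protocol 3 is pinned so the bytes below are reproducible;
# the default protocol differs between Python versions.
x = pickle.dumps(greet, protocol=3)
# Functions are pickled by reference: only the module and name are stored.
# The expected bytes assume this file is run as a script (module __main__).
assert x == b'\x80\x03c__main__\ngreet\nq\x00.'
y = pickle.loads(x)
assert y('John') == 'Hello John!'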
Sources
- Python 3 documentation