Charon makes data serialization and deserialization simple and secure.
Charon was inspired by the Camel project, but unlike Camel it does not force a particular serialization format. Charon offers a simple interface for defining functions that convert between complex Python objects and primitives that can be serialized into a format of your choice.
In other words, this is not a tool that takes an object and serializes it to JSON. It is a tool that takes an object and using a user-defined serialization function it converts it to basic Python types which are then serializable to JSON, YAML, msgpack etc.
This project also contains (de)serializers for some standard types like datetime.datetime
,
set` and frozenset
.
Registries can be tested (see Writing tests). This heavily relies on the pytest
module and allows the user
to:
- test implemented dumpers and loaders while keeping track of implemented versions (see Version tests);
- test if all dumpers and loaders have their tests (see Metatests);
- write tests easily by simply parametrizing a predefined generalized test case (see Generic tests).
Define dumpers and loaders using decorators with a charon.CodecRegistry
instance
and then group multiple codec registries together using charon.Codec
.
You can serialize and deserialize objects for which dumpers and loaders were defined by calling
the dump
and load
methods of Codec
.
Whenever a codec serializes an object, it checks what the serialization function returned and if it finds an object that is not a basic Python type, it tries to serialize it again. This kind of recurrent behavior can be time-consuming (and even dangerous for circular references).
Note
If there are multiple registries able to serialize the same object with same version, the first one found from the end of the registries list is picked and used.
Creating a dumper is simple. First you have to create a registry in which this dumper will be stored. This is done by:
import charon registry = charon.CodecRegistry()
Then you just have to decorate a function which will receive an instance to serialize. The class of the instance to serialize is given to the decorator as the first argument. The second argument is the version which allows versioning of your class and its serialized structure. This approach allows you to restore serialized objects that were created using an older structure (field names may change, variables may come and go etc.).
The last parameter class_hash
is optional and is used for version testing (see Version tests).
The function that you decorate should accept one argument - the object instance to be serialized. We serialize it into a dictionary containing base values from which we can restore this object.
@registry.dumper(datetime.timedelta, version = 1, class_hash = 'dadf350239d3779d72d9c933ab52db1b') def _dump_timedelta(obj): return {'days': obj.days, 'seconds': obj.seconds, 'microseconds': obj.microseconds}
Creating a loader is also simple. Again you have to have a registry which will store this loader (we use the one defined in the previous section), and again you have to decorate a function which will receive data, but this time these data are those we created using the dumper function.
The loader decorator is similar to the dumper one. The first argument is the class which we should deserialize (which we will instantiate from the data received). The second argument, version, is used as mentioned above for versioning of the class implementation. It allows developers to change the structure of their classes and still be able to restore previously serialized objects into this newer structure.
The function which you decorate should accept one parameter: the data previously returned by the serialization function (see _dump_timedelta). From the received dictionary we recreate an object using its constructor (if applicable) or setting its internal variables directly (yes, this may be appropriate when deserializing data).
@registry.loader(datetime.timedelta, version = 1, class_hash = 'dadf350239d3779d72d9c933ab52db1b') def _load_timedelta(data): return datetime.timedelta(days = data['days'], seconds = data['seconds'], microseconds = data['microseconds'])
After we have defined all our classes which will be serialized we can use charon.Codec
to serialize
and deserialize objects.
>>> import datetime
>>> import charon, charon.extensions
>>> codec = charon.Codec([charon.extensions.STANDARD_REGISTRY])
>>> delta = datetime.timedelta(seconds = 42)
>>> encoded = codec.dump(delta)
>>> print(encoded)
{'!meta': {'dtype': 'timedelta', 'version': 2}, 'params': [0, 42, 0]}
>>> loaded = codec.load(encoded)
>>> print(delta)
0:00:42
>>> print(loaded)
0:00:42
>>> print(delta == loaded)
True
Note
These tests use pytest
module.
This section describes options that Charon offers for testing loaders and dumpers. These tests are meant to help with keeping all loaders and dumpers tested and up to date with class structure.
There is also a test function that should represent basic test structure for testing a serialization/deserialization pipeline.
Charon contains a generic test definition test_serialization_pipeline
.
This test is a generalized test case consisting of object serialization, deserialization and comparison.
The original object and the deserialized object are both tested if they match the class for which the test is used. This is a sanity check to prevent serializating instances of one class and getting instances of another class after deserialization.
The original and the deserialized objects are then compared against each other using the vars
function
if possible, otherwise the standard equality operator (__eq__
) is used.
First you have to import the appropriate test function. This is best done in conftest.py
because we will use it later.
You also have to define a serializer
fixture to use with this test.
import pytest import charon from charon.testing.generic import test_serialization_pipeline @pytest.fixture def serializer(): return charon.Codec([registry])
Then you have to define parameters. You can rename the test function to some convenient name and then use a wrapper to call it. But it is better to parametrize it directly using the pytest marker pytest.mark.parametrize:
pytest.mark.parametrize([(ExampleClass, ExampleClass('Ahoy'))])(test_serialization_pipeline)
Another way to parametrize it is by using the pytest_generate_tests function of pytest.
def pytest_generate_tests(metafunc): ''' Generates test cases for simple deserialization and serialization. Test cases are generated by functions with ```generate_``` prefix ''' if metafunc.function.__name__ == 'test_serialization_pipeline': metafunc.parametrize('cls, original_obj', [(ExampleClass, ExampleClass('Ahoy'))])
Charon contains tests for testing whenever all dumpers and loaders have tests. This aproach is called metatesting.
These tests are convenient when you want to make sure that all of your dumpers and loaders have tests.
To use this metatest you have to mark your dumper and loader tests with pytest.mark
.
@pytest.mark.charon(cls = ExampleClass, dumper_test = True, loader_test = False)
def test_example_dumper(self):
pass
As you can see you use keywords to set up the mark. The cls
keyword specifies the class for which this test works,
dumper_test
specifies if this is a dumper test and obviously loader_test
specifies whenever this is a loader test.
This way you mark all of your tests. Then you just have to import metatests from the charon.testing.metatest
package.
from charon.testing.metatest import (
test_charon_dumper_tests,
test_charon_loader_tests,
scope_charon_tests
)
pytest.fixture(scope = 'module')(scope_charon_tests)
And that's all: from this point forth all loader and dumper methods would have to be properly tested.
Note
This only tests the existence of tests. It does not test the version for which those teste were written. To test the version for which the tests were written see Version tests.
Note
pytest.fixture(scope = 'module')(scope_charon_tests)
We create this fixture in the module scope and in the session scope because that could create false positive cases. Example:
You have have two registries in different codecs which serialize and deserialize the same class differently. One of these codecs have tests for this class and the others don't. If you use a session fixture in this case you will get a false positive check, because the fixture will return tests which are defined for one of the codecs but not for the other. Metatests do not check to which codec (implementation) is a test bound.
In large projects it is sometimes difficult to keep track of changes in classes, and to keep their serialization up to date. For example, in a project with multiple people colaborating on it changes in one class can be made separatwly by multiple developers, and one of them may forget to update the particular dumpers and loaders and increment their version.
To prevent this, Charon has an option to create a hash of class implementation (hash of the AST) and annotate
a dumper / loader with it. Charon also includes tests from charon.testing.ast_hash
called
test_dumpers_version
and test_loaders_version
.
To use this feature first we have to create a hash of the current implementation.
This can be done using charon_ast_hash
script provided by this package.
See: Generating a hash
When we havea hash of the class implementation we add a keyword to the standard decorator for loader / dumper.
@registry.dumper(Object, version = 1, class_hash = 'd2498176fad81ad017d1b0875eeeeb1b')
def _load_object_v1(_):
pass
This way we pass the hash of the class implementation to the registry.
Note
The hash is kept only for the latest version of a dumper / loader because we cannot check these versions with older implementations of a particular class.
These are all the changes in dumper and loader implementations. Next you have to import the test methods
charon.testing.ast_hash.test_dumpers_version` and charon.testing.ast_hash.test_loaders_version
into your test file and pass it an instance of charon.Codec
, containing the registries you want to test.
from charon.testing.ast_hash import test_dumpers_version, test_loaders_version
@pytest.fixture
def serializer():
return charon.Codec([my_registry])
From this point on, whenever you run pytest, all your loaders and dumpers which have class_hash
defined will be
checked against the hash of the current class implementation.
To generate the hash of the AST (Abstract Syntax Tree) of a class you can use script provided with Charon
called charon_ast_hash
. This script takes a list of classes to be hashed as an argument list.
Note
This command internally uses the inspect
module to get the source code which is then parsed by the ast
module.
Methods and classes from __builtins__
and from compiled libraries cannot be hashed.
$ charon_ast_hash datetime.datetime datetime.time
datetime.datetime: 4927808ca19f2a1494719baa11024a7d
datetime.time: c36b819f18698ee9143ecd92e3788c66
The Charon package comes with an implementation for some Python types that are built in or in the standard library:
decimal.Decimal
set
frozenset
datetime.datetime
datetime.date
datetime.time
datetime.timedelta
This registry can be used by simply creating your charon.codec
with an additional registry
charon.extensions.STANDARD_REGISTRY
.
Basic usage is pretty simple. You just have to create a charon.codec
object with an additional codec registry
charon.extensions.STANDARD_REGISTRY
,
preferably at the begining of the list (in case you would want to override standard implementations of dumpers / loaders).
>>> import charon, charon.extensions
>>> import decimal
>>> codec = charon.Codec([charon.extensions.STANDARD_REGISTRY])
>>> number = decimal.Decimal('4.5')
>>> print(number)
4.5
>>> serialized = codec.dump(number)
>>> print(serialized)
{'!meta': {'dtype': 'Decimal', 'version': 1}, 'params': '4.5'}
>>> loaded = codec.load(serialized)
>>> print(loaded)
4.5