Serialization
The custom data serialization is one of the core features of AtomicAssets. It is inspired by Google's Protobuf and is saved as a byte vector (vector<uint8_t>) to the blockchain.
It is expected to save between 30% and 80% of RAM for the majority of asset collections compared to traditional methods such as JSON strings.
Each template and each asset references a schema that is used for serialization. A schema describes the format of the data that can be serialized. In practice, each schema stores a vector of FORMAT types, each of which describes a single attribute that can be serialized. A schema can be extended by adding more FORMATs to the vector, but previously added FORMATs can never be removed. This ensures that any data serialized with a previous version of a schema can still be deserialized with the new version.
FORMAT is a struct with a name and type value. The FORMAT names need to be unique within a given schema.
struct FORMAT {
    std::string name;
    std::string type;
};
Valid types are:
int8 / int16 / int32 / int64
uint8 / uint16 / uint32 / uint64
fixed8 / fixed16 / fixed32 / fixed64
float / double / string / ipfs / bool / byte
or any valid type followed by [] to describe a vector.
Nested vectors (e.g. uint64[][]) are not allowed.
Also, the FORMAT {"name": "name", "type": "string"} needs to be present in every schema.
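The type rules above can be checked mechanically. The following is a minimal sketch, not the contract's own validation code; the function name `is_valid_type` is hypothetical.

```cpp
#include <set>
#include <string>

// Returns true if `type` is one of the base types listed above,
// optionally followed by exactly one "[]" suffix.
bool is_valid_type(std::string type) {
    static const std::set<std::string> base_types = {
        "int8",   "int16",   "int32",   "int64",
        "uint8",  "uint16",  "uint32",  "uint64",
        "fixed8", "fixed16", "fixed32", "fixed64",
        "float",  "double",  "string",  "ipfs", "bool", "byte"};

    // Strip a single trailing "[]". A nested vector such as "uint64[][]"
    // still ends in "[]" afterwards and therefore fails the lookup below.
    if (type.size() > 2 && type.compare(type.size() - 2, 2, "[]") == 0) {
        type.erase(type.size() - 2);
    }
    return base_types.count(type) > 0;
}
```

With this sketch, `is_valid_type("uint64[]")` is accepted while `is_valid_type("uint64[][]")` is rejected.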
Just like with Protobuf, to understand the AtomicAssets serialization it is important to first understand Varints (variable-size integers). They are described in the encoding section of the Protobuf documentation.
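As a quick illustration of the idea, a varint stores 7 bits of the value per byte, least significant group first, and sets the highest bit of every byte except the last. The sketch below follows the Protobuf scheme; the helper names `append_varint` and `read_varint` are hypothetical and not taken from the contract.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Append `value` to `out` as a varint.
void append_varint(std::vector<uint8_t>& out, uint64_t value) {
    while (value >= 0x80) {
        // Low 7 bits plus the continuation bit.
        out.push_back(static_cast<uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<uint8_t>(value));  // last byte, high bit clear
}

// Read one varint starting at `pos` and advance `pos` past it.
uint64_t read_varint(const std::vector<uint8_t>& in, std::size_t& pos) {
    uint64_t value = 0;
    for (int shift = 0; shift < 64; shift += 7) {
        if (pos >= in.size()) throw std::out_of_range("truncated varint");
        uint8_t byte = in[pos++];
        value |= static_cast<uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return value;  // high bit clear: complete
    }
    throw std::invalid_argument("varint longer than 10 bytes");
}
```

For example, the value 300 becomes the two bytes `AC 02`.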
The data to be serialized is passed to the smart contract as an ATTRIBUTE_MAP, which maps attribute names to their values.
vector<uint8_t> main(vector<FORMAT> format_lines, ATTRIBUTE_MAP data) {
    serialized_data = empty uint8_t vector
    // 0-3 are reserved for possible later extensions
    identifier = 4
    For each line in format_lines {
        If line.name is defined in data {
            Append varint(identifier) to serialized_data
            linedata = data[line.name]
            Append serialize(linedata, line.type) to serialized_data
        }
        identifier += 1
    }
    return serialized_data
}
Data is serialized in the order of the respective FORMATs within a schema's format. Attributes that are not defined in the provided ATTRIBUTE_MAP are skipped completely and do not take up any space.
Ahead of a serialized attribute, there is a varint encoded identifier. This identifier is dependent on the position of the attribute within the format vector. Because the identifiers 0-3 are reserved, the first attribute has identifier 4.
Integers are first zig-zag encoded and then stored as varints.
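Zig-zag encoding maps signed values to unsigned ones so that numbers with a small magnitude, positive or negative, end up as short varints. The sketch below uses the Protobuf-style mapping (0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...) and assumes that this is the mapping meant here; the function names are hypothetical.

```cpp
#include <cstdint>

// Interleave negative and non-negative values: the sign ends up in the
// lowest bit and the magnitude in the remaining bits.
uint64_t zigzag_encode(int64_t n) {
    // All ones for negative input, all zeros otherwise.
    uint64_t sign_mask = n < 0 ? ~uint64_t{0} : uint64_t{0};
    return (static_cast<uint64_t>(n) << 1) ^ sign_mask;
}

// Inverse mapping: the lowest bit selects the sign.
int64_t zigzag_decode(uint64_t z) {
    uint64_t sign_mask = uint64_t{0} - (z & 1);  // all ones if lowest bit set
    return static_cast<int64_t>((z >> 1) ^ sign_mask);
}
```

The encoded result is then written as a varint like any unsigned integer.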
Unsigned Integers are stored as varints.
The fixed types hold unsigned integers like the uint types, but instead of being stored as varints they are stored with a fixed size (1, 2, 4 or 8 bytes) in little-endian order.
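A fixed-size little-endian write can be sketched as follows. The helper name `append_fixed` and its `byte_count` parameter are hypothetical; `byte_count` would be 1, 2, 4 or 8 for fixed8, fixed16, fixed32 and fixed64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append the lowest `byte_count` bytes of `value`, least significant first.
void append_fixed(std::vector<uint8_t>& out, uint64_t value,
                  std::size_t byte_count) {
    assert(byte_count <= 8);  // a uint64_t has at most 8 bytes
    for (std::size_t i = 0; i < byte_count; ++i) {
        out.push_back(static_cast<uint8_t>(value >> (8 * i)));
    }
}
```

Stored as a fixed32, the value 300 (hex 012C) takes the four bytes `2C 01 00 00`, whereas as a uint32 it would take the two varint bytes `AC 02`.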
Floats and Doubles are stored as their raw 4-byte and 8-byte representations, respectively.
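One way to obtain the raw representation is to copy the value's bytes directly. The sketch below does this with `std::memcpy`, so the resulting byte order is that of the host; whether the contract uses the same order is an assumption here, and the helper name `append_raw` is hypothetical.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Append the in-memory bytes of a float (4 bytes) or double (8 bytes).
template <typename T>
void append_raw(std::vector<uint8_t>& out, T value) {
    uint8_t raw[sizeof(T)];
    std::memcpy(raw, &value, sizeof(T));  // well-defined way to read the bytes
    out.insert(out.end(), raw, raw + sizeof(T));
}
```

Reading a value back is the mirror image: copy `sizeof(T)` bytes from the buffer into a variable of type `T`.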
Strings are treated as if they were character vectors. Therefore, at first the varint encoded length of the string is stored, followed by the characters.
The IPFS type is passed as a Base58 encoded string to the contract. It is then decoded to a byte vector, which like all vectors is serialized by first storing the varint encoded length of the vector, followed by the bytes.
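The two steps for the IPFS type, Base58 decoding followed by length-prefixed storage, can be sketched as below. This assumes the Bitcoin Base58 alphabet, which is the one used by `Qm...` style IPFS hashes; the function names are hypothetical and this is not the contract's own decoder.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Decode a Base58 string into bytes (most significant byte first).
std::vector<uint8_t> base58_decode(const std::string& text) {
    static const std::string alphabet =
        "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";
    std::vector<uint8_t> bytes;  // least significant byte first while decoding
    for (char c : text) {
        std::size_t digit = alphabet.find(c);
        if (digit == std::string::npos) throw std::invalid_argument("not Base58");
        // bytes = bytes * 58 + digit, done byte by byte with a carry.
        unsigned carry = static_cast<unsigned>(digit);
        for (uint8_t& b : bytes) {
            carry += 58u * b;
            b = static_cast<uint8_t>(carry & 0xFF);
            carry >>= 8;
        }
        while (carry > 0) {
            bytes.push_back(static_cast<uint8_t>(carry & 0xFF));
            carry >>= 8;
        }
    }
    // Each leading '1' stands for one leading zero byte.
    for (std::size_t i = 0; i < text.size() && text[i] == '1'; ++i) {
        bytes.push_back(0);
    }
    std::reverse(bytes.begin(), bytes.end());
    return bytes;
}

// Serialize an ipfs attribute: varint length, then the decoded bytes.
void append_ipfs(std::vector<uint8_t>& out, const std::string& base58) {
    std::vector<uint8_t> decoded = base58_decode(base58);
    uint64_t length = decoded.size();
    while (length >= 0x80) {  // varint-encode the length
        out.push_back(static_cast<uint8_t>((length & 0x7F) | 0x80));
        length >>= 7;
    }
    out.push_back(static_cast<uint8_t>(length));
    out.insert(out.end(), decoded.begin(), decoded.end());
}
```

Storing the decoded bytes rather than the Base58 text is what saves space: the decoded form is noticeably shorter than the string it came from.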
Bools are stored as a single byte with the value 1 if the bool is true and 0 if it is false.
byte is an alias for fixed8.
Vectors are serialized by first storing the length of the vector (varint encoded) and then appending the serialized version of each of their elements.
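Strings and vectors share the same shape: a varint length followed by the elements. A minimal sketch for a string and for a uint64[] attribute, with hypothetical helper names:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Varint helper: 7 bits per byte, high bit marks "more bytes follow".
void append_varint(std::vector<uint8_t>& out, uint64_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<uint8_t>(value));
}

// A string is a length-prefixed run of its characters.
void append_string(std::vector<uint8_t>& out, const std::string& text) {
    append_varint(out, text.size());
    out.insert(out.end(), text.begin(), text.end());
}

// A uint64[] is a length prefix followed by each element as a varint.
void append_uint64_vector(std::vector<uint8_t>& out,
                          const std::vector<uint64_t>& values) {
    append_varint(out, values.size());
    for (uint64_t v : values) {
        append_varint(out, v);
    }
}
```

Under these rules "Tom" becomes `03 54 6F 6D`, and the vector [100, 1000] becomes `02 64 E8 07`.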
The examples below use the following schema format:
[
    {
        name: "id",
        type: "uint64"
    },
    {
        name: "name",
        type: "string"
    },
    {
        name: "children",
        type: "uint64[]"
    }
]
We want to serialize the following data:
{
    id: 300,
    name: "Tom"
}
We now loop through each of the 3 format lines:
serialized data: []
1. Is id defined in the data? Yes
   1.a) Append the varint attribute identifier = 04
        -> serialized data: [04]
   1.b) Append the serialized uint64
        1.b.i) 300 --(varint)--> [AC 02]
        -> serialized data: [04 AC 02]
2. Is name defined in the data? Yes
   2.a) Append the varint attribute identifier = 05
        -> serialized data: [04 AC 02 05]
   2.b) Append the serialized string
        2.b.i) length = 3 --(varint)--> [03]
        -> serialized data: [04 AC 02 05 03]
        2.b.ii) "Tom" = 54 6F 6D
        -> serialized data: [04 AC 02 05 03 54 6F 6D]
3. Is children defined in the data? No
serialized data: [04 AC 02 05 03 54 6F 6D]
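The walk-through above can be reproduced with a small C++ sketch of the serialization loop. It is deliberately limited to the three types used in the example, and the `Value` struct is a hypothetical stand-in for the contract's ATTRIBUTE_MAP values, not its real definition.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

struct FORMAT {
    std::string name;
    std::string type;
};

// Hypothetical attribute value: only the member matching the FORMAT type is read.
struct Value {
    uint64_t number = 0;
    std::string text;
    std::vector<uint64_t> numbers;
};

void append_varint(std::vector<uint8_t>& out, uint64_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<uint8_t>(value));
}

std::vector<uint8_t> serialize(const std::vector<FORMAT>& format_lines,
                               const std::map<std::string, Value>& data) {
    std::vector<uint8_t> out;
    uint64_t identifier = 4;  // 0-3 are reserved
    for (const FORMAT& line : format_lines) {
        auto it = data.find(line.name);
        if (it != data.end()) {  // attributes missing from the data are skipped
            const Value& value = it->second;
            append_varint(out, identifier);
            if (line.type == "uint64") {
                append_varint(out, value.number);
            } else if (line.type == "string") {
                append_varint(out, value.text.size());
                out.insert(out.end(), value.text.begin(), value.text.end());
            } else if (line.type == "uint64[]") {
                append_varint(out, value.numbers.size());
                for (uint64_t n : value.numbers) append_varint(out, n);
            } else {
                throw std::invalid_argument("type not covered by this sketch");
            }
        }
        ++identifier;  // advances even when the attribute is absent
    }
    return out;
}
```

Called with the schema format and the data from the example (id = 300, name = "Tom"), this sketch returns the same eight bytes as the manual walk-through.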
We want to deserialize the following byte vector:
[05 04 50 61 75 6C 06 02 64 E8 07]
1.a) Read a varint identifier
     1.a.i) 05 = b 0000 0101
            -> highest bit is not set -> varint is complete = 5
1.b) Identifier 5 means attribute number 5 - 4 = 1 (0 based) -> we're reading a string
     1.b.i) Read a varint string length
            1.b.i.1) 04 = b 0000 0100
                     -> highest bit is not set -> varint is complete = 4
     1.b.ii) Read 4 chars = 50 61 75 6C
             = "Paul"
1.c) Attribute "name" is set with value "Paul"
2.a) Read a varint identifier
     2.a.i) 06 = b 0000 0110
            -> highest bit is not set -> varint is complete = 6
2.b) Identifier 6 means attribute number 6 - 4 = 2 (0 based) -> we're reading a uint64 vector
     2.b.i) Read a varint vector length
            2.b.i.1) 02 = b 0000 0010
                     -> highest bit is not set -> varint is complete = 2
     2.b.ii) Read 2 uint64 values
             2.b.ii.1) 64 = b 0110 0100
                       -> highest bit is not set -> varint is complete = 100
             2.b.ii.2) E8 = b 1110 1000
                       -> highest bit is set -> varint is not complete
                       07 = b 0000 0111
                       -> highest bit is not set -> varint is complete
                       combined value = b 000 0111 ++ b 110 1000
                       = b 0011 1110 1000
                       = 1000
2.c) Attribute "children" is set with value [100, 1000]
3. Reached the end of the byte vector, deserialization complete
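The deserialization steps above can be sketched in C++ as well. As before, the sketch only covers the three types of the example schema, and `Value` is a hypothetical stand-in for an attribute value rather than the contract's real type.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

struct FORMAT {
    std::string name;
    std::string type;
};

// Hypothetical attribute value: only the member matching the FORMAT type is set.
struct Value {
    uint64_t number = 0;
    std::string text;
    std::vector<uint64_t> numbers;
};

uint64_t read_varint(const std::vector<uint8_t>& in, std::size_t& pos) {
    uint64_t value = 0;
    for (int shift = 0; shift < 64; shift += 7) {
        if (pos >= in.size()) throw std::out_of_range("truncated varint");
        uint8_t byte = in[pos++];
        value |= static_cast<uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return value;  // highest bit not set: complete
    }
    throw std::invalid_argument("varint longer than 10 bytes");
}

std::map<std::string, Value> deserialize(const std::vector<FORMAT>& format_lines,
                                         const std::vector<uint8_t>& in) {
    std::map<std::string, Value> data;
    std::size_t pos = 0;
    while (pos < in.size()) {
        // Identifier 4 is attribute number 0, identifier 5 is number 1, ...
        uint64_t identifier = read_varint(in, pos);
        if (identifier < 4 || identifier - 4 >= format_lines.size()) {
            throw std::invalid_argument("identifier not in schema");
        }
        const FORMAT& line = format_lines[identifier - 4];
        Value& value = data[line.name];
        if (line.type == "uint64") {
            value.number = read_varint(in, pos);
        } else if (line.type == "string") {
            uint64_t length = read_varint(in, pos);
            if (length > in.size() - pos) throw std::out_of_range("truncated string");
            value.text.assign(in.begin() + pos, in.begin() + pos + length);
            pos += length;
        } else if (line.type == "uint64[]") {
            uint64_t count = read_varint(in, pos);
            for (uint64_t i = 0; i < count; ++i) {
                value.numbers.push_back(read_varint(in, pos));
            }
        } else {
            throw std::invalid_argument("type not covered by this sketch");
        }
    }
    return data;
}
```

Fed the byte vector from the example, the sketch yields name = "Paul" and children = [100, 1000], and leaves id unset because identifier 4 never appears in the data.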
atomicassets.io - developed with ❤️ by pink.network