Serialization

Data Serialization

The custom data serialization is one of the core features of AtomicAssets. It is inspired by Google's Protobuf, and the serialized data is stored on the blockchain as a byte vector (vector<uint8_t>).
It is expected to save between 30% and 80% of RAM for most asset collections compared to traditional methods such as storing JSON strings.

Schemas

Each template and each asset references a schema that is used for serialization. A schema describes the format of the data that can be serialized. In practice, each schema stores a vector of FORMAT types, each of which describes a single attribute that can be serialized. A schema can be extended by adding more FORMATs to the vector, but previously added FORMATs can never be removed. This ensures that any data serialized with a previous version of a schema can still be deserialized with the new version.

FORMAT is a struct with a name and type value. The FORMAT names need to be unique within a given schema.

struct FORMAT {
  std::string name;
  std::string type;
};

Valid types are:

int8/ int16/ int32/ int64
uint8/ uint16/ uint32/ uint64
fixed8/ fixed16/ fixed32/ fixed64
float/ double/ string/ ipfs/ bool/ byte

Any valid type can also be followed by [] to describe a vector of that type.
Nested vectors (e.g. uint64[][]) are not allowed.

Also, the FORMAT {"name": "name", "type": "string"} needs to be present in every schema.
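The schema rules above (valid types, at most one [] suffix, unique names, and a mandatory "name" string attribute) can be sketched as a small check. This is only an illustration of the rules as stated here, not the contract's actual validation code; the function names is_valid_type and is_valid_schema are hypothetical.

```cpp
#include <set>
#include <string>
#include <vector>

struct FORMAT {
    std::string name;
    std::string type;
};

// Hypothetical helper: true if `type` is a listed base type,
// optionally followed by exactly one "[]".
bool is_valid_type(std::string type) {
    static const std::set<std::string> base_types = {
        "int8", "int16", "int32", "int64",
        "uint8", "uint16", "uint32", "uint64",
        "fixed8", "fixed16", "fixed32", "fixed64",
        "float", "double", "string", "ipfs", "bool", "byte"};
    const std::string suffix = "[]";
    // Strip a single trailing "[]"; a second one would remain and fail the lookup.
    if (type.size() > suffix.size() &&
        type.compare(type.size() - suffix.size(), suffix.size(), suffix) == 0) {
        type.erase(type.size() - suffix.size());
    }
    return base_types.count(type) > 0;
}

// Hypothetical helper: checks the schema-level rules described above.
bool is_valid_schema(const std::vector<FORMAT>& format_lines) {
    std::set<std::string> seen_names;
    bool has_name_attribute = false;
    for (const FORMAT& line : format_lines) {
        if (!is_valid_type(line.type)) return false;
        // insert() reports whether the name was new; duplicates are rejected.
        if (!seen_names.insert(line.name).second) return false;
        if (line.name == "name" && line.type == "string") has_name_attribute = true;
    }
    return has_name_attribute;
}
```

With this sketch, "uint64[]" passes the type check while "uint64[][]" and unknown names such as "int" do not.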

How does the serialization work

Prerequisites

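Just like with Protobuf, to understand the AtomicAssets serialization it is important to first understand varints (variable-size integers). In short, a varint stores an integer in 7-bit groups, least significant group first, and sets the highest bit of every byte except the last one. The encoding section of the Protobuf documentation explains them in detail.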
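As a rough illustration, a varint encoder can be written in a few lines. The sketch below assumes the Protobuf base-128 scheme; the function name encode_varint is hypothetical and not taken from the contract.

```cpp
#include <cstdint>
#include <vector>

// Encode an unsigned integer as a base-128 varint:
// 7 payload bits per byte, least significant group first,
// highest bit set on every byte except the last.
std::vector<uint8_t> encode_varint(uint64_t value) {
    std::vector<uint8_t> out;
    while (value >= 0x80) {
        out.push_back(static_cast<uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<uint8_t>(value));  // final byte, highest bit clear
    return out;
}
```

For example, encode_varint(300) yields the two bytes AC 02, while any value below 128 fits in a single byte.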

The data to be serialized is passed to the smart contract as an ATTRIBUTE_MAP, which maps attribute names to their values.

Pseudo Algorithm

vector<uint8_t> main(vector<FORMAT> format_lines, ATTRIBUTE_MAP data) {
   
   serialized_data = empty uint8_t vector
   //0-3 are reserved for possible later extensions
   identifier = 4
   
   For each line in format_lines {
      If line.name is defined in data {
         Append varint(identifier) to serialized_data
         linedata = data[line.name]
         Append serialize(linedata, line.type) to serialized_data
      }
      identifier += 1
   }
   
   return serialized_data
}

Explanation:

Data is serialized in the order of the respective FORMATs within a schema's format vector. Attributes that are not defined in the provided ATTRIBUTE_MAP are skipped completely and do not take up any space.
Ahead of a serialized attribute, there is a varint encoded identifier. This identifier is dependent on the position of the attribute within the format vector. Because the identifiers 0-3 are reserved, the first attribute has identifier 4.
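The loop described above can be sketched in C++ as follows. This is a reduced illustration, not the contract's implementation: the Value and AttributeMap types are hypothetical stand-ins for ATTRIBUTE_MAP, and only the uint64 and string types are handled.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

struct FORMAT {
    std::string name;
    std::string type;
};

// Hypothetical, reduced stand-in for ATTRIBUTE_MAP (uint64 and string only).
using Value = std::variant<uint64_t, std::string>;
using AttributeMap = std::map<std::string, Value>;

// Append `value` to `out` as a base-128 varint.
void append_varint(std::vector<uint8_t>& out, uint64_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<uint8_t>(value));
}

std::vector<uint8_t> serialize(const std::vector<FORMAT>& format_lines,
                               const AttributeMap& data) {
    std::vector<uint8_t> out;
    uint64_t identifier = 4;  // 0-3 are reserved for possible later extensions
    for (const FORMAT& line : format_lines) {
        auto it = data.find(line.name);
        if (it != data.end()) {
            append_varint(out, identifier);
            if (line.type == "uint64") {
                append_varint(out, std::get<uint64_t>(it->second));
            } else if (line.type == "string") {
                const std::string& s = std::get<std::string>(it->second);
                append_varint(out, s.size());               // length prefix
                out.insert(out.end(), s.begin(), s.end());  // raw characters
            } else {
                throw std::invalid_argument("type not covered by this sketch: " + line.type);
            }
        }
        ++identifier;  // advances even when the attribute is skipped
    }
    return out;
}
```

Because the identifier advances for skipped attributes as well, each identifier always maps back to the same position in the format vector, regardless of which attributes are present.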

Serialization of specific types

int8/ int16/ int32/ int64

Integers are first zig-zag encoded and then stored as varints.
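The text does not spell out the zig-zag mapping, so the following sketch assumes the same mapping that Protobuf uses for its sint types (0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...). The function names are hypothetical. The point of the mapping is that values of small magnitude, whether positive or negative, become small unsigned numbers and therefore short varints.

```cpp
#include <cstdint>

// Map a signed integer to an unsigned one so that small magnitudes stay small.
// Note: `n >> 63` relies on arithmetic right shift of negative values, which is
// guaranteed from C++20 on and is what mainstream compilers do before that.
uint64_t zigzag_encode(int64_t n) {
    return (static_cast<uint64_t>(n) << 1) ^ static_cast<uint64_t>(n >> 63);
}

// Inverse mapping: the lowest bit carries the sign, the remaining bits the magnitude.
int64_t zigzag_decode(uint64_t z) {
    return static_cast<int64_t>(z >> 1) ^ -static_cast<int64_t>(z & 1);
}
```

Under this assumption, -1 becomes 1 and fits in a one-byte varint, whereas storing it as a plain two's-complement varint would take ten bytes.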

uint8/ uint16/ uint32/ uint64

Unsigned Integers are stored as varints.

fixed8/ fixed16/ fixed32/ fixed64

The fixed types are aliases for the uint types of the same width. Instead of being stored as varints, however, they are stored with a fixed size (1, 2, 4 or 8 bytes) in little-endian order.
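A minimal sketch of the fixed-size encoding, assuming the width in bytes follows from the type name (fixed8 = 1, fixed16 = 2, fixed32 = 4, fixed64 = 8). The function name encode_fixed is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Write the lowest `width` bytes of `value`, least significant byte first.
std::vector<uint8_t> encode_fixed(uint64_t value, std::size_t width) {
    if (width > 8) {
        throw std::invalid_argument("width must be at most 8 bytes");
    }
    std::vector<uint8_t> out;
    for (std::size_t i = 0; i < width; ++i) {
        out.push_back(static_cast<uint8_t>(value >> (8 * i)));
    }
    return out;
}
```

For instance, encode_fixed(300, 4) produces 2C 01 00 00. The same value takes only two bytes as a varint, so the fixed types mainly pay off for values that are usually large.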

float/ double

Floats and doubles are stored as their raw 4-byte and 8-byte representations, respectively.

strings

Strings are treated as character vectors: the varint-encoded length of the string is stored first, followed by the characters.

ipfs

The IPFS type is passed as a Base58 encoded string to the contract. It is then decoded to a byte vector, which like all vectors is serialized by first storing the varint encoded length of the vector, followed by the bytes.

bool

Bools are stored as a single byte with the value 1 if the bool is true and 0 if it is false.

byte

byte is an alias for fixed8.

Serialization of Vectors

Vectors are serialized by first storing the length of the vector (varint encoded) and then appending the serialized version of each of their elements.
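As an illustration, here is how a uint64[] attribute could be serialized under this rule: a varint length prefix followed by one varint per element. The function names are hypothetical, and other element types would replace the per-element step with their own encoding.

```cpp
#include <cstdint>
#include <vector>

// Append `value` to `out` as a base-128 varint.
void append_varint(std::vector<uint8_t>& out, uint64_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<uint8_t>(value));
}

// Serialize a uint64[] value: element count first, then each element.
std::vector<uint8_t> serialize_uint64_vector(const std::vector<uint64_t>& values) {
    std::vector<uint8_t> out;
    append_varint(out, values.size());
    for (uint64_t v : values) {
        append_varint(out, v);
    }
    return out;
}
```

The vector [100, 1000] serializes to 02 64 E8 07, and an empty vector still takes one byte (00) for its length.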

An Example

Example Format

[
  {
    name: "id",
    type: "uint64"
  },
  {
    name: "name",
    type: "string"
  },
  {
    name: "children",
    type: "uint64[]"
  }
]

Serialization

We want to serialize the following data:

{
  id: 300,
  name: "Tom"
}

We now loop through each of the 3 format lines:

serialized data: []

1) Is id defined in the data? Yes
1.a) Append the varint attribute identifier = 04 -> serialized data: [04]
1.b) Append the serialized uint64
1.b.i) 300 --(varint)--> [AC, 02] -> serialized data: [04 AC 02]
2) Is name defined in the data? Yes
2.a) Append the varint attribute identifier = 05 -> serialized data: [04 AC 02 05]
2.b) Append the serialized string
2.b.i) length = 3 --(varint)--> [03] -> serialized data: [04 AC 02 05 03]
2.b.ii) "Tom" = 54 6F 6D -> serialized data: [04 AC 02 05 03 54 6F 6D]
3) Is children defined in the data? No -> nothing is appended

serialized data: [04 AC 02 05 03 54 6F 6D]

Deserialization

We want to deserialize the following byte vector:

[05 04 50 61 75 6C 06 02 64 E8 07]

1.a) Read a varint identifier
1.a.i) 05 = b 0000 0101 -> highest bit is not set -> varint is complete = 5
1.b) Identifier 5 means attribute number 1 (0 based) -> we're reading a string
1.b.i) Read a varint string length
1.b.i.1) 04 = b 0000 0100 -> highest bit is not set -> varint is complete = 4
1.b.ii) Read 4 chars = 50 61 75 6C = "Paul"
1.c) Attribute "name" is set with value "Paul"
2.a) Read a varint identifier
2.a.i) 06 = b 0000 0110 -> highest bit is not set -> varint is complete = 6
2.b) Identifier 6 means attribute number 2 (0 based) -> we're reading a uint64 vector
2.b.i) Read a varint vector length
2.b.i.1) 02 = b 0000 0010 -> highest bit is not set -> varint is complete = 2
2.b.ii) Read 2 uint64 values
2.b.ii.1) 64 = b 0110 0100 -> highest bit is not set -> varint is complete = 100
2.b.ii.2) E8 = b 1110 1000 -> highest bit is set -> varint is not complete
2.b.ii.2) 07 = b 0000 0111 -> highest bit is not set -> varint is complete
2.b.ii.2) combined value (7-bit groups, last byte first) = b 000 0111 ++ b 110 1000 = b 0011 1110 1000 = 1000
2.c) Attribute "children" is set with value [100, 1000]
Reached the end of the byte vector, deserialization complete
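The walkthrough above can be mirrored in code. The following sketch is an illustration under assumptions rather than the contract's deserializer: it only handles the three types used in the example format (uint64, string, uint64[]), and the Value type and function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

struct FORMAT {
    std::string name;
    std::string type;
};

// Hypothetical value type, limited to the three types used in the example.
using Value = std::variant<uint64_t, std::string, std::vector<uint64_t>>;

// Read one varint starting at `pos` and advance `pos` past it.
uint64_t read_varint(const std::vector<uint8_t>& bytes, std::size_t& pos) {
    uint64_t result = 0;
    for (int shift = 0; shift < 64; shift += 7) {
        uint8_t byte = bytes.at(pos++);  // at() throws on truncated input
        result |= static_cast<uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return result;  // highest bit clear: complete
    }
    throw std::invalid_argument("varint too long");
}

std::map<std::string, Value> deserialize(const std::vector<FORMAT>& format_lines,
                                         const std::vector<uint8_t>& bytes) {
    std::map<std::string, Value> result;
    std::size_t pos = 0;
    while (pos < bytes.size()) {
        uint64_t identifier = read_varint(bytes, pos);
        // Identifiers 0-3 are reserved; identifier 4 is attribute number 0.
        if (identifier < 4 || identifier - 4 >= format_lines.size()) {
            throw std::invalid_argument("unknown identifier");
        }
        const FORMAT& line = format_lines[identifier - 4];
        if (line.type == "uint64") {
            result[line.name] = read_varint(bytes, pos);
        } else if (line.type == "string") {
            uint64_t length = read_varint(bytes, pos);
            std::string s;
            for (uint64_t i = 0; i < length; ++i) {
                s.push_back(static_cast<char>(bytes.at(pos++)));
            }
            result[line.name] = s;
        } else if (line.type == "uint64[]") {
            uint64_t count = read_varint(bytes, pos);
            std::vector<uint64_t> values;
            for (uint64_t i = 0; i < count; ++i) {
                values.push_back(read_varint(bytes, pos));
            }
            result[line.name] = values;
        } else {
            throw std::invalid_argument("type not covered by this sketch: " + line.type);
        }
    }
    return result;
}
```

Applied to the example format and the byte vector [05 04 50 61 75 6C 06 02 64 E8 07], this returns "name" = "Paul" and "children" = [100, 1000]; the id attribute is absent from the result because its identifier never appears in the input.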