sidebar_position | sidebar_label |
---|---|
5 | ZJSON |
The super data model is based on richly typed records with a deterministic field order, as is implemented by the Super JSON, Super Binary, and Super Columnar formats. Given the ubiquity of JSON, it is desirable to also be able to serialize super data into the JSON format. However, encoding super data values directly as JSON values would not work without loss of information.
For example, consider this Super JSON data:
{
ts: 2018-03-24T17:15:21.926018012Z,
a: "hello, world",
b: {
x: 4611686018427387904,
y: 127.0.0.1
}
}
A straightforward translation to JSON might look like this:
{
"ts": 1521911721.926018012,
"a": "hello, world",
"b": {
"x": 4611686018427387904,
"y": "127.0.0.1"
}
}
But, when this JSON is transmitted to a JavaScript client and parsed, the result looks something like this:
{
"ts": 1521911721.926018,
"a": "hello, world",
"b": {
"x": 4611686018427388000,
"y": "127.0.0.1"
}
}
The good news is that the `a` field came through just fine, but there are a few problems with the remaining fields:
- the timestamp lost precision (due to 53 bits of mantissa in a JavaScript IEEE 754 floating point number) and was converted from a time type to a number,
- the int64 lost precision for the same reason, and
- the IP address has been converted to a string.
As a comparison, Python's `json` module handles the 64-bit integer to full precision, but loses precision on the floating point timestamp.
Also, it is at the whim of a JSON implementation whether
or not the order of object keys is preserved.
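The two behaviors described above can be checked with a short Python sketch. It parses the timestamp and a 64-bit integer with the standard `json` module, then uses `float()` to mimic a JavaScript client that stores every JSON number as a double. The integer used here is the value of `x` plus one, an assumption made for illustration so that the value is not an exact power of two.

```python
import json
from decimal import Decimal

# Naive JSON translation of the example, with x bumped by one (illustrative assumption).
text = '{"ts": 1521911721.926018012, "b": {"x": 4611686018427387905}}'
parsed = json.loads(text)

# Python parses JSON integers as arbitrary-precision ints, so x stays exact.
int_ok = parsed["b"]["x"] == 4611686018427387905

# The timestamp becomes a binary float, so the trailing nanosecond digits are lost.
ts_exact = Decimal(repr(parsed["ts"])) == Decimal("1521911721.926018012")

# A JavaScript client holds every JSON number as a double; float() mimics that step.
x_as_double = float(parsed["b"]["x"])
int_survives_double = int(x_as_double) == parsed["b"]["x"]

print(int_ok, ts_exact, int_survives_double)  # True False False
```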
While JSON is well suited for data exchange of generic information, it is not sufficient for the super-structured data model. That said, JSON can be used as an encoding format for super data with another layer of encoding on top of a JSON-based protocol. This allows clients like web apps or Electron apps to receive and understand Super JSON and, with the help of client libraries like zed-js, to manipulate the rich, structured Super JSON types that are implemented on top of the basic JavaScript types.
In other words, because JSON objects do not have a deterministic field order nor does JSON in general have typing beyond the basics (i.e., strings, floating point numbers, objects, arrays, and booleans), Super JSON and its embedded type model is layered on top of regular JSON.
The format for representing Super JSON data in JSON is called ZJSON. Converting Super JSON, Super Binary, or Super Columnar to ZJSON and back results in a complete and accurate restoration of the original super data.
A ZJSON stream is defined as a sequence of JSON objects where each object represents a value and has the form:
{
"type": <type>,
"value": <value>
}
The type and value fields are encoded as defined below.
The type encoding for a primitive type is simply its type name, e.g., "int32" or "string".
Complex types are encoded with small-integer identifiers. The first instance of a unique type defines the binding between the integer identifier and its definition, where the definition may recursively refer to earlier complex types by their identifiers.
For example, the type `{s:string,x:int32}` has this ZJSON format:
{
"id": 123,
"kind": "record",
"fields": [
{
"name": "s",
"type": {
"kind": "primitive",
"name": "string"
}
},
{
"name": "x",
"type": {
"kind": "primitive",
"name": "int32"
}
}
]
}
A previously defined complex type may be referred to using a reference of the form:
{
"kind": "ref",
"id": 123
}
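One way a reader might track these bindings is with a small table keyed by id. The sketch below is illustrative only: the `TypeContext` class and its method names are hypothetical and not part of any client library. It walks the complex type forms described in this document and accepts a primitive either as a bare name string or as a `{"kind": "primitive", ...}` object, since both spellings appear here.

```python
class TypeContext:
    """Hypothetical helper that remembers complex types by their integer id."""

    def __init__(self):
        self.by_id = {}

    def register(self, typ):
        """Record typ and every complex type nested inside it."""
        if isinstance(typ, str) or typ["kind"] in ("primitive", "ref"):
            return  # primitives and references define nothing new
        for child in self._children(typ):
            self.register(child)
        self.by_id[typ["id"]] = typ

    @staticmethod
    def _children(typ):
        """Return the type objects nested directly inside a complex type."""
        kind = typ["kind"]
        if kind == "record":
            return [field["type"] for field in typ["fields"]]
        if kind == "union":
            return typ["types"]
        if kind == "map":
            return [typ["key_type"], typ["val_type"]]
        if kind == "enum":
            return []
        return [typ["type"]]  # array, set, error, named

    def resolve(self, typ):
        """Replace a reference with the definition it points to."""
        if isinstance(typ, dict) and typ["kind"] == "ref":
            return self.by_id[typ["id"]]
        return typ


ctx = TypeContext()
record = {
    "id": 123,
    "kind": "record",
    "fields": [
        {"name": "s", "type": {"kind": "primitive", "name": "string"}},
        {"name": "x", "type": {"kind": "primitive", "name": "int32"}},
    ],
}
ctx.register(record)
resolved = ctx.resolve({"kind": "ref", "id": 123})
print(resolved is record)  # True
```

Registering children before their parent mirrors the rule that a definition may only refer to earlier complex types.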
A record type is a JSON object of the form
{
"id": <number>,
"kind": "record",
"fields": [ <field>, <field>, ... ]
}
where each of the fields has the form
{
"name": <name>,
"type": <type>
}
and `<name>` is a string defining the field name and `<type>` is a recursively encoded type.
An array type is defined by a JSON object having the form
{
"id": <number>,
"kind": "array",
"type": <type>
}
where `<type>` is a recursively encoded type.
A set type is defined by a JSON object having the form
{
"id": <number>,
"kind": "set",
"type": <type>
}
where `<type>` is a recursively encoded type.
A map type is defined by a JSON object of the form
{
"id": <number>,
"kind": "map",
"key_type": <type>,
"val_type": <type>
}
where each `<type>` is a recursively encoded type.
A union type is defined by a JSON object having the form
{
"id": <number>,
"kind": "union",
"types": [ <type>, <type>, ... ]
}
where the list of types comprises the types of the union and each `<type>` is a recursively encoded type.
An enum type is a JSON object of the form
{
"id": <number>,
"kind": "enum",
"symbols": [ <string>, <string>, ... ]
}
where the unique `<string>` values define a finite set of symbols.
An error type is a JSON object of the form
{
"id": <number>,
"kind": "error",
"type": <type>
}
where `<type>` is a recursively encoded type.
A named type is encoded as a binding between a name and a type and represents a new type so named. A named type has the form
{
"id": <number>,
"kind": "named",
"name": <id>,
"type": <type>
}
where `<id>` is a JSON string representing the newly defined type name and `<type>` is a recursively encoded type.
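The forms above can be summarized as a table of required keys per kind. The following sketch checks an already-parsed type object against that table. `REQUIRED_KEYS` and `check_type` are hypothetical names, primitives are assumed to use the `{"kind": "primitive", ...}` object form from the examples, and the checks cover only what this section states.

```python
# Required keys for each complex kind, taken from the forms listed above.
REQUIRED_KEYS = {
    "record": ("id", "fields"),
    "array": ("id", "type"),
    "set": ("id", "type"),
    "map": ("id", "key_type", "val_type"),
    "union": ("id", "types"),
    "enum": ("id", "symbols"),
    "error": ("id", "type"),
    "named": ("id", "name", "type"),
}


def check_type(typ):
    """Raise ValueError if typ does not match one of the documented forms."""
    kind = typ.get("kind")
    if kind == "primitive":
        if not isinstance(typ.get("name"), str):
            raise ValueError("primitive type needs a string name")
        return
    if kind == "ref":
        if not isinstance(typ.get("id"), int):
            raise ValueError("ref needs an integer id")
        return
    if kind not in REQUIRED_KEYS:
        raise ValueError(f"unknown kind: {kind!r}")
    missing = [key for key in REQUIRED_KEYS[kind] if key not in typ]
    if missing:
        raise ValueError(f"{kind} type is missing {missing}")
    if kind == "record":
        for field in typ["fields"]:
            if not isinstance(field.get("name"), str) or "type" not in field:
                raise ValueError("record field needs a string name and a type")
            check_type(field["type"])
    elif kind == "union":
        for member in typ["types"]:
            check_type(member)
    elif kind == "map":
        check_type(typ["key_type"])
        check_type(typ["val_type"])
    elif kind == "enum":
        if len(set(typ["symbols"])) != len(typ["symbols"]):
            raise ValueError("enum symbols must be unique")
    else:  # array, set, error, named
        check_type(typ["type"])


string_type = {"kind": "primitive", "name": "string"}
int_type = {"kind": "primitive", "name": "int64"}

good = {"id": 1, "kind": "map", "key_type": string_type, "val_type": int_type}
check_type(good)  # passes silently

bad = {"id": 2, "kind": "map", "key_type": string_type}
try:
    check_type(bad)
    error = None
except ValueError as exc:
    error = str(exc)
print(error)  # map type is missing ['val_type']
```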
The primitive values comprising an arbitrarily complex data value are encoded as a JSON array of strings mixed with nested JSON arrays whose structure conforms to the nested structure of the value's schema as follows:
- each record, array, and set is encoded as a JSON array of its composite values,
- a union is encoded as a string of the form `<tag>:<value>` where `tag` is an integer string representing the positional index in the union's list of types that specifies the type of `<value>`, which is a JSON string or array as described recursively herein,
- a map is encoded as a JSON array of two-element arrays of the form `[ <key>, <value> ]` where `key` and `value` are recursively encoded,
- a type value is encoded as above,
- each primitive that is not a type value is encoded as a string conforming to its Super JSON representation, as described in the corresponding section of the Super JSON specification.
For example, a record with three fields (a string, an array of integers, and an array of union of string and float64) might have a value that looks like this:
[ "hello, world", ["1","2","3","4"], ["1:foo", "0:10" ] ]
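To make the pairing of a value with its type concrete, here is a minimal decoding sketch under several assumptions: references are already resolved, type values and nulls are not handled, and only integer and float primitives are converted from their string form. The function name `decode` and the field names `s`, `a`, and `u` are hypothetical. The union branch accepts the `<tag>:<value>` string described above and, as a further assumption, a two-element `[tag, value]` array like the one in the sample output at the end of this document.

```python
def decode(typ, value):
    """Turn a ZJSON value into plain Python data, guided by its type object."""
    kind = typ["kind"]
    if kind == "primitive":
        name = typ["name"]
        if name.startswith(("int", "uint")):
            return int(value)
        if name.startswith("float"):
            return float(value)
        return value  # every other primitive is left as its string form
    if kind == "record":
        # Field values arrive positionally, in the order of the type's fields.
        return {f["name"]: decode(f["type"], v) for f, v in zip(typ["fields"], value)}
    if kind in ("array", "set"):
        return [decode(typ["type"], v) for v in value]
    if kind == "map":
        return [(decode(typ["key_type"], k), decode(typ["val_type"], v)) for k, v in value]
    if kind == "union":
        if isinstance(value, str):
            tag, _, inner = value.partition(":")  # split at the first colon only
        else:
            tag, inner = value
        return decode(typ["types"][int(tag)], inner)
    raise ValueError(f"unsupported kind: {kind!r}")


union = {
    "id": 1,
    "kind": "union",
    "types": [
        {"kind": "primitive", "name": "float64"},
        {"kind": "primitive", "name": "string"},
    ],
}
record = {
    "id": 4,
    "kind": "record",
    "fields": [
        {"name": "s", "type": {"kind": "primitive", "name": "string"}},
        {"name": "a", "type": {"id": 2, "kind": "array",
                               "type": {"kind": "primitive", "name": "int64"}}},
        {"name": "u", "type": {"id": 3, "kind": "array", "type": union}},
    ],
}

result = decode(record, ["hello, world", ["1", "2", "3", "4"], ["1:foo", "0:10"]])
print(result)  # {'s': 'hello, world', 'a': [1, 2, 3, 4], 'u': ['foo', 10.0]}
```

Converting integers with `int()` rather than passing them through as JSON numbers is what keeps 64-bit values exact on the Python side.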
A ZJSON file is composed of ZJSON objects formatted as newline-delimited JSON (NDJSON). For example, the `super` CLI command writes its ZJSON output as lines of NDJSON.
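Reading such a file can be sketched in a few lines of Python. The generator below is a hypothetical helper (`read_zjson` is not part of any library): it parses one JSON object per line and checks only that each object carries the `type` and `value` fields. The two sample lines are made up for illustration.

```python
import io
import json


def read_zjson(stream):
    """Yield (type, value) pairs from a text stream holding one ZJSON object per line."""
    for lineno, line in enumerate(stream, 1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        obj = json.loads(line)
        if not isinstance(obj, dict) or "type" not in obj or "value" not in obj:
            raise ValueError(f"line {lineno}: expected an object with type and value")
        yield obj["type"], obj["value"]


# Two illustrative lines: the first defines record type 30, the second refers to it.
sample = io.StringIO(
    '{"type":{"kind":"record","id":30,"fields":'
    '[{"name":"a","type":{"kind":"primitive","name":"int64"}}]},"value":["1"]}\n'
    '{"type":{"kind":"ref","id":30},"value":["2"]}\n'
)
pairs = list(read_zjson(sample))
print(len(pairs))  # 2
```

Because later lines may refer to types defined on earlier lines, a reader has to process the lines in order and keep the type definitions it has already seen.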
Here is an example that illustrates values of a repeated type, nesting, records, array, and union. Consider the file `input.jsup`:
{s:"hello",r:{a:1,b:2}}
{s:"world",r:{a:3,b:4}}
{s:"hello",r:{a:[1,2,3]}}
{s:"goodnight",r:{x:{u:"foo"((string,int64))}}}
{s:"gracie",r:{x:{u:12((string,int64))}}}
This data is represented in ZJSON as follows:
super -f zjson input.jsup | jq .
{
"type": {
"kind": "record",
"id": 31,
"fields": [
{
"name": "s",
"type": {
"kind": "primitive",
"name": "string"
}
},
{
"name": "r",
"type": {
"kind": "record",
"id": 30,
"fields": [
{
"name": "a",
"type": {
"kind": "primitive",
"name": "int64"
}
},
{
"name": "b",
"type": {
"kind": "primitive",
"name": "int64"
}
}
]
}
}
]
},
"value": [
"hello",
[
"1",
"2"
]
]
}
{
"type": {
"kind": "ref",
"id": 31
},
"value": [
"world",
[
"3",
"4"
]
]
}
{
"type": {
"kind": "record",
"id": 34,
"fields": [
{
"name": "s",
"type": {
"kind": "primitive",
"name": "string"
}
},
{
"name": "r",
"type": {
"kind": "record",
"id": 33,
"fields": [
{
"name": "a",
"type": {
"kind": "array",
"id": 32,
"type": {
"kind": "primitive",
"name": "int64"
}
}
}
]
}
}
]
},
"value": [
"hello",
[
[
"1",
"2",
"3"
]
]
]
}
{
"type": {
"kind": "record",
"id": 38,
"fields": [
{
"name": "s",
"type": {
"kind": "primitive",
"name": "string"
}
},
{
"name": "r",
"type": {
"kind": "record",
"id": 37,
"fields": [
{
"name": "x",
"type": {
"kind": "record",
"id": 36,
"fields": [
{
"name": "u",
"type": {
"kind": "union",
"id": 35,
"types": [
{
"kind": "primitive",
"name": "int64"
},
{
"kind": "primitive",
"name": "string"
}
]
}
}
]
}
}
]
}
}
]
},
"value": [
"goodnight",
[
[
[
"1",
"foo"
]
]
]
]
}
{
"type": {
"kind": "ref",
"id": 38
},
"value": [
"gracie",
[
[
[
"0",
"12"
]
]
]
]
}