High performance encoding and decoding of BSON data.
- Allocation free API for reading and writing BSON data.
- Natural mapping of Julia types to corresponding BSON types.
- Convenience API to read and write
Dict{String, Any}
orOrderedDict{String, Any}
(default, for roundtrip consistency) as BSON. - Struct API tunable for tradeoffs between flexibility, performance, and evolution.
- Configurable validation levels.
- Light weight indexing for larger documents.
- Transducers.jl compatible.
- Tested for conformance against the BSON corpus.
- Generic serialization of all Julia types to BSON. See BSON.jl for that functionality. LightBSON.jl aims for natural representations, suitable for interop with other languages and long term persistence.
- Integrated with FileIO.jl. BSON.jl already is, and adding another with different semantics would be confusing.
- A BSON mutation API. Reading and writing are entirely separate and only complete documents can be written.
- Conversion to and from Extended JSON. This may be added later.
Performance sensitive use cases are the focus of this package, but a high level API is available for where performance is less of a concern. The high level API will also perform optimally when reading simple types without strings, arrays, or other fields that require allocation.
bson_read([T=Any], src)
readsT
fromsrc
wheresrc
is either aDenseVector{UInt8}
, anIO
object, or a path to a file.bson_write(dst, x)
writesx
todst
wherex
is a type that maps to fields (dictionary, struct, named tuple), anddst
is aDenseVector{UInt8}
, anIO
, or a path to a file.bson_write(dst, xs...)
writesxs
todst
wherexs
is a list ofPair{String, T}
mapping names to field values, anddst
is as above.
# Write to a byte buffer and return the buffer
buf = bson_write(UInt8[], :x => Int64(1), :y => Int64(2))
# Read from a byte buffer as an OrderedDict{String, Any}
bson_read(buf)
# Read from a byte buffer as a specific type
x = bson_read(@NamedTuple{x::Int64, y::Int64}, buf)
path = joinpath(homedir(), "test.bson")
# Write to a file
bson_write(path, x)
# Read from a file as an OrderedDict{String Any}
bson_read(path)
- Documents are read and written to and from byte arrays with BSONReader and BSONWriter.
- BSONReader and BSONWriter are immutable struct types with no state. They can be instantiated without allocation.
- BSONWriter will append to the destination array. User is responsible for not writing duplicate fields.
reader["foo"]
orreader[:foo]
findsfoo
and returns a new reader pointing to the field.reader[T]
materializes a field to the typeT
.writer["foo"] = x
orwriter[:foo] = x
appends a field with namefoo
and valuex
.close(writer)
finalizes a document (writes the document length and terminating null byte).- Prefer strings for field names. Symbols in Julia unfortunately do not have a constant time length API.
buf = UInt8[]
writer = BSONWriter(buf)
writer["x"] = Int64(123)
close(writer)
reader = BSONReader(buf)
reader["x"][Int64] # 123
Nested documents are written by assigning a field with a function that takes a nested writer. Nested fields are read by index.
buf = UInt8[]
writer = BSONWriter(buf)
writer["x"] = nested_writer -> begin
nested_writer["a"] = 1
nested_writer["b"] = 2
end
close(writer)
reader = BSONReader(buf)
reader["x"]["a"][Int64] # 1
reader["x"]["b"][Int64] # 2
Arrays are written by assigning array values, or by generators. Arrays can be materialized to Julia arrays, or be accessed by index.
buf = UInt8[]
writer = BSONWriter(buf)
writer["x"] = Int64[1, 2, 3]
writer["y"] = (Int64(x * x) for x in 1:3)
close(writer)
reader = BSONReader(buf)
reader["x"][Vector{Int64}] # [1, 2, 3]
reader["y"][Vector{Int64}] # [1, 4, 9]
reader["x"][2][Int64] # 2
reader["y"][2][Int64] # 4
Where performance is not a concern, elements can be materialized to dictionaries (recursively) or to the most appropriate Julia type for leaf fields. Dictionaries can also be written directly.
buf = UInt8[]
writer = BSONWriter(buf)
writer[] = OrderedDict{String, Any}("x" => Int64(1), "y" => "foo")
close(writer)
reader = BSONReader(buf)
reader[Dict{String, Any}] # Dict{String, Any}("x" => 1, "y" => "foo")
reader[OrderedDict{String, Any}] # OrderedDict{String, Any}("x" => 1, "y" => "foo")
reader[Any] # OrderedDict{String, Any}("x" => 1, "y" => "foo")
reader["x"][Any] # 1
reader["y"][Any] # "foo"
Generators can be used to directly write nested documents in arrays.
buf = UInt8[]
writer = BSONWriter(buf)
writer["x"] = (
nested_writer -> begin
nested_writer["a"] = Int64(x)
nested_writer["b"] = Int64(x * x)
end
for x in 1:3
)
close(writer)
reader = BSONReader(buf)
reader["x"][Vector{Any}] # Any[OrderedDict{String, Any}("a" => 1, "b" => 1), OrderedDict{String, Any}("a" => 2, "b" => 4), OrderedDict{String, Any}("a" => 3, "b" => 9)]
reader["x"][3]["b"][Int64] # 9
The abstract types Number
, Integer
, and AbstractFloat
can be used to materialize numeric fields to the most appropriate Julia type under the abstract type.
buf = UInt8[]
writer = BSONWriter(buf)
writer["x"] = Int32(123)
writer["y"] = 1.25
close(writer)
reader = BSONReader(buf)
reader["x"][Number] # 123
reader["x"][Integer] # 123
reader["y"][Number] # 1.25
reader["y"][AbstractFloat] # 1.25
String fields can be materialized as WeakRefString{UInt8}
, aliased as UnsafeBSONString
. This will create a string object with pointers to the underlying buffer without performing any allocations. User must take care of GC safety.
buf = UInt8[]
writer = BSONWriter(buf)
writer["x"] = "foo"
close(writer)
reader = BSONReader(buf)
s = reader["x"][UnsafeBSONString]
GC.@preserve buf s == "foo" # true
Binary fields are represented with BSONBinary
.
buf = UInt8[]
writer = BSONWriter(buf)
writer["x"] = BSONBinary([0x1, 0x2, 0x3], BSON_SUBTYPE_GENERIC_BINARY)
close(writer)
reader = BSONReader(buf)
x = reader["x"][BSONBinary]
x.data # [0x1, 0x2, 0x3]
x.subtype # BSON_SUBTYPE_GENERIC_BINARY
Binary fields can be materialized as UnsafeBSONBinary
for zero alocations, where the data
field is an UnsafeArray{UInt8, 1}
with pointers into the underlying buffer. User must take care of GC safety.
buf = UInt8[]
writer = BSONWriter(buf)
writer["x"] = BSONBinary([0x1, 0x2, 0x3], BSON_SUBTYPE_GENERIC_BINARY)
close(writer)
reader = BSONReader(buf)
x = reader["x"][UnsafeBSONBinary]
GC.@preserve buf x.data == [0x1, 0x2, 0x3] # true
x.subtype # BSON_SUBTYPE_GENERIC_BINARY
Fields can be iterated with foreach()
or using the Transducers.jl APIs. Fields are represented as Pair{UnsafeBSONString, BSONReader}
.
buf = UInt8[]
writer = BSONWriter(buf)
writer["x"] = Int64(1)
writer["y"] = Int64(2)
writer["z"] = Int64(3)
close(writer)
reader = BSONReader(buf)
reader |> Map(x -> x.second[Int64]) |> sum # 6
foreach(x -> println(x.second[Int64]), reader) # 1\n2\n3\n
BSON field access involves a linear scan to find the matching field name. Depending on the size and structure of a document, and the fields being accessed, it migh be preferable to build an index over the fields first, to be re-used on every access.
BSONIndex provides a very light weight partial (collisions evict previous entries) index over a document. It is designed to be re-used from document to document, by means of a constant time reset. IndexedBSONReader wraps a reader and an index to accelerate field access in a document. Index misses fall back to the wrapped reader.
buf = UInt8[]
writer = BSONWriter(buf)
writer["x"] = Int64(1)
writer["y"] = Int64(2)
writer["z"] = Int64(3)
close(writer)
index = BSONIndex(128)
# Index is built when IndexBSONReader is constructed
reader = IndexedBSONReader(index, BSONReader(buf))
reader["z"][Int64] # 3 -- accessed by index
empty!(buf)
writer = BSONWriter(buf)
writer["a"] = Int64(1)
writer["b"] = Int64(2)
writer["c"] = Int64(3)
close(writer)
# Index can be re-used
reader = IndexedBSONReader(index, BSONReader(buf))
reader["b"][Int64] # 2 -- accessed by index
reader["x"] # throws KeyError
BSONReader can be configured with a validator to use during the processing of the input document; 3 are provided:
- StrictBSONValidator - Validates all error cases presented in the BSON corpus.
- LightBSONValidator - Validates field lengths against parent scope (document or buffer), to guard against invalid memory access. This is the default validator.
- UncheckedBSONValidator - Performs no validation.
BSONReader(buf, StrictBSONValidator()) # Reader with strict validation
BSONReader(buf, LightBSONValidator()) # Reader with memory protected validation
BSONReader(buf, UncheckedBSONValidator()) # Reader with no validation
Structs can be automatically translated to and from BSON, provided all their fields can be represented in BSON. Traits functions are used to select the mode of conversion, and these serve as an extension point for user defined types.
bson_simple(T)::Bool
- Set this to true if fields to be serialized are given byfieldnames(T)
andT
can be constructed by fields in order of declaration. Defaults toStructTypes.StructType(T) == StructTypes.NoStructType()
.
Provided bson_simple(T)
is false, serialization will use the StructTypes.jl API to iterate fields of T
and to construct T
. See StructTypes.jl for more details.
For simple types using the bson_simple(T)
trait will generate faster serialization code. This needs only be explicitly set if StructTypes.StructType(T)
is also defined.
struct NestedSimple
a::Int64
b::Float64
end
struct Simple
x::String
y::NestedSimple
end
StructTypes.StructType(::Type{Simple}) = StructTypes.Struct()
LightBSON.bson_simple(::Type{Simple}) = true
buf = UInt8[]
writer = BSONWriter(buf)
writer["simple"] = Simple("foo", NestedSimple(123, 1.25))
close(writer)
reader = BSONReader(buf)
reader["simple"][Simple] # Simple("foo", NestedSimple(123, 1.25))
# Structs can also be written to the root of the document
buf = UInt8[]
writer = BSONWriter(buf)
writer[] = Simple("foo", NestedSimple(123, 1.25))
close(writer)
reader = BSONReader(buf)
reader[Simple] # Simple("foo", NestedSimple(123, 1.25))
For long term persistence or long lived APIs, it may be advisable to encode information about schema versions in documents, and implement ways to evolve schemas through time. Specific strategies for schema evolution are beyond the scope of this package to advise or impose, rather extension points are provided for users to implement the mechanisms best fit to their use cases.
bson_schema_version(T)
- The current schema version forT
, can be any BSON compatible type, ornothing
ifT
is unversioned. Defaults tonothing
.bson_schema_version_field(T)
- The field name to use in the BSON document for storing the schema version. Defaults to"_v"
.bson_read_versioned(T, v, reader)
- Handle versionv
with respect to current version ofT
and readT
fromreader
. Defaults to error if the schema versions are mismatched, and otherwise read as-if unversioned.
struct Evolving1
x::Int64
end
LightBSON.bson_schema_version(::Type{Evolving1}) = Int32(1)
struct Evolving2
x::Int64
y::Float64
end
# Construct from old version, defaulting new fields
Evolving2(old::Evolving1) = Evolving2(old.x, NaN)
LightBSON.bson_schema_version(::Type{Evolving2}) = Int32(2)
function LightBSON.bson_read_versioned(::Type{Evolving2}, v::Int32, reader::AbstractBSONReader)
if v == 1
Evolving2(bson_read_unversioned(Evolving1, reader))
elseif v == 2
bson_read_unversioned(Evolving2, reader)
else
# Real world application may instead want a mechanism to allow forward compatibility,
# e.g., by encoding breaking vs non-breaking change info in the version
error("Unsupported schema version $v for Evolving")
end
end
const Evolving = Evolving2
buf = UInt8[]
writer = BSONWriter(buf)
# Write old version
writer[] = Evolving1(123)
close(writer)
reader = BSONReader(buf)
# Read as new version
reader[Evolving] # Evolving2(123, NaN)
Named tuples can be read and written like any struct type.
buf = UInt8[]
writer = BSONWriter(buf)
writer[] = (; x = "foo", y = 1.25, z = (; a = Int64(123), b = Int64(456)))
close(writer)
reader = BSONReader(buf)
reader[@NamedTuple{x::String, y::Float64, z::@NamedTuple{a::Int64, b::Int64}}] # (x = "foo", y = 1.25, z = (a = 123, b = 456))
Since BSONWriter itself is immutable, it makes frequent calls to resize the underlying array to track the write head position. Unfortunately at present, this is not a well optimized operation in Julia, resolving to C-calls for manipulating the state of the array.
BSONWriteBuffer wraps a Vector{UInt8}
to track size purely in Julia and avoid most of these calls. It implements the minimum API necessary for use with BSONReader and BSONWriter, and is not for use as a general Array
implementation.
buf = BSONWriteBuffer()
writer = BSONWriter(buf)
writer["x"] = Int64(123)
close(writer)
reader = BSONReader(buf)
reader["x"][Int64] # 123
buf.data # Underlying array, may be longer than length(buf)
Performance naturally will depend very much on the nature of data being processed. The main overarching goal with this package is to enable the highest possible performance where the user requires it and is willing to sacrifice some convenience to achieve their target.
General advice for high performance BSON schema, such as short field names, avoiding long arrays or documents, and using nesting to reduce search complexity, all apply.
For LightBSON.jl specifically, prefer strings over symbols for field names, use unsafe variants rather than allocating strings and buffers where possible, reuse buffers and indexes, use BSONWriteBuffer rather than plain Vector{UInt8}
, and enable bson_simple(T)
for all applicable types.
Here's an example benchmark, reading and writing a named tuple with nesting (run on a Ryzen 5950X).
x = (;
f1 = 1.25,
f2 = Int64(123),
f3 = now(UTC),
f4 = true,
f5 = Int32(456),
f6 = d128"1.2",
f7 = (; x = uuid4(), y = 2.5)
)
@btime begin
buf = $(UInt8[])
empty!(buf)
writer = BSONWriter(buf)
writer[] = $x
close(writer)
end # 78.730 ns (0 allocations: 0 bytes)
@btime begin
buf = $(BSONWriteBuffer())
empty!(buf)
writer = BSONWriter(buf)
writer[] = $x
close(writer)
end # 58.303 ns (0 allocations: 0 bytes)
buf = UInt8[]
writer = BSONWriter(buf)
writer[] = x
close(writer)
@btime BSONReader($buf)[$(typeof(x))] # 103.825 ns (0 allocations: 0 bytes)
@btime BSONReader($buf, UncheckedBSONValidator())[$(typeof(x))] # 102.549 ns (0 allocations: 0 bytes)
@btime IndexedBSONReader($(BSONIndex(128)), BSONReader($buf))[$(typeof(x))] # 86.876 ns (0 allocations: 0 bytes)
@btime IndexedBSONReader($(BSONIndex(128)), BSONReader($buf, UncheckedBSONValidator()))[$(typeof(x))] # 84.987 ns (0 allocations: 0 bytes)
@btime IndexedBSONReader($(BSONIndex(128)), BSONReader($buf)) # 47.896 ns (0 allocations: 0 bytes)
@btime $(IndexedBSONReader(BSONIndex(128), BSONReader(buf)))[$(typeof(x))] # 41.434 ns (0 allocations: 0 bytes)
We can observe BSONWriteBuffer makes a material difference to write performance, while indexing, in this case, has a reasonable effect on read performance even for this small document. In the final two lines we can see that indexing and reading the indexed document breaks down roughly half/half. Using the unchecked validator has a smaller impact, and must be used with caution.
BSONObjectId implements the BSON ObjectId type in accordance with the spec. New IDs for the process can be generated by construction, or using bson_object_id_range(n)
for more efficient batch generation.
id1 = BSONObjectId()
DateTime(id1) # The UTC timestamp for when id1 was generated, truncated to seconds
id2 = BSONObjectId()
id1 != id2 # true, IDs are unique
collect(bson_object_id_range(5)) # 5 IDs with equal timestamp and different sequence field
- BSON.jl - Generic serialization of all Julia types to and from BSON.
- Mongoc.jl - Julia MongoDB client.
- DecFP.jl - Provides the
Dec128
type used for BSONdecimal128
fields. - Transducers.jl - Data iteration and transformation API.
- WeakRefStrings.jl - Pointer based strings.
- UnsafeArrays.jl - Pointer based arrays.
- StructTypes.jl - Serialization traits and utilities for user defined structures.