Skip to content

Page layout

Volodkin Vladislav edited this page Jan 11, 2023 · 3 revisions

Diagram

Slightly outdated visualisation is available here.

Adjustments to initial layout

Page holds a b+tree node. There are two types of nodes:

  1. internal (hold links to other nodes, number of links is N+1, where N is the number of keys)
  2. leafs (hold values, number of values is exactly the same as the number of keys)

Both nodes hold keys, for internal nodes this is the main binary payload, copying of which we avoid by using indirection in form of offsets. It is convenient to have different layouts for internal and leaf nodes:

  1. internal nodes hold links separately from the keys, corresponding array of BigEndian-encoded uint32's is located in the page right after the offsets array (length of links array greater by 1 compared to offsets array)
  2. leaf nodes hold values close to the keys, in cells which offsets point to

This design is a good fit, as it allows us to reorder links separately from the keys. The downside here is that we have to rewrite all of the links after, say, inserting one in the middle. But since links have a fixed size this is acceptable for the sake of simplicity.

Disk writes

Modifications of the page can take the following forms:

  1. insertion of a key (and a link) to an internal page - 2 writes:
    • to persist header, offsets and links (around 1221 bytes for page with 100 keys)
    • to write a key (arbitrary amount of bytes)
  2. insertion of a key (and value) to a leaf page - also 2 writes;
  3. adding a brand new page after a split - 1 write of the whole page

Possible optimisations

  1. we can buffer added keys, and flush them in one write as they most likely will take consecutive offsets

Other

In the first implementation we won't have overflow pages which limits the key/value pair size, so it can be guaranteed that page will always have place for maximum of K keys, which is defined by b+tree configuration. This limit should be eliminated.

Clone this wiki locally