This repository has been archived by the owner on Oct 18, 2023. It is now read-only.

Deduplicate wallog #448

Draft
wants to merge 1 commit into
base: main

@psarna commented May 31, 2023

!!! early draft, full of debug prints, barely works !!!

This draft contains experiments around deduplicating our own `wallog` format against the libSQL WAL. The potential win here is reducing write and space amplification from 2x to around 1.08x.

The main idea is as follows: `wallog` stores only frame metadata, and frame data lives either in the main database file or in the WAL. That's very simple to implement in a single-node system, but it gets complicated with replicas, because a replica is allowed to ask the primary for any arbitrary `wallog` frame.
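To make the metadata-only idea concrete, here is a minimal sketch of what a `wallog` entry could look like once it no longer carries page bytes. Everything in it is an assumption made for illustration, not code from this PR: the names `FrameMeta`, `DataSource` and `read_location`, the fixed 4096-byte page size, and the choice to record a WAL byte offset per frame. The only part taken from SQLite itself is that pages in the main database file are 1-based and stored at `(page_no - 1) * page_size`.

```rust
/// Assumed page size for this sketch; a real database records it in its header.
pub const PAGE_SIZE: u64 = 4096;

/// Which file currently holds the page image for a frame (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataSource {
    /// The frame is still in the WAL, at the given byte offset.
    Wal { offset: u64 },
    /// The frame has been checkpointed into the main database file.
    MainDb,
}

/// One wallog entry: metadata only, no page bytes (hypothetical layout).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FrameMeta {
    pub frame_no: u64,
    /// 1-based page number, as in SQLite.
    pub page_no: u32,
    pub source: DataSource,
}

/// Returns the file and byte offset from which the page data should be read.
/// Panics if `page_no` is 0, since page numbers are 1-based.
pub fn read_location(meta: &FrameMeta) -> (DataSource, u64) {
    assert!(meta.page_no >= 1, "page numbers are 1-based");
    match meta.source {
        DataSource::Wal { offset } => (meta.source, offset),
        DataSource::MainDb => (meta.source, (meta.page_no as u64 - 1) * PAGE_SIZE),
    }
}
```

With this layout, page data is written only once (to the WAL, and later to the main file by the checkpoint), and the `wallog` itself only grows by a small fixed-size record per frame.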

The rough idea for dealing with replicas is to:

  1. Make sure that we control checkpoints: autocheckpoint is off, and we only issue a checkpoint operation on the primary ourselves, explicitly and periodically.
  2. All streaming of frames to replicas must finish before we issue a checkpoint operation.
  3. We only checkpoint in TRUNCATE mode, i.e. a write lock is taken, the whole WAL is transferred into the main db file, and the WAL is then truncated. That simplifies lots of edge (sic!) cases.
  4. Once we checkpoint, we drop the previous `wallog`, and instead only store the following information. Let's assume that the main db file has N pages. Pages 1..N are now available as frames X..X+N-1 in the `wallog`, and X is the oldest frame a replica should ever ask for: anything before X is out-of-date anyway. If any replica asks for an earlier frame, it gets an error message saying "please drop whatever you're doing and start asking for frames X or greater instead".
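The steps above can be sketched as a small piece of primary-side bookkeeping. This is only an illustration under stated assumptions, not the implementation in this PR: the names `CheckpointBase`, `FrameRequest`, `resolve` and `may_checkpoint` are hypothetical, and the mapping assumes that page `p` of the main db file corresponds to frame `X + p - 1`, so N pages cover exactly N frames.

```rust
/// What the primary remembers after a TRUNCATE checkpoint (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CheckpointBase {
    /// X: the oldest frame number a replica is allowed to request.
    pub first_frame: u64,
    /// N: number of pages in the main db file at checkpoint time.
    pub page_count: u32,
}

/// How the primary answers a replica asking for one frame (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum FrameRequest {
    /// Serve this 1-based page straight from the main db file.
    MainDbPage(u32),
    /// The frame was written after the checkpoint; serve it from the WAL.
    FromWal,
    /// The frame predates the checkpoint; the replica must restart from `restart_at`.
    TooOld { restart_at: u64 },
}

impl CheckpointBase {
    /// Step 4: map a requested frame number onto the post-checkpoint layout.
    /// Assumes page p is exposed as frame X + p - 1.
    pub fn resolve(&self, frame_no: u64) -> FrameRequest {
        if frame_no < self.first_frame {
            return FrameRequest::TooOld { restart_at: self.first_frame };
        }
        let index = frame_no - self.first_frame;
        if index < self.page_count as u64 {
            FrameRequest::MainDbPage(index as u32 + 1)
        } else {
            FrameRequest::FromWal
        }
    }
}

/// Step 2: a checkpoint may only start once no replica stream is in flight.
pub fn may_checkpoint(active_streams: usize) -> bool {
    active_streams == 0
}
```

For example, with `first_frame = 100` and `page_count = 4`, a request for frame 99 yields `TooOld { restart_at: 100 }`, frames 100 through 103 map to pages 1 through 4 of the main db file, and frame 104 is served from the WAL.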

@psarna force-pushed the dedup_ branch 3 times, most recently from b40a14b to 8bececd, June 1, 2023 11:43