Richard Kettlewell edited this page Jan 20, 2019 · 15 revisions

NOTE: this design is not going to be implemented in this form. I just don't have the time and effort available to re-do the relevant bits in reactive form. A thread-based design was adopted instead in https://github.com/ewxrjk/rsbackup/commit/1561eb96bb4322bc30dbb1fe07b962a4bbbea8a8.

Improving Concurrency

We currently parallelize removals; it would be nice to do the same for backups, liveness checking, etc.

The current order of operations can be found in https://github.com/ewxrjk/rsbackup/blob/master/src/MakeBackup.cc.

Backups first:

  1. live/HOST: Once per host, do a liveness check for the host.
  2. mounted/HOST/VOLUME: Once per volume, test whether the volume is mounted.
  3. flag/HOST/VOLUME: Once per volume, test whether the volume's flag file is present.
  4. Identify devices. This involves two steps...
  5. pre-access-hook: Once globally, run pre-access-hook.
  6. check-stores: Once globally, check all store paths for valid devices.
  7. pre-backup-hook/HOST/VOLUME/DEVICE: Once per (volume, device), run pre-backup-hook.
  8. backup/HOST/VOLUME/DEVICE: Once per (volume, device), make a backup.
  9. underway/HOST/VOLUME/DEVICE: Once per (volume, device), store an 'underway' backup result.
  10. post-backup-hook/HOST/VOLUME/DEVICE: Once per (volume, device), run post-backup-hook.
  11. record-backup/HOST/VOLUME/DEVICE: Once per (volume, device), store the backup result.

Pruning:

  1. find-prunable: Identify prunable backups and update database.
  2. Identify devices. See steps 4-6 above.
  3. remove-prunable/HOST/VOLUME/DEVICE: Remove prunable backups.
  4. log-prunable: Update database
  5. expire-prune-logs: Expire pruning logs

Finally:

  1. post-access-hook: Once globally, run post-access-hook.

The step 8/10 behaviour is a bit odd. A backup can succeed yet end up recorded as 'underway' solely because post-backup-hook fails, or not be recorded at all despite the possibility of an 'underway' state. This looks like a bug, and it means incomplete backups are not reliably pruned.

Work to do:

  • Expand retire/pruning details
  • Define the ordering requirements between the individual actions.
  • Define the resources required by individual actions. NB may want to globally quiesce system for certain things e.g. pre-/post-backup-hook-* to reduce risk of LVM races.
  • Expand ActionList to support the ordering requirements.
  • ...
  • Profit!

Action relationships

  • A < B means A must complete before B can start
  • Normally A failing means B can't run; the exceptions are mentioned explicitly in the list below.
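The two rules above can be sketched as a small readiness check. This is only an illustration, not rsbackup code: the `Before` record, the `State` enum, and the `readiness` function are hypothetical names, and the `even_if_fails` flag stands for the explicitly listed exceptions.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    PENDING = "pending"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass(frozen=True)
class Before:
    """Hypothetical record of one 'first < second' relation."""
    first: str
    second: str
    even_if_fails: bool = False  # True only for the listed exceptions


def readiness(action, relations, states):
    """Classify `action` as 'run', 'wait', or 'blocked'.

    `states` maps action names to State; missing names count as PENDING.
    """
    waiting = False
    for rel in relations:
        if rel.second != action:
            continue
        state = states.get(rel.first, State.PENDING)
        if state is State.FAILED and not rel.even_if_fails:
            return "blocked"  # a failed predecessor normally stops the successor
        if state is State.PENDING:
            waiting = True  # predecessor has not completed yet
    return "wait" if waiting else "run"
```

With `Before("backup", "post-backup-hook", even_if_fails=True)`, a failed backup still leaves post-backup-hook runnable, whereas a failed pre-backup-hook blocks the backup.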

Actions not listed above:

  • identify/HOST/VOLUME: figure out whether to backup this volume
  • identify/HOST/VOLUME/DEVICE: ...to this device
  • find-prunable/HOST/VOLUME: determine which backups are prunable

Relationships between actions:

  • live/HOST < mounted/HOST/VOLUME
  • mounted/HOST/VOLUME < flag/HOST/VOLUME
  • flag/HOST/VOLUME < identify/HOST/VOLUME
  • pre-access-hook < check-stores
  • check-stores < identify/HOST/VOLUME/DEVICE
  • identify/HOST/VOLUME < identify/HOST/VOLUME/DEVICE
  • identify/HOST/VOLUME/DEVICE < pre-backup-hook/HOST/VOLUME/DEVICE
  • pre-backup-hook/HOST/VOLUME/DEVICE < backup/HOST/VOLUME/DEVICE
  • backup/HOST/VOLUME/DEVICE < post-backup-hook/HOST/VOLUME/DEVICE (even if backup/ fails)
  • post-backup-hook/HOST/VOLUME/DEVICE < record-backup/HOST/VOLUME/DEVICE (even if post-backup-hook/ fails)
  • record-backup/HOST/VOLUME/DEVICE < find-prunable/HOST/VOLUME/DEVICE
  • backup/HOST/VOLUME/DEVICE < post-access-hook (even if backup/ fails)
  • check-stores < remove-prunable/HOST/VOLUME/DEVICE
  • find-prunable/HOST/VOLUME/DEVICE < remove-prunable/HOST/VOLUME/DEVICE/DATE
  • remove-prunable/HOST/VOLUME/DEVICE/DATE < log-prunable (even if remove-prunable/ fails)
  • log-prunable < expire-prune-logs (even if log-prunable fails)
  • remove-prunable/HOST/VOLUME/DEVICE/DATE < post-access-hook (even if remove-prunable/ fails)
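To see how much concurrency these relationships allow, the backup-side relations can be fed to a topological sorter. The sketch below assumes a single host `H`, volume `V`, and device `D` (placeholder names), ignores the "even if it fails" exceptions, and uses Python's standard `graphlib` module; the `waves` helper is hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# (before, after) pairs copied from the list above, for one host/volume/device.
RELATIONS = [
    ("live/H", "mounted/H/V"),
    ("mounted/H/V", "flag/H/V"),
    ("flag/H/V", "identify/H/V"),
    ("pre-access-hook", "check-stores"),
    ("check-stores", "identify/H/V/D"),
    ("identify/H/V", "identify/H/V/D"),
    ("identify/H/V/D", "pre-backup-hook/H/V/D"),
    ("pre-backup-hook/H/V/D", "backup/H/V/D"),
    ("backup/H/V/D", "post-backup-hook/H/V/D"),
    ("post-backup-hook/H/V/D", "record-backup/H/V/D"),
    ("backup/H/V/D", "post-access-hook"),
]


def waves(relations):
    """Group actions into waves; actions in one wave have no mutual ordering."""
    sorter = TopologicalSorter()
    for before, after in relations:
        sorter.add(after, before)  # `after` depends on `before`
    sorter.prepare()  # raises graphlib.CycleError if the relations are cyclic
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)  # pretend the whole wave completed successfully
    return result


if __name__ == "__main__":
    for number, wave in enumerate(waves(RELATIONS), start=1):
        print(number, ", ".join(wave))
```

In this single-target case the only overlap is between the liveness/mount checks and the pre-access-hook/check-stores chain; the real gain would come from many hosts, volumes, and devices sharing the same graph.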

If a parameterized name relates to a less-parameterized name then the relation is implicitly true for all possible values of the missing parameter(s).
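One way to read this rule is as template expansion: each relation is written once with placeholders and then instantiated for every known (host, volume, device) combination. The sketch below is an assumption about how that might look; `instantiate`, `expand`, and the `targets` list are hypothetical, and the DATE parameter is left out for brevity.

```python
def instantiate(template, binding):
    """Replace placeholder path components (HOST, VOLUME, DEVICE) with values."""
    return "/".join(binding.get(part, part) for part in template.split("/"))


def expand(before, after, targets):
    """Expand one templated relation into concrete (before, after) edges.

    `targets` is an iterable of (host, volume, device) tuples. Parameters
    shared by both names must agree; parameters missing from one name range
    over every value seen in `targets`.
    """
    edges = set()
    for host, volume, device in targets:
        binding = {"HOST": host, "VOLUME": volume, "DEVICE": device}
        edges.add((instantiate(before, binding), instantiate(after, binding)))
    return sorted(edges)


if __name__ == "__main__":
    targets = [("h1", "v1", "d1"), ("h1", "v2", "d1"), ("h2", "v1", "d1")]
    for edge in expand("live/HOST", "mounted/HOST/VOLUME", targets):
        print(edge)
```

Here `live/h1` ends up ordered before both `mounted/h1/v1` and `mounted/h1/v2`, while a relation with no parameters at all, such as pre-access-hook < check-stores, collapses to a single edge.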

Maybe log-prunable can be split up a bit.

If the identify-* actions cause later actions to spring into existence then that makes the successors of those actions tricky to evaluate. A way around this may be to have them "always" exist, implicitly, with the identify-* actions causing them to complete immediately; or alternatively for the output of the identify-* actions to be a parameter controlling whether the successor actions do anything or complete immediately.
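The second alternative might look roughly like the following sketch, in which every action always exists and the output of an identify-* action acts as a gate. The `GatedAction` class, the `SKIPPED` marker, and the shared `results` dictionary are all hypothetical and not part of rsbackup.

```python
from typing import Callable, Dict, Optional

SKIPPED = "skipped"  # marker for an action that completed without doing work


class GatedAction:
    """An action that always exists but may complete immediately."""

    def __init__(self, name: str, work: Callable[[], object],
                 gate: Optional[str] = None):
        self.name = name
        self.work = work
        self.gate = gate  # name of the identify-* action controlling this one

    def run(self, results: Dict[str, object]) -> object:
        """Run the action, recording its output in `results`."""
        if self.gate is not None and not results.get(self.gate):
            # The identify step said "no": complete immediately.
            results[self.name] = SKIPPED
        else:
            results[self.name] = self.work()
        return results[self.name]


if __name__ == "__main__":
    results: Dict[str, object] = {}
    GatedAction("identify/h/v/d", lambda: False).run(results)
    backup = GatedAction("backup/h/v/d", lambda: "ok", gate="identify/h/v/d")
    print(backup.run(results))
```

Because the gated action still completes, its successors can be evaluated with the ordinary ordering rules, and the graph never changes shape while it is being executed.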

Possibly-Relevant Issues