Concurrency

NOTE: this design is not going to be implemented in this form. I just don't have the time and effort available to re-do the relevant bits in reactive form. A thread-based design was adopted instead in https://github.com/ewxrjk/rsbackup/commit/1561eb96bb4322bc30dbb1fe07b962a4bbbea8a8.

Improving Concurrency

Removals are currently parallelized; it would be nice to do the same for backups, liveness checking, etc.

The current order of operations can be found in https://github.com/ewxrjk/rsbackup/blob/master/src/MakeBackup.cc.

Backups first:

live/HOST: Once per host, do a liveness check for the host.
mounted/HOST/VOLUME: Once per volume, test whether the volume is mounted.
flag/HOST/VOLUME: Once per volume, test whether the volume's flag file is present.
Identify devices. This involves two steps...
pre-access-hook: Once globally, run pre-access-hook.
check-stores: Once globally, check all store paths for valid devices.
pre-backup-hook/HOST/VOLUME/DEVICE: Once per (volume, device), run pre-backup-hook.
backup/HOST/VOLUME/DEVICE: Once per (volume, device), make a backup.
underway/HOST/VOLUME/DEVICE: Once per (volume, device), store an 'underway' backup result.
post-backup-hook/HOST/VOLUME/DEVICE: Once per (volume, device), run post-backup-hook.
record-backup/HOST/VOLUME/DEVICE: Once per (volume, device), store the backup result.

Pruning:

find-prunable: Identify prunable backups and update database.
Identify devices, as for backups above (pre-access-hook, then check-stores).
remove-prunable/HOST/VOLUME/DEVICE: Remove prunable backups.
log-prunable: Update database.
expire-prune-logs: Expire pruning logs.

Finally:

post-access-hook: Once globally, run post-access-hook.

The step 8/10 behaviour is a bit odd. A backup can succeed yet stay recorded as 'underway' purely because post-backup-hook fails, or not be recorded at all even though an 'underway' state is available. This seems like a bug, and it means incomplete backups are not reliably pruned.

Work to do:

~~Expand retire/pruning details~~
~~Define the ordering requirements between the individual actions.~~
Define the resources required by individual actions. NB: it may be desirable to quiesce the system globally for certain actions, e.g. pre-/post-backup-hook-*, to reduce the risk of LVM races.
~~Expand ActionList to support the ordering requirements.~~
...
Profit!
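
The resource item in the work list above could be approached roughly as follows. This is only a sketch, assuming each action declares the set of resources it needs and that two actions may overlap only when those sets are disjoint. The resource names (`host:...`, `device:...`, `global`) and the `can_start` helper are hypothetical and do not come from rsbackup.

```python
def can_start(needed, in_use):
    """Return True if an action needing `needed` may start now.

    `in_use` is the set of resources held by currently running actions.
    The special resource "global" (a hypothetical name) conflicts with
    everything, which models quiescing the whole system around a hook.
    """
    if "global" in in_use:
        return False
    if "global" in needed:
        return not in_use
    return needed.isdisjoint(in_use)


running = {"host:alpha", "device:backup1"}
print(can_start({"host:beta", "device:backup2"}, running))   # True
print(can_start({"host:alpha", "device:backup2"}, running))  # False
print(can_start({"global"}, running))                        # False
print(can_start({"global"}, set()))                          # True
```

Treating "quiesce everything" as one more resource keeps the scheduler to a single rule, at the cost of hooks serializing the whole run.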

Action relationships

A < B means A must complete before B can start.
Normally A failing means B can't run; the exceptions are mentioned explicitly in the list below.
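
One way to read the two rules above, as a minimal sketch: each relation carries a flag saying whether it tolerates failure of the predecessor (the "even if ... fails" cases). The `decide` function, its return values and the data layout are assumptions made for illustration.

```python
def decide(action, deps, succeeded, failed):
    """Decide what to do with `action` given the finished predecessors.

    `deps` maps an action to a list of (predecessor, even_if_fails) pairs.
    Returns "blocked" if a predecessor failed and the relation does not
    tolerate failure, "wait" if some predecessor has not finished yet,
    and "run" otherwise.
    """
    preds = deps.get(action, [])
    if any(p in failed and not tolerant for p, tolerant in preds):
        return "blocked"
    if any(p not in succeeded and p not in failed for p, _ in preds):
        return "wait"
    return "run"


deps = {
    "backup": [("pre-backup-hook", False)],
    "post-backup-hook": [("backup", True)],  # runs even if backup fails
}
print(decide("backup", deps, set(), set()))                  # wait
print(decide("backup", deps, set(), {"pre-backup-hook"}))    # blocked
print(decide("post-backup-hook", deps, {"pre-backup-hook"}, {"backup"}))  # run
```

A scheduler built on this would presumably treat a blocked action as failed when evaluating its own successors; that choice is an assumption, not something stated above.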

Actions not listed above:

identify/HOST/VOLUME: figure out whether to backup this volume
identify/HOST/VOLUME/DEVICE: ...to this device
find-prunable/HOST/VOLUME: determine which backups are prunable

Relationships between actions:

live/HOST < mounted/HOST/VOLUME
mounted/HOST/VOLUME < flag/HOST/VOLUME
flag/HOST/VOLUME < identify/HOST/VOLUME
pre-access-hook < check-stores
check-stores < identify/HOST/VOLUME/DEVICE
identify/HOST/VOLUME < identify/HOST/VOLUME/DEVICE
identify/HOST/VOLUME/DEVICE < pre-backup-hook/HOST/VOLUME/DEVICE
pre-backup-hook/HOST/VOLUME/DEVICE < backup/HOST/VOLUME/DEVICE
backup/HOST/VOLUME/DEVICE < post-backup-hook/HOST/VOLUME/DEVICE (even if backup/ fails)
post-backup-hook/HOST/VOLUME/DEVICE < record-backup/HOST/VOLUME/DEVICE (even if post-backup-hook/ fails)
record-backup/HOST/VOLUME/DEVICE < find-prunable/HOST/VOLUME/DEVICE
backup/HOST/VOLUME/DEVICE < post-access-hook (even if backup/ fails)
check-stores < remove-prunable/HOST/VOLUME/DEVICE
find-prunable/HOST/VOLUME/DEVICE < remove-prunable/HOST/VOLUME/DEVICE/DATE
remove-prunable/HOST/VOLUME/DEVICE/DATE < log-prunable (even if remove-prunable/ fails)
log-prunable < expire-prune-logs (even if log-prunable fails)
remove-prunable/HOST/VOLUME/DEVICE/DATE < post-access-hook (even if remove-prunable/ fails)
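
To see how much parallelism the relations above allow, they can be fed to a topological sort. The sketch below uses Python's standard `graphlib.TopologicalSorter` on the backup-side relations only, for a single host, volume and device, and ignores the "even if it fails" annotations. The `batches` helper is hypothetical.

```python
from graphlib import TopologicalSorter

# Each entry is (A, B), meaning A < B. Names are the templates from the
# list above, taken for one host, one volume and one device.
RELATIONS = [
    ("live/HOST", "mounted/HOST/VOLUME"),
    ("mounted/HOST/VOLUME", "flag/HOST/VOLUME"),
    ("flag/HOST/VOLUME", "identify/HOST/VOLUME"),
    ("pre-access-hook", "check-stores"),
    ("check-stores", "identify/HOST/VOLUME/DEVICE"),
    ("identify/HOST/VOLUME", "identify/HOST/VOLUME/DEVICE"),
    ("identify/HOST/VOLUME/DEVICE", "pre-backup-hook/HOST/VOLUME/DEVICE"),
    ("pre-backup-hook/HOST/VOLUME/DEVICE", "backup/HOST/VOLUME/DEVICE"),
    ("backup/HOST/VOLUME/DEVICE", "post-backup-hook/HOST/VOLUME/DEVICE"),
    ("post-backup-hook/HOST/VOLUME/DEVICE", "record-backup/HOST/VOLUME/DEVICE"),
    ("backup/HOST/VOLUME/DEVICE", "post-access-hook"),
]


def batches(relations):
    """Group actions into batches whose members could run concurrently."""
    ts = TopologicalSorter()
    for before, after in relations:
        ts.add(after, before)  # `after` depends on `before`
    ts.prepare()
    result = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        result.append(ready)
        ts.done(*ready)
    return result


for number, batch in enumerate(batches(RELATIONS), start=1):
    print(number, batch)
```

The first batch is `live/HOST` together with `pre-access-hook`, so the liveness check and the store checks can already overlap. In this subset post-access-hook is ordered only after backup/, so it lands in the same batch as post-backup-hook; in the full list the remove-prunable relation also constrains it.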

If a parameterized name relates to a less-parameterized name then the relation is implicitly true for all possible values of the missing parameter(s).
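
The implicit expansion could be sketched as below. This assumes names are written as `action/PARAM/PARAM...` with upper-case placeholders, as in the list above; the `expand` and `substitute` helpers and the example values are hypothetical.

```python
from itertools import product


def substitute(name, binding):
    """Replace each placeholder after the action name with its bound value."""
    head, *rest = name.split("/")
    return "/".join([head] + [binding[p] for p in rest])


def expand(before, after, values):
    """Expand one templated relation into concrete (before, after) pairs.

    `values` maps each placeholder to its possible values. For simplicity
    this sketch takes the full cross product, although a real configuration
    would only pair each volume with its own host.
    """
    placeholders = sorted(set(before.split("/")[1:]) | set(after.split("/")[1:]))
    pairs = []
    for combo in product(*(values[p] for p in placeholders)):
        binding = dict(zip(placeholders, combo))
        pairs.append((substitute(before, binding), substitute(after, binding)))
    return pairs


values = {"HOST": ["alpha"], "VOLUME": ["home", "etc"], "DEVICE": ["disk1"]}
for pair in expand("check-stores", "identify/HOST/VOLUME/DEVICE", values):
    print(pair)
# ('check-stores', 'identify/alpha/home/disk1')
# ('check-stores', 'identify/alpha/etc/disk1')
```

Here check-stores has no parameters, so it ends up preceding every concrete identify/ action.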

Maybe log-prunable can be split up a bit.

If the identify-* actions cause later actions to spring into existence then that makes the successors of those actions tricky to evaluate. A way around this may be to have them "always" exist, implicitly, with the identify-* actions causing them to complete immediately; or alternatively for the output of the identify-* actions to be a parameter controlling whether the successor actions do anything or complete immediately.
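
A minimal sketch of the second alternative, assuming every action is created up front and carries a flag that an identify-* action can clear. The `Action` class, the `identify` function and the `enabled` flag are hypothetical names used only for illustration.

```python
class Action:
    """An action that always exists but may complete without doing work."""

    def __init__(self, name, work):
        self.name = name
        self.work = work      # callable performing the real work
        self.enabled = True   # cleared by an identify-* action

    def run(self):
        if not self.enabled:
            return "skipped"  # completes immediately; successors proceed
        self.work()
        return "done"


def identify(wanted, successors):
    """Hypothetical identify-* step: enable or disable its successors."""
    for action in successors:
        action.enabled = wanted
    return "done"


performed = []
backup = Action("backup/HOST/VOLUME/DEVICE", lambda: performed.append("backup"))

identify(False, [backup])
print(backup.run(), performed)  # skipped []

identify(True, [backup])
print(backup.run(), performed)  # done ['backup']
```

Because the skipped action still completes, the dependency graph stays fixed for the whole run and successors never have to be discovered dynamically.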

Possibly-Relevant Issues

https://github.com/ewxrjk/rsbackup/issues/17 - actually just about the way hooks are run
https://github.com/ewxrjk/rsbackup/issues/26 - availability checking
https://github.com/ewxrjk/rsbackup/issues/16 - also about hooks
