diff --git a/docs/.gitignore b/docs/.gitignore new file mode 100644 index 00000000..69fa449d --- /dev/null +++ b/docs/.gitignore @@ -0,0 +1 @@ +_build/ diff --git a/docs/Makefile b/docs/Makefile new file mode 100644 index 00000000..d4bb2cbb --- /dev/null +++ b/docs/Makefile @@ -0,0 +1,20 @@ +# Minimal makefile for Sphinx documentation +# + +# You can set these variables from the command line, and also +# from the environment for the first two. +SPHINXOPTS ?= +SPHINXBUILD ?= sphinx-build +SOURCEDIR = . +BUILDDIR = _build + +# Put it first so that "make" without argument is like "make help". +help: + @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) + +.PHONY: help Makefile + +# Catch-all target: route all unknown targets to Sphinx using the new +# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). +%: Makefile + @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) diff --git a/docs/about.rst b/docs/about.rst new file mode 100644 index 00000000..aef21958 --- /dev/null +++ b/docs/about.rst @@ -0,0 +1,35 @@ +About PGHoard +============= + +Features +-------- + +* Automatic periodic basebackups +* Automatic transaction log (WAL/xlog) backups (using either ``pg_receivewal`` + (formerly ``pg_receivexlog``), ``archive_command`` or experimental PG native + replication protocol support with ``walreceiver``) +* Optional Standalone Hot Backup support +* Cloud object storage support (AWS S3, Google Cloud, OpenStack Swift, Azure, Ceph) +* Backup restoration directly from object storage, compressed and encrypted +* Point-in-time-recovery (PITR) +* Initialize a new standby from object storage backups, automatically configured as + a replicating hot-standby + +Fault-resilience and monitoring +------------------------------- + +* Persists over temporary object storage connectivity issues by retrying transfers +* Verifies WAL file headers before upload (backup) and after download (restore), + so that e.g. 
files recycled by PostgreSQL are ignored +* Automatic history cleanup (backups and related WAL files older than N days) +* "Archive sync" tool for detecting holes in WAL backup streams and fixing them +* "Archive cleanup" tool for deleting obsolete WAL files from the archive +* Keeps statistics updated in a file on disk (for monitoring tools) +* Creates alert files on disk when problems occur (for monitoring tools) + + +Performance +----------- + +* Parallel compression and encryption +* WAL pre-fetching on restore diff --git a/docs/architecture.rst b/docs/architecture.rst new file mode 100644 index 00000000..553b7759 --- /dev/null +++ b/docs/architecture.rst @@ -0,0 +1,71 @@ +Architecture +============ + +PostgreSQL Point In Time Recovery (PITR) consists of having a database +basebackup, with changes after that point going into WAL files that can be +replayed to reach the desired recovery point. + +PGHoard runs as a daemon which is responsible for performing the main +tasks of a backup tool for PostgreSQL: + +* Taking periodic basebackups +* Archiving the WAL +* Managing backup retention according to a policy. + +Basebackup +---------- + +The basebackups are taken by the pghoard daemon directly, with no need for an +external scheduler / crond. + +When pghoard is first launched, it will take a basebackup. After that, the +frequency of basebackups is determined by the configuration. + +Those basebackups can be taken in one of two ways: + +* Either by copying the files directly from ``PGDATA``, using the + ``local-tar`` or ``delta`` modes +* By calling ``pg_basebackup``, using the ``basic`` or ``pipe`` modes. + +See :ref:`configuration_basebackup` for how to configure it. + +Archiving +--------- + +PGHoard supports multiple operating models.
If you don't want to modify the +archiving configuration of the server being backed up, or install anything particular on that +server, ``pghoard`` can fetch the WAL using ``pg_receivewal`` (named ``pg_receivexlog`` on PostgreSQL < 10). +It also provides its own replication client replacing ``pg_receivewal``, using +the ``walreceiver`` mode. This mode is currently experimental. + +PGHoard also supports a traditional ``archive_command`` in the form of the +``pghoard_postgres_command`` utility. + + +See :ref:`configuration_archiving` for how to configure it. + +Retention +--------- + +``pghoard`` expires the backups according to the configured retention policy. +Whenever there are more than the specified number of backups, older backups are +removed along with their associated WAL files. + +Compression and encryption +-------------------------- + +The PostgreSQL write-ahead log (WAL) and basebackups are compressed with +Snappy (default) in order to ensure good compression speed and relatively small backup size. Zstandard or LZMA compression is also available. See :ref:`configuration_compression` for more information. + +Encryption is not enabled by default, but PGHoard can encrypt backed-up data at +rest. Each individual file is encrypted and authenticated with file specific +keys. The file specific keys are included in the backup, themselves encrypted with +a master RSA private/public key pair. + +To set it up, follow the encryption section of the quickstart guide: :ref:`quickstart_encryption`. For a full reference see :ref:`configuration_encryption`.
+ + +Deployment examples +------------------- + +FIXME: add schemas showing a deployment of pghoard on the same host with diff --git a/docs/commands.rst b/docs/commands.rst new file mode 100644 index 00000000..eb21f961 --- /dev/null +++ b/docs/commands.rst @@ -0,0 +1,132 @@ +Commands +======== + + +pghoard +------- + +``pghoard`` is the main daemon process that should be run under a service +manager, such as ``systemd`` or ``supervisord``. It handles the backup of +the configured sites. + +.. code-block:: + + usage: pghoard [-h] [-D] [--version] [-s] [--config CONFIG] [config_file] + + postgresql automatic backup daemon + + positional arguments: + config_file configuration file path (for backward compatibility) + + optional arguments: + -h, --help show this help message and exit + -D, --debug Enable debug logging + --version show program version + -s, --short-log use non-verbose logging format + --config CONFIG configuration file path + + +.. _commands_restore: + +pghoard_restore +--------------- + +``pghoard_restore`` is a command line tool that can be used to restore a +previous database backup from either ``pghoard`` itself or from one of the +supported object stores. ``pghoard_restore`` can also configure +``recovery.conf`` to use ``pghoard_postgres_command`` as the WAL +``restore_command`` in ``recovery.conf``. + + +.. code-block:: + + usage: pghoard_restore [-h] [-D] [--status-output-file STATUS_OUTPUT_FILE] [--version] + {list-basebackups-http,list-basebackups,get-basebackup} ... 
+ +positional arguments: + list-basebackups-http + List available basebackups from a HTTP source + list-basebackups + List basebackups from an object store + get-basebackup + Download a basebackup from an object store + + +-h, --help show this help message and exit +-D, --debug Enable debug logging +--status-output-file STATUS_OUTPUT_FILE + Filename for status output JSON +--version show program version + +pghoard_archive_cleanup +----------------------- + +``pghoard_archive_cleanup`` can be used to clean up any orphan WAL files +from the object store. After the configured number of basebackups has been +exceeded (configuration key ``basebackup_count``), ``pghoard`` deletes the +oldest basebackup and all WAL associated with it. Transient object storage +failures and other interruptions can cause the WAL deletion process to leave +orphan WAL files behind; these can be deleted with this tool. + +.. code-block:: + + usage: pghoard_archive_cleanup [-h] [--version] [--site SITE] [--config CONFIG] [--dry-run] + + +-h, --help show this help message and exit +--version show program version +--site SITE pghoard site +--config CONFIG pghoard config file +--dry-run only list redundant segments and calculate total file size but do not delete + + +pghoard_archive_sync +-------------------- + +``pghoard_archive_sync`` can be used to see if any local files should +be archived but haven't been, or if any of the archived files have unexpected +content and need to be archived again. Its other use case is to determine +whether there are any gaps in the required files in the WAL archive, +from the current WAL file back to the latest basebackup's first WAL file. + +.. 
code-block:: + + usage: pghoard_archive_sync [-h] [-D] [--version] [--site SITE] [--config CONFIG] + [--max-hash-checks MAX_HASH_CHECKS] [--no-verify] [--create-new-backup-on-failure] + + +-h, --help show this help message and exit +-D, --debug Enable debug logging +--version show program version +--site SITE pghoard site +--config CONFIG pghoard config file +--max-hash-checks MAX_HASH_CHECKS + Maximum number of files for which to validate hash in addition to basic existence check +--no-verify do not verify archive integrity +--create-new-backup-on-failure + request a new basebackup if verification fails + +pghoard_create_keys +------------------- + +``pghoard_create_keys`` can be used to generate and output encryption keys +in the ``pghoard`` configuration format. + +``pghoard_postgres_command`` is a command line tool that can be used as +PostgreSQL's ``archive_command`` or ``restore_command``. It communicates with +``pghoard`` 's locally running webserver to let it know there's a new file that +needs to be compressed, encrypted and stored in an object store (in archive +mode) or the inverse (in restore mode). + +.. code-block:: + + + usage: pghoard_create_keys [-h] [-D] [--version] [--site SITE] --key-id KEY_ID [--bits BITS] [--config CONFIG] + +-h, --help show this help message and exit +-D, --debug Enable debug logging +--version show program version +--site SITE backup site +--key-id KEY_ID key alias as used with encryption_key_id configuration directive +--bits BITS length of the generated key in bits, default 3072 +--config CONFIG configuration file to store the keys in diff --git a/docs/conf.py b/docs/conf.py new file mode 100644 index 00000000..d865b364 --- /dev/null +++ b/docs/conf.py @@ -0,0 +1,56 @@ +# Configuration file for the Sphinx documentation builder. +# +# This file only contains a selection of the most common options.
For a full +# list see the documentation: +# https://www.sphinx-doc.org/en/master/usage/configuration.html + +# -- Path setup -------------------------------------------------------------- + +# If extensions (or modules to document with autodoc) are in another directory, +# add these directories to sys.path here. If the directory is relative to the +# documentation root, use os.path.abspath to make it absolute, like shown here. +# +import os +import sys +sys.path.insert(0, os.path.abspath('..')) +from version import get_project_version + + +# -- Project information ----------------------------------------------------- + +project = 'PGHoard' +copyright = '2021, Aiven' +author = 'Aiven' + +# The full version, including alpha/beta/rc tags +release = get_project_version('pghoard/version.py') + +# -- General configuration --------------------------------------------------- + +# Add any Sphinx extension module names here, as strings. They can be +# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom +# ones. +extensions = [ + "sphinx_rtd_theme" +] + +# Add any paths that contain templates here, relative to this directory. +templates_path = ['_templates'] + +# List of patterns, relative to source directory, that match files and +# directories to ignore when looking for source files. +# This pattern also affects html_static_path and html_extra_path. +exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store'] + + +# -- Options for HTML output ------------------------------------------------- + +# The theme to use for HTML and HTML Help pages. See the documentation for +# a list of builtin themes. +# +html_theme = 'sphinx_rtd_theme' + +# Add any paths that contain custom static files (such as style sheets) here, +# relative to this directory. They are copied after the builtin static files, +# so a file named "default.css" will overwrite the builtin "default.css". 
+html_static_path = ['_static'] diff --git a/docs/configuration.rst b/docs/configuration.rst new file mode 100644 index 00000000..53531cd4 --- /dev/null +++ b/docs/configuration.rst @@ -0,0 +1,556 @@ +.. _configuration: + +Configuration +============= + +The configuration file is in the JSON format. It consists of nested +key-value pairs. + +For example:: + + { + "json_state_file_path": "/var/lib/pghoard/pghoard_state.json", + "backup_sites": { + "mycluster": { + "nodes": [ + { + "host": "127.0.0.1", + "password": "secret", + "port": 5432, + "user": "backup", + "slot": "pghoard" + } + ], + "basebackup_count": 5, + "basebackup_mode": "delta", + "object_storage": { + "storage_type": "local", + "directory": "/tmp/pghoard/backups" + } + } + } + } + +Global Configuration +-------------------- + +Global configuration options are specified at the top level. +In this documentation we group them by category. + + +Generic Configuration +~~~~~~~~~~~~~~~~~~~~~ + + + + +active (default ``true``) + Can also be set on the ``backup_site`` level to disable taking of new backups + and to stop the deletion of old ones. +backup_location + Where ``pghoard`` will create its internal data structures for local + state data. +hash_algorithm (default ``"sha1"``) + The hash algorithm used for calculating checksums for WAL or other files. Must + be one of the algorithms supported by Python's `hashlib `_ +json_state_file_path (default ``"/var/lib/pghoard/pghoard_state.json"``) + Location of the JSON state file which describes the state of the + ``pghoard`` process. +maintenance_mode_file (default ``"/var/lib/pghoard/maintenance_mode_file"``) + Trigger file for maintenance mode: if a file exists at + this location, no new backup actions will be started. + FIXME: define "new backup actions" +transfer (default see below) + A JSON object defining the WAL/basebackup transfer parameters.
+ + Example:: + + { + "transfer": { + "thread_count": 4, + "upload_retries_warning_limit": 3 + } + } + + thread_count (default ``min(cpu_count + 3, 20)``) + Number of parallel uploads / downloads (FIXME: took the value from the code, + original documentation seemed wrong) + upload_retries_warning_limit (default ``3``) + Create an alert file ``upload_retries_warning`` after this many failed + upload attempts. See (FIXME: link to alert system) +tar_executable (default ``"pghoard_gnutaremu"``) + The tar command to use for restoring basebackups. This must be GNU tar because some + advanced switches like ``--transform`` are needed. If this value is not defined (or + is explicitly set to ``"pghoard_gnutaremu"``), Python's internal tarfile + implementation is used. The Python implementation is somewhat slower than the + actual tar command, and in environments with fast disk IO (compared to available CPU + capacity) it is recommended to set this to ``"tar"``. +restore_prefetch (default ``transfer.thread_count``) + Number of files to prefetch when performing archive recovery. The default + is the number of Transfer Agent threads, so that all of them are utilized. + + +.. _configuration_logging: + +Logging configuration +~~~~~~~~~~~~~~~~~~~~~ + +log_level (default ``"INFO"``) + Determines log level of ``pghoard``. +syslog (default ``false``) + Enable / disable syslog logging +syslog_address (default ``"/dev/log"``) + Determines syslog address to use in logging (requires syslog to be true as + well) +syslog_facility (default ``"local2"``) + Determines syslog log facility (requires syslog to be true as well) + + +.. _configuration_monitoring: + +Monitoring +~~~~~~~~~~ + +alert_file_dir (default ``backup_location`` if set else ``os.getcwd()``) + Directory in which alert files for replication warning and failover are + created. +stats (default ``null``) + When set, enables sending metrics to a statsd daemon that supports Telegraf or DataDog + syntax with tags.
+ The value is a JSON object, for example:: + + { + "host": "", + "port": , + "format": "", + "tags": { + "": "" + } + } + + host + The statsd host address + port + The statsd listening port + format (default ``"telegraf"``) + Determines statsd message format. The following formats are supported: + + - ``telegraf`` `Telegraf spec `_ + - ``datadog`` `DataDog spec `_ + + :tags: (default null) + The tag key can be used to enter optional tag values for the metrics +push_gateway (default ``null``) + When set, enables sending metrics to a Prometheus Pushgateway with tags. + The value is a JSON object, for example:: + + { + "endpoint": "", + "tags": { + "": "" + } + } + + endpoint + The pushgateway address + tags + An object mapping tags to their values. + + +.. _configuration_http: + +HTTP Server configuration +~~~~~~~~~~~~~~~~~~~~~~~~~ + +The pghoard daemon needs to listen on an HTTP port for the archive command and +for fetching of basebackups/WALs when restoring if not using an object store. +FIXME: if it's not required, can we disable it? + +http_address (default ``"127.0.0.1"``) + Address to bind the PGHoard HTTP server to. Set to an empty string to + listen on all available addresses. +http_port (default ``16000``) + HTTP webserver port. Used for the archive command and for fetching of + basebackups/WALs when restoring if not using an object store. + + + + +.. _configuration_compression: + +Compression +~~~~~~~~~~~ + +The PostgreSQL write-ahead log (WAL) and basebackups are compressed with +Snappy (default), Zstandard (configurable, level 3 by default) or LZMA (configurable, +level 0 by default) in order to ensure good compression speed and relatively small backup size. +For performance critical applications it is recommended to test compression +algorithms to find the most suitable trade-off for the particular use-case. +E.g.
Snappy is fast but yields larger compressed files, while Zstandard (zstd) +offers a very wide range of compression/speed trade-offs. + +The top-level ``compression`` key defines the compression options:: + + { + "compression": { + "algorithm": "snappy", + "level": 3, + "thread_count": 4 + } + } + +algorithm (default ``snappy``) + The compression algorithm to use. Available algorithms are + ``snappy``, ``zstd``, and ``lzma`` +level (default ``0`` for ``lzma``, ``3`` for ``zstd``) + The compression level to use. Depends on the algorithm used. +thread_count (default ``cpu_count + 1``) + The number of threads used for parallel compression. FIXME: previous + doc wrote max(cpu_count, 5) but the code says otherwise + Contrary to ``basebackup_compression_threads`` this is the number of + compression threads started by ``pghoard``, not internal compression + threads for libraries supporting it, and is therefore applicable to any + compression algorithm. + + +Backup sites +------------ + +The key ``backup_sites`` contains the configuration for a group of PostgreSQL clusters (here +called ``sites``). Each backup site configures how to back up the different nodes +it comprises. Each site can be configured separately, under an identifying +site name (example: ``mysite``). + +A backup site contains an array of at least one node. For each node, the connection +information is required. The keys for a node are libpq parameters, for example:: + + { + "backup_sites": { + "mysite": { + "nodes": [ + { + "host": "127.0.0.1", + "password": "secret", + "port": 5432, + "user": "backup", + "slot": "pghoard", + "sslmode": "require" + } + ] + } + } + } + +It is advised to use a replication slot when performing a form of WAL streaming archiving (``pg_receivexlog`` or ``walreceiver`` modes). + +nodes (no default) + A node can be described as an object of libpq key: value connection info pairs, a libpq + connection string or a ``postgres://`` connection URI.
If for example you'd + like to use a streaming replication slot use the syntax {... "slot": "slotname"}. +pg_data_directory (no default) + This is used when the ``local-tar`` or ``delta`` ``basebackup_mode`` is in + use. The data directory must point to PostgreSQL's ``$PGDATA`` and must be readable by the + ``pghoard`` daemon. +prefix (default site_name) + Path prefix to use for all backups related to this site. +pg_bin_directory (default find binaries from well-known directories) + Where to find the ``pg_basebackup`` and ``pg_receivewal`` (``pg_receivexlog`` + for PG < 10) binaries. + If a value is not supplied, ``pghoard`` will attempt to find matching binaries + from various well-known locations. If ``pg_data_directory`` is set and points to a + valid data directory the lookup is restricted to the version contained in + the given data directory. + + +.. _configuration_basebackup: + +Basebackup configuration +~~~~~~~~~~~~~~~~~~~~~~~~ + +The following options all concern various aspects of the basebackup process and +its retention policy. + +basebackup_mode (default ``"basic"``) + The way basebackups should be created. We support 4 different modes: the first + two use ``pg_basebackup`` while the other two read the files directly from the + cluster. Neither ``basic`` nor ``pipe`` mode supports multiple tablespaces. + + ``basic`` + runs ``pg_basebackup`` and waits for it to write an uncompressed tar file on the + disk before compressing and optionally encrypting it. + ``pipe`` + pipes the data directly from ``pg_basebackup`` to PGHoard's + compression and encryption processing, reducing the amount of temporary disk + space that's required. + ``local-tar`` + Can be used only when running on the same host as the + PostgreSQL cluster. Instead of using ``pg_basebackup``, PGHoard reads the files directly from ``$PGDATA`` in this mode and compresses and optionally encrypts them. This mode allows backing up user + tablespaces.
Note that the ``local-tar`` backup mode cannot be used on replica servers + prior to PostgreSQL 9.6 unless the pgespresso extension is installed. + + ``delta`` + similar to ``local-tar``, but only changed files are uploaded into the storage. + On every backup a snapshot of the data files is taken, which results in a manifest file + describing the hashes of all the files that need to be backed up. + New hashes are uploaded to the storage and used, together with the complementary + manifest from the control file, for restoration. + + In order to properly assess the efficiency of ``delta`` mode in comparison with + ``local-tar``, one can use ``local-tar-delta-stats`` mode, which behaves the same as + ``local-tar``, but also collects the metrics as if it were in ``delta`` mode. It can help + in deciding whether to switch to ``delta`` mode. +basebackup_thread (default ``1``) + How many threads to use for tar, compress and encrypt tasks. Only applies to the + ``local-tar`` basebackup mode. Only values 1 and 2 are likely to be sensible for + this; with higher thread counts the speed improvement is negligible and CPU time is + lost switching between threads. + + + + + + +The following options define how to schedule basebackups. + +basebackup_interval_hours (default ``24``) + How often to take a new basebackup of a cluster. The shorter the interval, + the faster your recovery will be, but the more CPU/IO usage is required from + the servers it takes the basebackup from. If set to a null value basebackups + are not automatically taken at all. +basebackup_hour (default undefined) + The hour of day during which to start a new basebackup. If the backup interval is + less than 24 hours this is the base hour used to calculate the hours at which + backups should be taken. E.g. if the backup interval is 6 hours and this value is + set to 1, backups will be taken at hours 1, 7, 13 and 19. This value is only + effective if ``basebackup_interval_hours`` and ``basebackup_minute`` are also + set.
+basebackup_minute (default undefined) + The minute of the hour at which to start a new basebackup. This value is only + effective if ``basebackup_interval_hours`` and ``basebackup_hour`` are also + set. + + +basebackup_chunks_in_progress (default ``5``) + How many basebackup chunks can be on disk simultaneously while + a basebackup is being taken. For chunk size configuration see ``basebackup_chunk_size``. +basebackup_chunk_size (default ``2147483648``) + The size of the chunks in which a ``local-tar`` basebackup is taken. Disk space + needed for a successful backup is ``basebackup_chunk_size * + basebackup_chunks_in_progress``. +basebackup_compression_threads (default ``0``) + Number of threads to use within the compression library during basebackup. Only + applicable when using a compression library that supports internal multithreading, + namely zstd at the moment. The default value ``0`` means multithreading is not used. + +The following options manage the retention policy. + +basebackup_age_days_max (default ``null``) + Maximum age for basebackups. Basebackups older than this will be removed. By + default this value is not defined and basebackups are deleted based on total + count instead. +basebackup_count (default ``2``) + How many basebackups should be kept around for restoration purposes. The + more there are the more disk space will be used. If ``basebackup_age_days_max`` is + defined this controls the maximum number of basebackups to keep; if the backup + interval is less than 24 hours or extra backups are created there can be more + than one basebackup per day and it is often desirable to set + ``basebackup_count`` to something slightly higher than the max age in days. +basebackup_count_min (default ``2``) + Minimum number of basebackups to keep. This is only effective when + ``basebackup_age_days_max`` has been defined. If for example the server is + powered off and then back on a month later, all existing backups would be very + old.
However, in that case it is usually not desirable to immediately delete + all old backups. This setting allows specifying a minimum number of backups + that should always be preserved regardless of their age. + + + +.. _configuration_archiving: + +Archiving configuration +~~~~~~~~~~~~~~~~~~~~~~~ + + +active_backup_mode (default ``pg_receivexlog``) + Can be either ``pg_receivexlog`` or ``archive_command``. If set to + ``pg_receivexlog``, ``pghoard`` will start up a ``pg_receivexlog`` process to be + run against the database server. If ``archive_command`` is set, we rely on the + user setting the correct ``archive_command`` in + ``postgresql.conf``. You can also set this to the experimental ``walreceiver`` mode + whereby pghoard will start communicating directly with PostgreSQL + through the replication protocol. (Note: this requires psycopg2 >= 2.7.) + + +pg_receivexlog + When active backup mode is set to ``"pg_receivexlog"`` this object may + optionally specify additional configuration options. The currently available + options are all related to monitoring disk space availability and optionally + pausing xlog/WAL receiving when disk space goes below the configured threshold. + This is useful when PGHoard is configured to create its temporary files on + a different volume than where the main PostgreSQL data directory resides. By + default this logic is disabled and the minimum free bytes must be configured + to enable it. + + Example:: + + { + "backup_sites": { + "mysite": { + "pg_receivexlog": { + "disk_space_check_interval": 10, + "min_disk_free_bytes": null, + "resume_multiplier": 1.5 + } + } + } + } + + :disk_space_check_interval: (default ``10``) + How often (in seconds) to check available disk space. + :min_disk_free_bytes: (default ``null``) + Minimum number of bytes (as an integer) that must be available in order to keep receiving + xlogs/WAL from PostgreSQL.
If available disk space goes below this + limit a ``STOP`` signal is sent to the ``pg_receivexlog`` / ``pg_receivewal`` + application. + :resume_multiplier: (default ``1.5``) + Number of times the ``min_disk_free_bytes`` bytes of disk space that is + required to start receiving xlog/WAL again (i.e. send the ``CONT`` signal to + the ``pg_receivexlog`` / ``pg_receivewal`` process). Multiplier above 1 + should be used to avoid stopping and continuing the process constantly. + + + +.. _configuration_restore: + +Restore configuration +--------------------- + + + + + + +.. _configuration_storage: + +Storage configuration +~~~~~~~~~~~~~~~~~~~~~ + +FIXME: reformat that according to what's been done above + +``object_storage`` (no default) + +Configured in ``backup_sites`` under a specific site. If set, it must be an +object describing a remote object storage. The object must contain a key +``storage_type`` describing the type of the store, other keys and values are +specific to the storage type. + +``proxy_info`` (no default) + +Dictionary specifying proxy information. The dictionary must contain keys ``type``, +``host`` and ``port``. Type can be either ``socks5`` or ``http``. Optionally, +``user`` and ``pass`` can be specified for proxy authentication. Supported by +Azure, Google and S3 drivers. + +The following object storage types are supported: + +* ``local`` makes backups to a local directory, see ``pghoard-local-minimal.json`` + for example. 
Required keys: + + * ``directory`` for the path to the backup target (local) storage directory + +* ``sftp`` makes backups to a sftp server, required keys: + + * ``server`` + * ``port`` + * ``username`` + * ``password`` or ``private_key`` + +* ``google`` for Google Cloud Storage, required configuration keys: + + * ``project_id`` containing the Google Storage project identifier + * ``bucket_name`` bucket where you want to store the files + * ``credential_file`` for the path to the Google JSON credential file + +* ``s3`` for Amazon Web Services S3, required configuration keys: + + * ``aws_access_key_id`` for the AWS access key id + * ``aws_secret_access_key`` for the AWS secret access key + * ``region`` S3 region of the bucket + * ``bucket_name`` name of the S3 bucket + +Optional keys for Amazon Web Services S3: + + * ``encrypted`` if True, use server-side encryption. Default is False. + +* ``s3`` for other S3 compatible services such as Ceph, required + configuration keys: + + * ``aws_access_key_id`` for the AWS access key id + * ``aws_secret_access_key`` for the AWS secret access key + * ``bucket_name`` name of the S3 bucket + * ``host`` for overriding host for non AWS-S3 implementations + * ``port`` for overriding port for non AWS-S3 implementations + * ``is_secure`` for overriding the requirement for https for non AWS-S3 + * ``is_verify_tls`` for configuring tls verify for non AWS-S3 + implementations + +* ``azure`` for Microsoft Azure Storage, required configuration keys: + + * ``account_name`` for the name of the Azure Storage account + * ``account_key`` for the secret key of the Azure Storage account + * ``bucket_name`` for the name of Azure Storage container used to store + objects + * ``azure_cloud`` Azure cloud selector, ``"public"`` (default) or ``"germany"`` + +* ``swift`` for OpenStack Swift, required configuration keys: + + * ``user`` for the Swift user ('subuser' in Ceph RadosGW) + * ``key`` for the Swift secret_key + * ``auth_url`` for Swift 
authentication URL + * ``container_name`` name of the data container + + * Optional configuration keys for Swift: + + * ``auth_version`` - ``2.0`` (default) or ``3.0`` for keystone, use ``1.0`` with + Ceph Rados GW. + * ``segment_size`` - defaults to ``1024**3`` (1 gigabyte). Objects larger + than this will be split into multiple segments on upload. Many Swift + installations require large files (usually 5 gigabytes) to be segmented. + * ``tenant_name`` + * ``region_name`` + * ``user_id`` - for auth_version 3.0 + * ``user_domain_id`` - for auth_version 3.0 + * ``user_domain_name`` - for auth_version 3.0 + * ``tenant_id`` - for auth_version 3.0 + * ``project_id`` - for auth_version 3.0 + * ``project_name`` - for auth_version 3.0 + * ``project_domain_id`` - for auth_version 3.0 + * ``project_domain_name`` - for auth_version 3.0 + * ``service_type`` - for auth_version 3.0 + * ``endpoint_type`` - for auth_version 3.0 + + + + +.. _configuration_encryption: + +Encryption +~~~~~~~~~~ + +It is possible to set up encryption on a per-site basis. + +To generate this configuration, you can use ``pghoard_create_keys``, which generates +and outputs encryption keys in the ``pghoard`` configuration format. + + +encryption_key_id (no default) + Specifies the encryption key used when storing encrypted backups. If this + configuration directive is specified, you must also define the public key + for storing as well as the private key for retrieving stored backups. These + keys are specified with the ``encryption_keys`` dictionary. + +:encryption_keys: (no default) + This key is a mapping from key id to keys. Each entry in turn maps + ``public`` and ``private`` to PEM encoded RSA public and private keys + respectively. The public key needs to be specified for storing backups. The private + key needs to be in place for restoring encrypted backups.
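As a rough illustration of how ``encryption_key_id`` and ``encryption_keys`` relate, the sketch below checks a site configuration for the requirements described above. It assumes both keys sit directly in the site object; ``check_site_encryption`` is a hypothetical helper written for this example and is not part of ``pghoard``, and the key id ``example-key`` and the PEM text are placeholders.

```python
import json


def check_site_encryption(site_config, for_restore=False):
    """Return a list of problems; an empty list means the settings look consistent.

    Hypothetical helper for illustration only, not part of pghoard.
    """
    key_id = site_config.get("encryption_key_id")
    if key_id is None:
        return []  # encryption is not enabled for this site
    entry = site_config.get("encryption_keys", {}).get(key_id)
    if entry is None:
        return [f"encryption_key_id {key_id!r} has no entry in encryption_keys"]
    problems = []
    # The public key is needed to store encrypted backups.
    if not entry.get("public"):
        problems.append(f"key {key_id!r} lacks a public key (needed to store backups)")
    # The private key is only needed when restoring encrypted backups.
    if for_restore and not entry.get("private"):
        problems.append(f"key {key_id!r} lacks a private key (needed to restore backups)")
    return problems


# Placeholder site configuration: a public key only, as on a backup-only host.
site = json.loads("""
{
    "encryption_key_id": "example-key",
    "encryption_keys": {
        "example-key": {
            "public": "-----BEGIN PUBLIC KEY-----\\n(placeholder)\\n-----END PUBLIC KEY-----\\n"
        }
    }
}
""")

print(check_site_encryption(site))
print(check_site_encryption(site, for_restore=True))
```

With this placeholder configuration the first call reports no problems, while the second reports the missing private key, which matches the rule that the private key only has to be in place for restoring.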
+ diff --git a/docs/development.rst b/docs/development.rst new file mode 100644 index 00000000..15fbfd2f --- /dev/null +++ b/docs/development.rst @@ -0,0 +1,98 @@ +Development +=========== + +Requirements +------------ + +PGHoard can backup and restore PostgreSQL versions 9.3 and above. The +daemon is implemented in Python and works with CPython version 3.5 or newer. +The following Python modules are required: + +* psycopg2_ to look up transaction log metadata +* requests_ for the internal client-server architecture + +.. _`psycopg2`: http://initd.org/psycopg/ +.. _`requests`: http://www.python-requests.org/en/latest/ + +Optional requirements include: + +* azure_ for Microsoft Azure object storage (patched version required, see link) +* botocore_ for AWS S3 (or Ceph-S3) object storage +* google-api-client_ for Google Cloud object storage +* cryptography_ for backup encryption and decryption (version 0.8 or newer required) +* snappy_ for Snappy compression and decompression +* zstandard_ for Zstandard (zstd) compression and decompression +* systemd_ for systemd integration +* swiftclient_ for OpenStack Swift object storage +* paramiko_ for sftp object storage + +.. _`azure`: https://github.com/aiven/azure-sdk-for-python/tree/aiven/rpm_fixes +.. _`botocore`: https://github.com/boto/botocore +.. _`google-api-client`: https://github.com/google/google-api-python-client +.. _`cryptography`: https://cryptography.io/ +.. _`snappy`: https://github.com/andrix/python-snappy +.. _`zstandard`: https://github.com/indygreg/python-zstandard +.. _`systemd`: https://github.com/systemd/python-systemd +.. _`swiftclient`: https://github.com/openstack/python-swiftclient +.. _`paramiko`: https://github.com/paramiko/paramiko + +Developing and testing PGHoard also requires the following utilities: +flake8_, pylint_ and pytest_. + +.. _`flake8`: https://flake8.readthedocs.io/ +.. _`pylint`: https://www.pylint.org/ +.. 
_`pytest`: http://pytest.org/ + +PGHoard has been developed and tested on modern Linux x86-64 systems, but +should work on other platforms that provide the required modules. + +Vagrant +------- + +The Vagrantfile can be used to setup a vagrant development environment, consisting of two +vagrant virtual machines. + +1) Postgresql 9.4, python 3.5 and 3.6:: + + vagrant up + vagrant ssh postgres9 + cd /vagrant + source ~/venv3/bin/activate + make test + source ~/venv3.6/bin/activate + make test + +2) Postgresql 10 and python 3.7:: + + vagrant ssh postgres10 + cd /vagrant + make test + +Note: make deb does not work from vagrant at the moment, hopefully this will be resolved soon + +.. _building_from_source: + +Building +-------- + +To build an installation package for your distribution, go to the root +directory of a PGHoard Git checkout and run: + +Debian:: + + make deb + +This will produce a ``.deb`` package into the parent directory of the Git +checkout. + +Fedora:: + + make rpm + +This will produce a ``.rpm`` package usually into ``rpm/RPMS/noarch/``. + +Python/Other:: + + python setup.py bdist_egg + +This will produce an egg file into a dist directory within the same folder. diff --git a/docs/index.rst b/docs/index.rst new file mode 100644 index 00000000..49e51753 --- /dev/null +++ b/docs/index.rst @@ -0,0 +1,60 @@ +.. PGHoard documentation master file, created by + sphinx-quickstart on Tue Jul 27 13:52:50 2021. + You can adapt this file completely to your liking, but it should at least + contain the root `toctree` directive. + +PGHoard +======= + +.. |BuildStatus| image:: https://github.com/aiven/pghoard/actions/workflows/build.yml/badge.svg?branch=master +.. _BuildStatus: https://github.com/aiven/pghoard/actions + + +``pghoard`` is a PostgreSQL backup daemon and restore tooling that stores backup data in cloud object stores. + +.. 
toctree:: + :maxdepth: 2 + :caption: Contents: + + about + quickstart + architecture + install + commands + monitoring + configuration + development + +License +======= + +PGHoard is licensed under the Apache License, Version 2.0. Full license text +is available in the ``LICENSE`` file and at +http://www.apache.org/licenses/LICENSE-2.0.txt + + +Credits +======= + +PGHoard was created by Hannu Valtonen for +`Aiven`_ and is now maintained by Aiven developers . + +.. _`Aiven`: https://aiven.io/ + +Recent contributors are listed on the GitHub project page, +https://github.com/aiven/pghoard/graphs/contributors + + +Contact +======= + +Bug reports and patches are very welcome, please post them as GitHub issues +and pull requests at https://github.com/aiven/pghoard . Any possible +vulnerabilities or other serious issues should be reported directly to the +maintainers . + + +Copyright +========= + +Copyright (C) 2015 Aiven Ltd diff --git a/docs/install.rst b/docs/install.rst new file mode 100644 index 00000000..5edf31fe --- /dev/null +++ b/docs/install.rst @@ -0,0 +1,103 @@ +Installation +============ + +To run ``PGHoard`` you need to install it, and configure PostgreSQL according +to the modes of backup and archiving you chose. + +This section only describes how to install it using a package manager. +See :ref:`building_from_source` for other installation methods. + + +.. _installation_package: + +Installation from your distribution package manager +--------------------------------------------------- + +RHEL +++++ + +FIXME: the RPM package seems to be available on yum.postgresql.org. Write a +proper documentation for that. + +Debian +++++++ + +FIXME: can the package be included in apt.postgresql.org ? doesn't seem to be +the case for now. + + + +Installation from pip +--------------------- + +You can also install it using pip: + +``pip install pghoard`` + +FIXME: version of pghoard on pypi isn't up to date. + + +.. 
_installation_postgresql_configuration: + +PostgreSQL Configuration +======================== + +PostgreSQL should be configured to allow replication connections, and have a +high enough ``wal_level``. + +wal_level +--------- + +``wal_level`` should be set to at least ``replica`` (or ``archive`` for +PostgreSQL versions prior to 9.6). + +.. note:: Changing ``wal_level`` requires restarting PostgreSQL. + + +Replication connections +----------------------- + +If you use one of the non-local basebackup strategies (``basic`` or +``pipe``), you will need to allow ``pg_basebackup`` to connect using a +replication connection. + +Additionally, if you use a WAL-streaming archiving mode (``pg_receivexlog`` or +``walreceiver``) you will need another replication connection for those. + +The parameter ``max_wal_senders`` must then be set accordingly to allow for +at least that number of connections. You should of course take into account the +other replication connections that you may need, for one or several replicas. + +Example:: + + max_wal_senders = 4 + +.. note:: Changing ``max_wal_senders`` requires restarting PostgreSQL + +You also need a PostgreSQL user account with the ``REPLICATION`` attribute, +using psql:: + + -- create the user + CREATE USER pghoard REPLICATION; + -- Set up a password for the pghoard user + \password pghoard + +This user will need to be allowed to connect. For this you will need to edit +the ``pg_hba.conf`` file on your PostgreSQL cluster. + +For example:: + + # TYPE DATABASE USER ADDRESS METHOD + host replication pghoard 127.0.0.1/32 md5 + +..
note:: See `PostgreSQL documentation `_ for + more information + +After editing, please reload the configuration with either:: + + SELECT pg_reload_conf(); + +or by using your distribution service manager (e.g. ``systemctl reload +postgresql``) + +Now you can move on to :ref:`configuration` for how to set up PGHoard. diff --git a/docs/make.bat b/docs/make.bat new file mode 100644 index 00000000..2119f510 --- /dev/null +++ b/docs/make.bat @@ -0,0 +1,35 @@ +@ECHO OFF + +pushd %~dp0 + +REM Command file for Sphinx documentation + +if "%SPHINXBUILD%" == "" ( + set SPHINXBUILD=sphinx-build +) +set SOURCEDIR=. +set BUILDDIR=_build + +if "%1" == "" goto help + +%SPHINXBUILD% >NUL 2>NUL +if errorlevel 9009 ( + echo. + echo.The 'sphinx-build' command was not found. Make sure you have Sphinx + echo.installed, then set the SPHINXBUILD environment variable to point + echo.to the full path of the 'sphinx-build' executable. Alternatively you + echo.may add the Sphinx directory to PATH. + echo. + echo.If you don't have Sphinx installed, grab it from + echo.http://sphinx-doc.org/ + exit /b 1 +) + +%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O% +goto end + +:help +%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O% + +:end +popd diff --git a/docs/monitoring.rst b/docs/monitoring.rst new file mode 100644 index 00000000..94efe533 --- /dev/null +++ b/docs/monitoring.rst @@ -0,0 +1,47 @@ +Monitoring +========== + +Any backup tool must be properly monitored to ensure backups are correctly +performed. + +``pghoard`` provides several ways to monitor it. + + +.. note:: + In addition to monitoring, the restore process should be tested regularly + +Alert files +----------- + +Alert files are created whenever an error condition occurs that requires human +intervention to solve. You should add checks for the existence +of these files to your alerting system.
+ +:authentication_error: + There has been a problem in the authentication of at least one of the + PostgreSQL connections. This usually denotes a wrong username and/or + password. +:configuration_error: + There has been a problem in the authentication of at least one of the + PostgreSQL connections. This usually denotes a missing ``pg_hba.conf`` entry or + incompatible settings in postgresql.conf. +:upload_retries_warning: + Upload of a file has failed more times than + :upload_retries_warning_limit:. Needs human intervention to figure + out why and to delete the alert once the situation has been fixed. +:version_mismatch_error: + Your local PostgreSQL client versions of ``pg_basebackup`` or + ``pg_receivewal`` (formerly ``pg_receivexlog``) do not match the server's PostgreSQL version. You + need to update them to be on the same version level. + +:version_unsupported_error: + Server PostgreSQL version is not supported. + +Metrics +------- + +You can configure ``pghoard`` to send metrics to an external system. Supported +systems are described in :ref:`configuration_logging`. + +FIXME: describe the different metrics and what kind of alert to trigger based on +them. diff --git a/docs/quickstart.rst b/docs/quickstart.rst new file mode 100644 index 00000000..0784145f --- /dev/null +++ b/docs/quickstart.rst @@ -0,0 +1,179 @@ +QuickStart +========== + +This quickstart will help you set up PGHoard for a simple use case: + + * "Remote" basebackup using pg_basebackup + * WAL Archiving using the ``pg_receivewal`` tool + * Local archiving to ``/mnt/pghoard_backup/`` + +The local archiving is chosen because it's the easiest to demonstrate without +external dependencies. For the object storage of your choice (S3, Azure, +GCP...), refer to the appropriate section at :ref:`configuration_storage`. + +Installation +------------ + +Follow the instructions at :ref:`installation_package`. +Then, set up PostgreSQL following :ref:`installation_postgresql_configuration`.
+ +Configuration +------------- + +It is advised to use a replication slot to prevent WAL files from being recycled +when we haven't consumed them yet. + +You can use pg_receivewal to create your replication slot:: + + pg_receivewal --create-slot -S pghoard_slot -U pghoard + +Create a ``/var/lib/pghoard.json`` file containing the following information, replacing +the user and password you chose at the previous step:: + + { + "backup_location": "/mnt/pghoard_backup/state/", + "backup_sites": { + "my_test_cluster": { + "nodes": [ + { + "host": "127.0.0.1", + "password": "secret", + "user": "pghoard", + "slot": "pghoard_slot" + } + ], + "object_storage": { + "storage_type": "local", + "directory": "/mnt/pghoard_backup/" + }, + "pg_data_directory": "/var/lib/postgres/data/", + "pg_receivexlog_path": "/usr/bin/pg_receivewal", + "pg_basebackup_path": "/usr/bin/pg_basebackup", + "basebackup_interval_hours": 24, + "active_backup_mode": "basic" + + } + } + } + + +Testing your first backup +------------------------- + +Launching pghoard +~~~~~~~~~~~~~~~~~ + +Launch pghoard using:: + + pghoard pghoard.json + +If everything went well, you should see something like this in the logs of +pghoard:: + + 2021-07-30 15:56:48,678 PGBaseBackup Thread-23 INFO Started: ['/usr/bin/pg_basebackup', '--format', 'tar', '--label', 'pghoard_base_backup', '--verbose', '--pgdata', '/mnt/pghoard_backup/state/my_test_cluster/basebackup_incoming/2021-07-30_13-56_0', '--wal-method=none', '--progress', '--dbname', "dbname='replication' host='127.0.0.1' replication='true' user='pghoard'"], running as PID: 3652881, basebackup_location: '/mnt/pghoard_backup/state/my_test_cluster/basebackup_incoming/2021-07-30_13-56_0/base.tar' + 2021-07-30 15:56:48,805 PGBaseBackup Thread-23 INFO Ran: ['/usr/bin/pg_basebackup', '--format', 'tar', '--label', 'pghoard_base_backup', '--verbose', '--pgdata', '/mnt/pghoard_backup/state/my_test_cluster/basebackup_incoming/2021-07-30_13-56_0', '--wal-method=none', '--progress', 
'--dbname', "dbname='replication' host='127.0.0.1' replication='true' user='pghoard'"], took: 0.127s to run, returncode: 0 + 2021-07-30 15:56:48,922 Compressor Thread-3 INFO Compressed 33009152 byte open file '/mnt/pghoard_backup/state/my_test_cluster/basebackup_incoming/2021-07-30_13-56_0/base.tar' to 6797509 bytes (21%), took: 0.091s + 2021-07-30 15:56:48,925 TransferAgent Thread-12 INFO Uploading file to object store: src='/mnt/pghoard_backup/state/my_test_cluster/basebackup/2021-07-30_13-56_0' dst='my_test_cluster/basebackup/2021-07-30_13-56_0' + 2021-07-30 15:56:48,928 TransferAgent Thread-12 INFO Deleting file: '/mnt/pghoard_backup/state/my_test_cluster/basebackup/2021-07-30_13-56_0' since it has been uploaded + +What this means is that pghoard performed the following sequence of actions: + +- it launched pg_basebackup to perform the first basebackup of your cluster, + and stored it in a temporary location (``backup_location`` from the config file) +- then it "uploaded" it. Since we chose a local storage for backup, it is just + copied to the destination. +- finally it removes the temporary files + +This process would have been the same had you used a remote object storage like +``S3`` or ``Swift``. 
+ +You can check the contents of the final storage location:: + + ❯ tree /mnt/pghoard_backup/my_test_cluster + /mnt/pghoard_backup/my_test_cluster + └── basebackup + ├── 2021-07-30_13-56_0 + └── 2021-07-30_13-56_0.metadata + +Restoration +~~~~~~~~~~~ + + +You can list your database basebackups by running:: + + ❯ pghoard_restore list-basebackups --config pghoard.json -v + Available 'my_test_cluster' basebackups: + + Basebackup Backup size Orig size Start time + ---------------------------------------- ----------- ----------- -------------------- + my_test_cluster/basebackup/2021-07-30_13-56_0 6 MB 31 MB 2021-07-30T13:56:48Z + metadata: {'backup-decision-time': '2021-07-30T13:56:48.673846+00:00', 'backup-reason': 'scheduled', 'start-wal-segment': '000000010000000000000081', 'pg-version': '130003', 'compression-algorithm': 'snappy', 'compression-level': '0', 'original-file-size': '33009152', 'host': 'myhost'} + +If we wanted to restore to the latest point in time we could fetch the +required basebackup by running:: + + pghoard_restore get-basebackup --config pghoard.json \ + --target-dir --restore-to-master + + Basebackup complete. + You can start PostgreSQL by running pg_ctl -D foo start + On systemd based systems you can run systemctl start postgresql + On SYSV Init based systems you can run /etc/init.d/postgresql start + +Note that the ``target-dir`` needs to be either an empty directory or a +non-existent one, in which case PGHoard will automatically create it. + +After this we'd proceed to start both the PGHoard server process and the +PostgreSQL server normally by running (on systemd based systems, assuming +PostgreSQL 9.5 is used):: + + systemctl start pghoard + systemctl start postgresql-9.5 + +This will make PostgreSQL start the recovery process to the latest point +in time. PGHoard must be running before you start up the +PostgreSQL server. To see other possible restoration options please look at +:ref:`commands_restore`. + + +..
_quickstart_encryption: + +Optional: Adding encryption +--------------------------- + + +If you want to encrypt your backups, you need to generate a public / private RSA +key pair. + +The ``pghoard_create_keys`` script is used for that:: + + pghoard_create_keys --site my_test_site --key-id 1 + +It will output a config snippet of the form:: + + { + "backup_sites": { + "my_test_site": { + "encryption_key_id": "1", + "encryption_keys": { + "1": { + "private": "-----BEGIN PRIVATE KEY----------END PRIVATE KEY-----\n", + "public": "-----BEGIN PUBLIC KEY----------END PUBLIC KEY-----\n" + } + } + } + } + } + +If you want this server to perform both backup and restore, you will need to +copy both keys to your config file, under the ``backup_sites/my_test_site`` +section. + +If you only need to perform backups, you can store only the public key, in which +case the host running pghoard will not be able to decipher the encrypted +backups. + +.. danger:: + + Always keep a safe copy of your private key ! You WILL need it + to access your backups
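For a backup-only host, the private keys can be removed from a copy of the generated snippet before it is merged into the configuration file. The following is a minimal Python sketch under stated assumptions: the helper name ``strip_private_keys`` is hypothetical and not part of PGHoard, and the PEM strings in the example are placeholders. It relies only on the ``backup_sites`` / ``encryption_keys`` layout shown in the snippet above.

```python
import copy
import json


def strip_private_keys(config):
    """Return a deep copy of a PGHoard-style config without private keys.

    Hypothetical helper for illustration: it assumes the
    backup_sites -> <site> -> encryption_keys -> <key id> layout shown above.
    """
    public_only = copy.deepcopy(config)
    for site in public_only.get("backup_sites", {}).values():
        for key_pair in site.get("encryption_keys", {}).values():
            # Drop the private half; the public key is enough to encrypt.
            key_pair.pop("private", None)
    return public_only


if __name__ == "__main__":
    # Placeholder values standing in for the output of pghoard_create_keys.
    generated = {
        "backup_sites": {
            "my_test_site": {
                "encryption_key_id": "1",
                "encryption_keys": {
                    "1": {
                        "private": "<private key PEM>",
                        "public": "<public key PEM>",
                    }
                },
            }
        }
    }
    print(json.dumps(strip_private_keys(generated), indent=4))
```

The helper works on a copy, so the original dictionary, including the private key, stays intact and can be stored somewhere safe for later restores.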