title | description | image | author | authorEmail | date | tags | published | slug | ogImage | ||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
If pg_dump is not a backup tool, what is? |
pg_dump was a utility for backing up a PostgreSQL database, not anymore |
|
Gulcin Yildirim Jelinek |
08-23-2024 |
|
true |
pgdump-is-not-a-backup-tool |
We all remember the sad day when Pluto was decommissioned as a planet. I’ve always liked Pluto—maybe it’s the name, or maybe it’s because it was the smallest, and I like supporting the underdog. Either way, it was sort of my favorite planet.
Recently, while writing about the vulnerability affecting pg_dump
, the topic of decommissioning pg_dump
came up on Twitter. Unlike the nostalgic feelings many had for Pluto, there was less reluctance to see pg_dump
reclassified. In fact, some people were eager to retire it as a backup utility, and I even got a bit of pushback for still referring to pg_dump
as one 🙂
I was talking to my colleague Simona the other day, and she mentioned that everybody in Postgres circles says, "pg_dump is not a backup tool," but perhaps it’s not always explained well why it is not.
If you visit the pg_dump
documentation page, it literally says this:
pg_dump
is a utility for backing up a PostgreSQL database. It makes consistent backups even if the database is being used concurrently.
But there is an upcoming change in its future, starting from Postgres 18, thanks to a commit from Peter Eisentraut.
Commit: 4f29394ea941f688fd4faf7260d2c198931ca797
doc: Avoid too prominent use of "backup" on pg_dump man page
Some users inadvertently rely on
pg_dump
as their primary backup tool, when better solutions exist. Thepg_dump
man page is arguably misleading in that it starts with:"pg_dump is a utility for backing up a PostgreSQL database."
This tones this down a little bit, by replacing most uses of "backup" with "export" and adding a short note that
pg_dump
is not a general-purpose backup tool.Discussion: https://www.postgresql.org/message-id/flat/70b48475-7706-4268-990d-fd522b038d96%40eisentraut.org
I am pleased that we are using better wording to explain the scope of pg_dump
in the Postgres documentation.
What pg_dump
does is take a snapshot of the database at that moment. However, it excludes certain crucial components, like WAL files and the global objects in the entire Postgres cluster.
In a moment of database disruption, you will need to restore your backups. And if you only have a pg_dump
export and did not also back up your WAL directory, you would lose some of the transaction history. If you are doing replication, that would complicate things and could result in longer downtime or a more extended maintenance window if it’s a planned operation.
Backups ideally should be continuous. You shouldn’t have to worry about managing WALs and other objects back and forth between servers. You should have regular backups, continuously adding to them, and ideally restore them regularly to ensure everything is going well. It’s also important to monitor their health.
I won’t go into details about different backup tools here, but suffice it to say, pg_dump
is not sufficient alone as a backup mechanism. It is a useful utility and helps a lot, but it’s not something you should consider a complete solution for backing up your Postgres clusters.
To recap, let’s discuss a few shortcomings of pg_dump
and clarify why a dump and a backup are different things.
pg_dump
creates a snapshot of the database at a specific point in time. This means it captures the state of the database up to that moment but does not account for any changes made afterward. The standard pg_dump
output is a plain-text file containing SQL statements to recreate the database schema and its contents. While it can be in various formats, SQL is the most common. This output is what we refer to as a dump.
However, this dump file does not include WAL, which are essential for point-in-time recovery and replication. Additionally, global objects such as roles and tablespaces are excluded from the dump, which can lead to inconsistencies during restoration, especially in multi-database setups.
In contrast, a backup is a broader concept that involves creating copies of both data and metadata to ensure that a database can be restored in case of failure, corruption, or data loss. Backups can be:
- Full backup: A complete copy of the entire database, including all data and metadata.
- Incremental backup: Captures only the changes made since the last backup.
- Continuous backup: Continuously captures changes, providing ongoing protection.
Backups generally include WAL files, allowing for point-in-time recovery. They should be performed regularly, monitored for health, and restored periodically to ensure their integrity.
While pg_dump
serves well for exporting data for migrations or versioning, it is not designed to handle full disaster recovery scenarios. For a robust and reliable backup strategy, tools like Barman, WAL-G, and pgBackRest are better suited to ensure data safety and quick recovery.