Skip to content

Latest commit

 

History

History
72 lines (50 loc) · 5.69 KB

pgdump-is-not-a-backup-tool.mdx

File metadata and controls

72 lines (50 loc) · 5.69 KB
title description image author authorEmail date tags published slug ogImage
If pg_dump is not a backup tool, what is?
pg_dump was a utility for backing up a PostgreSQL database, not anymore
src alt
pg_dump was a utility for backing up a PostgreSQL database, not anymore
Gulcin Yildirim Jelinek
08-23-2024
postgres
pg_dump
backup
pgdump
true
pgdump-is-not-a-backup-tool

If pg_dump is not a backup tool, what is? 🦉

We all remember the sad day when Pluto was decommissioned as a planet. I’ve always liked Pluto—maybe it’s the name, or maybe it’s because it was the smallest, and I like supporting the underdog. Either way, it was sort of my favorite planet.

Recently, while writing about the vulnerability affecting pg_dump, the topic of decommissioning pg_dump came up on Twitter. Unlike the nostalgic feelings many had for Pluto, there was less reluctance to see pg_dump reclassified. In fact, some people were eager to retire it as a backup utility, and I even got a bit of pushback for still referring to pg_dump as one 🙂

I was talking to my colleague Simona the other day, and she mentioned that everybody in Postgres circles says, "pg_dump is not a backup tool," but perhaps it’s not always explained well why it is not.

If you visit the pg_dump documentation page, it literally says this:

pg_dump is a utility for backing up a PostgreSQL database. It makes consistent backups even if the database is being used concurrently.

But there is an upcoming change in its future, starting from Postgres 18, thanks to a commit from Peter Eisentraut.

Commit: 4f29394ea941f688fd4faf7260d2c198931ca797

doc: Avoid too prominent use of "backup" on pg_dump man page

Some users inadvertently rely on pg_dump as their primary backup tool, when better solutions exist. The pg_dump man page is arguably misleading in that it starts with:

"pg_dump is a utility for backing up a PostgreSQL database."

This tones this down a little bit, by replacing most uses of "backup" with "export" and adding a short note that pg_dump is not a general-purpose backup tool.

Discussion: https://www.postgresql.org/message-id/flat/70b48475-7706-4268-990d-fd522b038d96%40eisentraut.org

I am pleased that we are using better wording to explain the scope of pg_dump in the Postgres documentation.

What pg_dump does is take a snapshot of the database at that moment. However, it excludes certain crucial components, like WAL files and the global objects in the entire Postgres cluster.

In a moment of database disruption, you will need to restore your backups. And if you only have a pg_dump export and did not also back up your WAL directory, you would lose some of the transaction history. If you are doing replication, that would complicate things and could result in longer downtime or a more extended maintenance window if it’s a planned operation.

Backups ideally should be continuous. You shouldn’t have to worry about managing WALs and other objects back and forth between servers. You should have regular backups, continuously adding to them, and ideally restore them regularly to ensure everything is going well. It’s also important to monitor their health.

I won’t go into details about different backup tools here, but suffice it to say, pg_dump is not sufficient alone as a backup mechanism. It is a useful utility and helps a lot, but it’s not something you should consider a complete solution for backing up your Postgres clusters.

Conclusion

To recap, let’s discuss a few shortcomings of pg_dump and clarify why a dump and a backup are different things.

pg_dump creates a snapshot of the database at a specific point in time. This means it captures the state of the database up to that moment but does not account for any changes made afterward. The standard pg_dump output is a plain-text file containing SQL statements to recreate the database schema and its contents. While it can be in various formats, SQL is the most common. This output is what we refer to as a dump.

However, this dump file does not include WAL, which are essential for point-in-time recovery and replication. Additionally, global objects such as roles and tablespaces are excluded from the dump, which can lead to inconsistencies during restoration, especially in multi-database setups.

In contrast, a backup is a broader concept that involves creating copies of both data and metadata to ensure that a database can be restored in case of failure, corruption, or data loss. Backups can be:

  • Full backup: A complete copy of the entire database, including all data and metadata.
  • Incremental backup: Captures only the changes made since the last backup.
  • Continuous backup: Continuously captures changes, providing ongoing protection.

Backups generally include WAL files, allowing for point-in-time recovery. They should be performed regularly, monitored for health, and restored periodically to ensure their integrity.

While pg_dump serves well for exporting data for migrations or versioning, it is not designed to handle full disaster recovery scenarios. For a robust and reliable backup strategy, tools like Barman, WAL-G, and pgBackRest are better suited to ensure data safety and quick recovery.