Skip to content

Latest commit

 

History

History
84 lines (56 loc) · 6.68 KB

whats-new-in-pgroll-0-7-0.mdx

File metadata and controls

84 lines (56 loc) · 6.68 KB
title description image author authorEmail date tags published slug ogImage
pgroll 0.7.0 update
Learn about the latest changes in pgroll in the 0.7.0 release as we continue developing it into a leading open-source schema migration tool for Postgres.
Andrew Farries
10-23-2024
open-source
postgres
schema
migrations
pgroll
fpSchemaMigrations
oss
true
pgroll-0-7-0-update

pgroll is Xata's open-source schema migration tool for Postgres, built to enable zero downtime, reversible schema migrations using the expand/contract pattern. pgroll is open-source under the Apache-2.0 license and developed in the open on GitHub.

Since the last pgroll 0.6.0 release in May 2024, we've introduced key updates aimed at solving common database migration challenges. Support for Postgres 17 ensures compatibility with the latest version, while improvements to backfill performance—now up to 80% faster—means less time waiting for migrations to start. Configurable batch sizes and delays offer developers more control over the migration process, optimizing performance and minimizing impact on their Postgres clusters. These enhancements make migrations faster, safer, and more efficient, especially for complex or large datasets.

In this post we'll have a look at the changes that went into the most recent release, v0.7.0.

What's new in pgroll v0.7.0

The previous version of pgroll, v0.6.0, was released in May 2024. It's seen steady development since May with 31 pull requests merged in that time. Happily, this release also includes work from six first time contributors! We're building pgroll in the open and we're always happy to see people getting involved on the issue tracker and through pull requests.

The full changelog for v0.7.0 can be found here:

The full changelog since May

In this post we'll focus on some of the highlights from the changelog.

Postgres 17 support

Postgres 17 was released in September 2024. pgroll now supports Postgres 17, alongside our existing support for Postgres versions 14, 15, and 16.

Backfill performance improvements

pgroll v0.7.0 performs much better than previous versions when backfilling large tables. Backfills are central to the expand/contract pattern that pgroll uses to perform schema migrations. When a column is modified, for example to add a new constraint, pgroll creates a new column with the new constraint and backfills data from the old column to the new one on migration start.

Backfills are, unavoidably, long-running operations for tables with many records but pgroll v0.7.0 includes performance improvements that can reduce the duration of a backfill by as much as 80%.

The performance improvement was achieved by reducing the work done by the 'up' trigger that constitutes the 'inner loop' during a backfill. #389 has all the details if you're curious.

Configurable backfill batch size

Another important backfill-related change that went into v0.7.0 is the ability to configure the batch size for backfills. In previous versions of pgroll backfills were batched, processing each batch of 1000 records in a separate transaction in order to minimize the duration for which locks were held on the rows in the table.

In pgroll v0.7.0 the same batched behaviour still applies but the batch size can be tuned through the --backfill-batch-size command line flag or the PGROLL_BACKFILL_BATCH_SIZE environment variable. For tables where row locks during migration start are less of a concern, for example append-only tables with few row UPDATEs, increasing the batch size can improve performance of the backfill by requiring fewer transactions and fewer round trips to the database.

Configurable backfill delays

Again with the backfills, pgroll v0.7.0 adds the ability to configure a delay between backfill batches. Previous versions of pgroll would always attempt to backfill batches as fast as possible, with no delay between each batch. pgroll v0.7.0 adds the ability to introduce a delay between batches by using the --backfill-batch-delay command line flag or the PGROLL_BACKFILL_BATCH_DELAY environment variable.

Controlling the rate of the backfill can be useful in scenarios where the database is under heavy load and you want to reduce the impact of the backfill so as not to disrupt other database clients. Downstream replicas can also become overloaded by a backfill and slowing the rate at which backfills are performed prevents replicas falling too far behind the primary.

Partial index support

As of v0.7.0, pgroll supports partial indexes as part of the create_index operation by specifying a predicate field:

{
  "name": "37_create_partial_index",
  "operations": [
    {
      "create_index": {
        "name": "idx_fruits_id_gt_10",
        "table": "fruits",
        "columns": ["id"],
        "predicate": "id > 10"
      }
    }
  ]
}

Partial indexes allow specifying a predicate as part of the operation to restrict the set of records included in the index. This is helpful to reduce the size of an index, given that the query planner will likely not use the index to find common values anyway.

Conclusion

pgroll is in active development as we work towards a v1.0 release of the project. The features added in v0.7.0 represent another step in this direction, improving performance, fixing bugs and adding new features.

October is Hacktoberfest and we've tagged some issues with the Hacktoberfest label. We've had a couple of contributions from new contributors via Hacktoberfest already this month and we're always open to more.

We'd love to hear from you if you're using pgroll in your projects. We welcome external contributions to the project, either in the form of issues on the issue tracker or by getting involved with development directly by opening pull requests.