docs/dev: add destructive migration docs (#16831)
woodruffw authored Oct 3, 2024
Commit da7b53d (parent 50a58f3)
1 changed file, ``docs/dev/development/database-migrations.rst``, with 54 additions and 5 deletions.

[...] to the old version of Warehouse from being shut down. This means that each
migration *must* be compatible with the current ``main`` branch of Warehouse.

This makes it more difficult to make breaking changes, since you must phase
them in over time. See :ref:`destructive-migrations` for tips on doing
migrations that involve column deletions or renames.

To help protect against an accidentally long running migration from taking down
PyPI, by default a migration will timeout if it is waiting more than 4s to
[...]
To your migration.


For more information on which operations are safe in a high-availability
environment like PyPI, see:

- `PostgreSQL at Scale: Database Schema Changes Without Downtime <https://medium.com/paypal-tech/postgresql-at-scale-database-schema-changes-without-downtime-20d3749ed680>`_
- `Move fast and migrate things: how we automated migrations in Postgres <https://benchling.engineering/move-fast-and-migrate-things-how-we-automated-migrations-in-postgres-d60aba0fc3d4>`_
- `PgHaMigrations <https://github.com/braintree/pg_ha_migrations>`_

.. _destructive-migrations:

Destructive migrations
----------------------

.. warning::

   Read this section and its respective sub-sections **completely** before
   attempting to follow them! Failure to do so can result in serious
   deployment errors and outages.

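Migrations that rename or delete columns need special care because of how
Warehouse is deployed: the old version of Warehouse keeps running while a
new migration is applied, so it must never reference a column that the
migration removes. Performing such a migration without following the steps
below will cause errors during deployment and may require a full revert.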

.. _removing-a-column:

Removing a column
=================

To remove a column:

1. Perform the Python-level code changes, i.e. remove usages of the
   column/attribute within Warehouse itself. Do **not** generate
   an accompanying migration.
2. Submit the changes as a PR. Tag the PR with ``skip-db-check`` to allow
   it to pass CI without accompanying migrations.
3. Prepare a second PR containing only the generated migration that drops
   the column.
4. Merge the first PR and wait until it has been deployed before merging
   the second.

This ensures that the "old" version of Warehouse, which is still running
when the new migration is applied, has no references to the column being
deleted.
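The failure that this ordering avoids can be sketched with Python's built-in ``sqlite3`` module standing in for PostgreSQL. The ``projects`` table and ``legacy_flag`` column are hypothetical, and the column drop is emulated by rebuilding the table rather than by an Alembic migration. A query written by the old code still names the dropped column and fails once the column is gone, while the updated query keeps working:

```python
import sqlite3

# Hypothetical table/column names; SQLite stands in for PostgreSQL.
OLD_QUERY = "SELECT id, name, legacy_flag FROM projects"  # code before step 1
NEW_QUERY = "SELECT id, name FROM projects"  # code after step 1


def run(conn: sqlite3.Connection, query: str) -> str:
    """Return "ok" if the query succeeds, or the database error otherwise."""
    try:
        conn.execute(query).fetchall()
        return "ok"
    except sqlite3.OperationalError as exc:
        return f"error: {exc}"


def drop_legacy_flag(conn: sqlite3.Connection) -> None:
    """Emulate the second PR's column drop by rebuilding the table.

    Rebuilding avoids relying on ALTER TABLE ... DROP COLUMN, which older
    SQLite versions lack; PostgreSQL supports that statement directly.
    """
    conn.executescript(
        """
        CREATE TABLE projects_new AS SELECT id, name FROM projects;
        DROP TABLE projects;
        ALTER TABLE projects_new RENAME TO projects;
        """
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT, legacy_flag INTEGER)"
)
conn.execute("INSERT INTO projects (name, legacy_flag) VALUES ('example', 0)")

# Both versions of the code work before the column is dropped...
before = (run(conn, OLD_QUERY), run(conn, NEW_QUERY))
drop_legacy_flag(conn)
# ...but only the updated code works afterwards.
after = (run(conn, OLD_QUERY), run(conn, NEW_QUERY))
```

A stale reference need not be hand-written SQL: SQLAlchemy normally includes every mapped column (unless it is configured as deferred) in the ``SELECT`` statements it emits for a model. This is one reason step 1 removes the attribute from the model and not only the code that reads it.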

Renaming a column
=================

Renaming a column is more complex than deleting a column, since it involves
a data migration. To rename a column:

1. Create an initial migration that adds the new column, and add code that
   writes to the new column while reading from both it and the old column.
2. Deploy the initial migration.
3. Prepare a second migration that backfills the new column with data
   from the old column.
4. Deploy the second migration.
5. Follow the :ref:`removing-a-column` steps *in their entirety* to remove
   the old column.

In total, this requires three separate migrations: one to add the new column,
one to backfill it, and a third to remove the old column.
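The first four steps can be sketched with Python's built-in ``sqlite3`` module standing in for PostgreSQL. The column names ``summary`` (old) and ``description`` (new) are hypothetical, and in Warehouse the schema changes would be Alembic migrations rather than raw SQL. The code shipped with the first migration writes to the new column and reads from both via ``COALESCE``. The second migration backfills only the rows that the new column does not cover yet:

```python
import sqlite3
from typing import Optional

# Hypothetical names: "summary" is the old column, "description" the new one.
# SQLite stands in for PostgreSQL.


def write_description(conn: sqlite3.Connection, project_id: int, text: str) -> None:
    # Code shipped with the first migration: writes go to the new column.
    conn.execute(
        "UPDATE projects SET description = ? WHERE id = ?", (text, project_id)
    )


def read_description(conn: sqlite3.Connection, project_id: int) -> Optional[str]:
    # Read from both columns, preferring the new one, so rows that have not
    # been written or backfilled yet still return their old value.
    row = conn.execute(
        "SELECT COALESCE(description, summary) FROM projects WHERE id = ?",
        (project_id,),
    ).fetchone()
    return row[0] if row else None


def backfill(conn: sqlite3.Connection) -> int:
    # Second migration: copy old values into rows whose new column is unset.
    cur = conn.execute(
        "UPDATE projects SET description = summary WHERE description IS NULL"
    )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY, summary TEXT)")
conn.executemany(
    "INSERT INTO projects (id, summary) VALUES (?, ?)",
    [(1, "first"), (2, "second")],
)

# First migration: add the new, nullable column, then deploy the new code.
conn.execute("ALTER TABLE projects ADD COLUMN description TEXT")
write_description(conn, 2, "second, rewritten")
before_backfill = [read_description(conn, 1), read_description(conn, 2)]

# Second migration: backfill the rows that the new code has not written.
backfilled = backfill(conn)
```

Restricting the backfill to ``description IS NULL`` keeps it from overwriting values that the new code has already written to the new column.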
