From 97c1827ef9c7a61e7a9be796e4b642de5a937ec4 Mon Sep 17 00:00:00 2001
From: Andrew Farries
Date: Wed, 27 Sep 2023 09:50:06 +0100
Subject: [PATCH] Add tutorial to `docs/README.md`

---
 docs/README.md | 335 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 334 insertions(+), 1 deletion(-)

diff --git a/docs/README.md b/docs/README.md
index eb8b4049e..fc3f6ad4f 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -12,7 +12,340 @@
 
 ## Installation
 
-## Getting started
+## Tutorial
+
+This section will walk you through applying your first migrations using `pgroll`.
+
+Prerequisites:
+
+* `pgroll` installed and accessible somewhere on your `$PATH`
+* A fresh Postgres instance against which to run migrations
+
+A good way to get a throw-away Postgres instance for use in the tutorial is to use [Docker](https://www.docker.com/). Start a Postgres instance in Docker with:
+
+```
+docker run --rm --name for-pgroll -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:16
+```
+
+The remainder of the tutorial assumes that you have a local Postgres instance accessible on port 5432.
+
+### Initialization
+
+`pgroll` needs to store its own internal state somewhere in the target Postgres database. Initializing `pgroll` configures this store and makes `pgroll` ready for first use:
+
+```
+pgroll init
+```
+
+You should see a success message indicating that `pgroll` has been configured.
+
+<details>
+  <summary>What data does pgroll store?</summary>
+
+  `pgroll` stores its data in the `pgroll` schema. In this schema it creates:
+  * A `migrations` table containing the version history for each schema in the database
+  * Functions to capture the current database schema for a given schema name
+  * Triggers to capture DDL statements run outside of `pgroll` migrations
+</details>
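+
+If you'd like to see what `pgroll init` created, you can list the tables in the `pgroll` schema with a standard `information_schema` query. This is a sketch that relies only on the schema being named `pgroll`, as described above; the exact set of tables may differ between `pgroll` versions:
+
+```sql
+-- List the tables that pgroll created in its own schema
+SELECT table_name
+FROM information_schema.tables
+WHERE table_schema = 'pgroll'
+ORDER BY table_name;
+```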
+
+### First migration
+
+With `pgroll` initialized, let's run our first migration. Here is a migration to create a table:
+
+```json
+{
+  "name": "01_create_users_table",
+  "operations": [
+    {
+      "create_table": {
+        "name": "users",
+        "columns": [
+          {
+            "name": "id",
+            "type": "serial",
+            "pk": true
+          },
+          {
+            "name": "name",
+            "type": "varchar(255)",
+            "unique": true
+          },
+          {
+            "name": "description",
+            "type": "text",
+            "nullable": true
+          }
+        ]
+      }
+    }
+  ]
+}
+```
+
+Save this migration as `sql/01_create_users_table.json`.
+
+The migration will create a `users` table with three columns. It is equivalent to the following SQL DDL statement:
+
+```sql
+CREATE TABLE users(
+  id SERIAL PRIMARY KEY,
+  name VARCHAR(255) UNIQUE NOT NULL,
+  description TEXT
+);
+```
+
+To apply the migration to the database, run:
+
+```
+pgroll start sql/01_create_users_table.json --complete
+```
+
+<details>
+  <summary>What does the --complete flag do here?</summary>
+
+  `pgroll` divides migration application into two steps: **start** and **complete**. During the **start** phase, both old and new versions of the database schema are available to client applications. After the **complete** phase, only the most recent schema is available.
+
+  As this is the first migration there is no old schema to maintain, so the migration can safely be started and completed in one step.
+
+  For more details about `pgroll`'s two-step migration process, see the [Multiple schema versions](#multiple-schema-versions) section.
+</details>
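+
+To confirm that the migration was applied, you can inspect the new table's columns through `information_schema`. The query below is a sketch; it assumes the table was created in the default `public` schema:
+
+```sql
+-- Show each column of public.users with its type and nullability
+SELECT column_name, data_type, is_nullable
+FROM information_schema.columns
+WHERE table_schema = 'public' AND table_name = 'users'
+ORDER BY ordinal_position;
+```
+
+Going by the DDL above, `is_nullable` should be `NO` for `id` and `name` and `YES` for `description`.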
+
+Now let's add some users to our new table:
+
+```sql
+INSERT INTO users (name, description)
+  SELECT
+    'user_' || suffix,
+    CASE
+      WHEN random() < 0.5 THEN 'description for user_' || suffix
+      ELSE NULL
+    END
+  FROM generate_series(1, 100000) AS suffix;
+```
+
+Execute this SQL to insert 100,000 users into the `users` table. Roughly half of the users will have descriptions and the other half will have `NULL` descriptions.
+
+### Second migration
+
+Now that we have our `users` table, let's make a backwards-incompatible change to the schema and see how `pgroll` helps us by maintaining the old and new schema versions side by side.
+
+Some of the users in our `users` table have descriptions and others don't. This is because our initial migration set the `description` column as `nullable: true`, allowing some users to have `NULL` values in the `description` column.
+
+We'd like to change the `users` table to disallow `NULL` values in the `description` column. We also want a `description` to be set explicitly for all new users, so we don't want to specify a default value for the column.
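+
+For comparison, the direct way to express this change in plain SQL is a single `ALTER TABLE` statement. It is shown only to illustrate the problem and is not part of the `pgroll` workflow:
+
+```sql
+-- Plain-SQL equivalent of the change we want. It fails while any existing
+-- row still has a NULL description, and once it succeeds, applications that
+-- still insert NULL descriptions start getting errors.
+ALTER TABLE users ALTER COLUMN description SET NOT NULL;
+```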
+
+There are two things that make this migration difficult:
+
+* We have existing `NULL` values in our `description` column that need to be updated to something non-`NULL`
+* Existing applications using the table are still running and may be inserting more `NULL` descriptions
+
+Here is the `pgroll` migration that makes the `description` column `NOT NULL`:
+
+```json
+{
+  "name": "02_user_description_not_null",
+  "operations": [
+    {
+      "alter_column": {
+        "table": "users",
+        "column": "description",
+        "not_null": true,
+        "up": "(SELECT CASE WHEN description IS NULL THEN 'description for ' || name ELSE description END)",
+        "down": "description"
+      }
+    }
+  ]
+}
+```
+
+The `up` SQL computes the value of the new, `NOT NULL` version of the column from the old one, and the `down` SQL computes the old column's value from the new one.
+
+Save this migration as `sql/02_user_description_not_null.json` and start the migration:
+
+```
+pgroll start sql/02_user_description_not_null.json
+```
+
+After some progress updates you should see a message saying the migration has been started successfully.
+
+At this point it's useful to look at the table data and schema to see what `pgroll` has done. Let's look at the data first:
+
+```sql
+SELECT * FROM users ORDER BY id LIMIT 10
+```
+
+You should see something like this:
+
+```
++-----+----------+-------------------------+--------------------------+
+| id  | name     | description             | _pgroll_new_description  |
++-----+----------+-------------------------+--------------------------+
+| 1   | user_1   |                         | description for user_1   |
+| 2   | user_2   | description for user_2  | description for user_2   |
+| 3   | user_3   |                         | description for user_3   |
+| 4   | user_4   | description for user_4  | description for user_4   |
+| 5   | user_5   |                         | description for user_5   |
+| 6   | user_6   | description for user_6  | description for user_6   |
+| 7   | user_7   |                         | description for user_7   |
+| 8   | user_8   |                         | description for user_8   |
+| 9   | user_9   | description for user_9  | description for user_9   |
+| 10  | user_10  | description for user_10 | description for user_10  |
++-----+----------+-------------------------+--------------------------+
+```
+
+`pgroll` has added a `_pgroll_new_description` column to the table and populated it for all rows using the `up` SQL from the `02_user_description_not_null.json` file:
+
+```
+"up": "(SELECT CASE WHEN description IS NULL THEN 'description for ' || name ELSE description END)",
+```
+
+This has copied all `description` values into the `_pgroll_new_description` column, rewriting any `NULL` values using the provided SQL.
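+
+The `up` expression above is equivalent to `COALESCE(description, 'description for ' || name)`. Using that equivalence, you can check the backfill with a query like this sketch, which counts the rows where the new column disagrees with it:
+
+```sql
+-- Count rows whose backfilled value differs from what the up SQL would produce
+SELECT count(*)
+FROM users
+WHERE _pgroll_new_description IS DISTINCT FROM COALESCE(description, 'description for ' || name);
+```
+
+Once the backfill has finished, the count should be 0.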
+
+Now let's look at the table schema:
+
+```
+DESCRIBE users
+```
+
+You should see something like this:
+
+```
++-------------------------+------------------------+----------------------------------------------------------------+
+| Column                  | Type                   | Modifiers                                                      |
++-------------------------+------------------------+----------------------------------------------------------------+
+| id                      | integer                | not null default nextval('_pgroll_new_users_id_seq'::regclass) |
+| name                    | character varying(255) | not null                                                       |
+| description             | text                   |                                                                |
+| _pgroll_new_description | text                   |                                                                |
++-------------------------+------------------------+----------------------------------------------------------------+
+Indexes:
+    "_pgroll_new_users_pkey" PRIMARY KEY, btree (id)
+    "_pgroll_new_users_name_key" UNIQUE CONSTRAINT, btree (name)
+Check constraints:
+    "_pgroll_add_column_check_description" CHECK (_pgroll_new_description IS NOT NULL) NOT VALID
+Triggers:
+    _pgroll_trigger_users__pgroll_new_description BEFORE INSERT OR UPDATE ON users FOR EACH ROW EXECUTE FUNCTION _pgroll_trigger_users__pgroll_new_description>
+    _pgroll_trigger_users_description BEFORE INSERT OR UPDATE ON users FOR EACH ROW EXECUTE FUNCTION _pgroll_trigger_users_description()
+```
+
+The `_pgroll_new_description` column has a `CHECK (_pgroll_new_description IS NOT NULL)` constraint, but the old `description` column is still nullable. The constraint is marked `NOT VALID`, which means Postgres enforces it for new writes without validating the rows that already exist.
+
+<details>
+  <summary>What do the two triggers do?</summary>
+
+  The triggers keep the `description` and `_pgroll_new_description` columns in sync while the migration is in progress. When a row is written through the old version of the schema, the `up` SQL is used to compute `_pgroll_new_description`; when a row is written through the new version, the `down` SQL is used to compute `description`. Rows written by old and new applications are therefore visible through both versions of the schema.
+</details>
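+
+You can watch the triggers at work by writing through the old schema version and reading through the new one, switching between the two version schemas with `search_path` (the schemas are listed in the next step). This is a sketch; `alice` is just an example user name:
+
+```sql
+-- Write as an "old" application: description may still be NULL
+SET search_path TO public_01_create_users_table;
+INSERT INTO users (name, description) VALUES ('alice', NULL);
+
+-- Read as a "new" application: description should have been filled in by the up SQL
+SET search_path TO public_02_user_description_not_null;
+SELECT name, description FROM users WHERE name = 'alice';
+
+-- Return to the default search path for the rest of the tutorial
+RESET search_path;
+```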
+
+Let's look at the schemas in the database:
+
+```
+\dn
+```
+
+You should see something like this:
+
+```
++-------------------------------------+-------------------+
+| Name                                | Owner             |
++-------------------------------------+-------------------+
+| pgroll                              | postgres          |
+| public                              | pg_database_owner |
+| public_01_create_users_table        | postgres          |
+| public_02_user_description_not_null | postgres          |
++-------------------------------------+-------------------+
+```
+
+Alongside `pgroll` and `public`, we have two version schemas: `public_01_create_users_table`, corresponding to the old schema version, and `public_02_user_description_not_null`, corresponding to the migration we just started. Each version schema contains one view on the `users` table. Let's look at the view in the first schema:
+
+```
+\d+ public_01_create_users_table.users
+```
+
+The output should contain something like this:
+
+```
+View definition:
+ SELECT users.id,
+    users.name,
+    users.description
+   FROM users;
+```
+
+and for the second view:
+
+```
+\d+ public_02_user_description_not_null.users
+```
+
+The output should contain something like this:
+
+```
+View definition:
+ SELECT users.id,
+    users.name,
+    users._pgroll_new_description AS description
+   FROM users;
+```
+
+The second view exposes the same three columns as the first, but its `description` column is backed by the `_pgroll_new_description` column of the underlying table. This lets applications choose which version of the schema they see: either the old version, without the `NOT NULL` constraint on `description`, or the new version, which imposes it.
+
+These two version schemas are how `pgroll` presents old and new versions of a database schema to client applications side by side.
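+
+A client picks a version by putting the corresponding version schema on its `search_path`. As a sketch (`bob` is just an example name), a session using the new version should be unable to insert a `NULL` description, because the value is stored in `_pgroll_new_description` and rejected by its `NOT NULL` check constraint:
+
+```sql
+-- Use the new schema version for this session
+SET search_path TO public_02_user_description_not_null;
+
+-- Expected to fail with a check constraint violation
+INSERT INTO users (name, description) VALUES ('bob', NULL);
+
+-- Return to the default search path
+RESET search_path;
+```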
+
+### Completing the migration
+
+Once the old version of the database schema is no longer required (for example, because the applications that depend on it are no longer in production), the current migration can be completed:
+
+```
+pgroll complete
+```
+
+After the migration has completed, the old version of the schema is no longer present in the database:
+
+```
+\dn
+```
+
+shows something like:
+
+```
++-------------------------------------+-------------------+
+| Name                                | Owner             |
++-------------------------------------+-------------------+
+| pgroll                              | postgres          |
+| public                              | pg_database_owner |
+| public_02_user_description_not_null | postgres          |
++-------------------------------------+-------------------+
+```
+
+Only the new version schema `public_02_user_description_not_null` remains in the database.
+
+Let's look at the schema of the `users` table to see what's changed there:
+
+```
+DESCRIBE users
+```
+
+shows something like:
+
+```
++-------------+------------------------+----------------------------------------------------------------+----------+--------------+-------------+
+| Column      | Type                   | Modifiers                                                      | Storage  | Stats target | Description |
++-------------+------------------------+----------------------------------------------------------------+----------+--------------+-------------+
+| id          | integer                | not null default nextval('_pgroll_new_users_id_seq'::regclass) | plain    |              |             |
+| name        | character varying(255) | not null                                                       | extended |              |             |
+| description | text                   | not null                                                       | extended |              |             |
++-------------+------------------------+----------------------------------------------------------------+----------+--------------+-------------+
+Indexes:
+    "_pgroll_new_users_pkey" PRIMARY KEY, btree (id)
+    "_pgroll_new_users_name_key" UNIQUE CONSTRAINT, btree (name)
+```
+
+The `_pgroll_new_description` column has been renamed to `description` and the old `description` column has been removed. The column is now marked as `NOT NULL`.
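+
+To double-check the final state, queries along these lines (a sketch, again assuming the default `public` schema) should report `description` as not nullable and list no version schema other than `public_02_user_description_not_null`:
+
+```sql
+-- Nullability of the description column after completion
+SELECT column_name, is_nullable
+FROM information_schema.columns
+WHERE table_schema = 'public' AND table_name = 'users' AND column_name = 'description';
+
+-- Remaining version schemas (the backslash escapes the LIKE wildcard "_")
+SELECT schema_name
+FROM information_schema.schemata
+WHERE schema_name LIKE 'public\_%'
+ORDER BY schema_name;
+```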
+
+`pgroll` has allowed us to safely roll out this change to the `description` column.
+
+### Summary
+
+We've seen:
+
+* how to apply a couple of `pgroll` migrations to a database.
+* how `pgroll` separates migrations into `start` and `complete` phases.
+* how data is backfilled to meet constraints at the beginning of the `start` phase.
+* that during the `start` phase, `pgroll` uses multiple schemas to present different versions of an underlying table to client applications.
+* that completing a migration removes the old schema version and cleans up the underlying table, putting it in its final state.
 
 ## Command line reference
 