Let bootstrap work on tables that aren't partitioned yet. #13

jcjones · 2021-07-27T21:01:54Z

Add a --assume-partitioned-on flag to bootstrap, to facilitate operation on
tables that do not yet have a partition map. One needs to supply
--assume-partitioned-on COLUMN_NAME as many times as needed to identify all
the columns which will be part of the partition expression.
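
A repeatable flag like this is commonly built with argparse's `append` action. The following is a minimal sketch under that assumption, not the tool's actual CLI code: only the flag name and the `bootstrap` command come from the description above, while the parser layout and the column names `id` and `serial` are hypothetical.

```python
import argparse

# Hypothetical parser layout; only "--assume-partitioned-on" and "bootstrap"
# are taken from the pull request description.
parser = argparse.ArgumentParser(prog="partition-manager")
subparsers = parser.add_subparsers(dest="command")
bootstrap = subparsers.add_parser("bootstrap")
bootstrap.add_argument(
    "--assume-partitioned-on",
    metavar="COLUMN_NAME",
    action="append",  # each repetition appends one more column name
    default=[],
    help="column that will be part of the partition expression (repeatable)",
)

# Supplying the flag twice describes a two-column partition expression.
args = parser.parse_args(
    ["bootstrap", "--assume-partitioned-on", "id", "--assume-partitioned-on", "serial"]
)
print(args.assume_partitioned_on)
```

The columns arrive in the order given on the command line, which matters if that order is meant to match the order of the partition expression.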

The 'bootstrap' command now emits table-copy instructions

The original plan for 'bootstrap' was to perform live alterations that would
lock only what they needed; however, InnoDB likes to lock everything. So instead
we need to always assume that "bootstrapping" will be a live table clone,
and that at its conclusion the team should perform an atomic rename.
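
As a rough illustration of the clone-then-rename flow described above, the sketch below only builds SQL strings for an operator to review; it executes nothing. The function name and the `_new` / `_old` table suffixes are hypothetical and are not taken from the tool's output. It relies on the fact that a single `RENAME TABLE` statement naming several tables swaps them atomically in MySQL and MariaDB.

```python
from typing import List


def clone_and_swap_statements(table: str) -> List[str]:
    """Build (but do not run) SQL for a table clone followed by an atomic swap.

    The "_new" and "_old" suffixes are illustrative assumptions.
    """
    clone = f"{table}_new"
    retired = f"{table}_old"
    return [
        # Create an empty copy with the same schema as the live table.
        f"CREATE TABLE `{clone}` LIKE `{table}`;",
        # (Partitioning the clone and keeping it in sync would happen here.)
        # One RENAME TABLE statement swaps both names in a single atomic step.
        f"RENAME TABLE `{table}` TO `{retired}`, `{clone}` TO `{table}`;",
    ]


for statement in clone_and_swap_statements("orders"):
    print(statement)
```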

This removes any dependency on the auto-increment value, as AI can only tell us a single column's position, and for multi-column partitions we'll need more than that. This change does not totally remove the get_autoincrement method, as we'll still want to confirm the table has the partitioned feature, which we'll do in a later commit.

* First pass algorithm
* Add ability to compare partition positions
* Add a split method for dividing partition lists
* More tests
* Add a position rate function
* Add methods to determine a weighted rate of increase
* Add docs to the new table_append_partition methods
* Use the Partition timestamp() method
* plan_partition_changes algorithm
* More partition planning tests
* Predictive partitioning algorithm functioning in tests
* Rework the CLI to use the new partition planning algorithm
* Passing integration tests
* Handle short and bespoke partition names.
* Improve logging
* Remove spurious strip
* Moving to 0.2.0
* Logging cleanups
* Fix a host of pylint issues

      $ pylint --ignore-patterns=.*_test.py partitionmanager/ --disable W1203 --disable invalid-name --disable bad-continuation
      ************* Module partitionmanager.tools
      partitionmanager/tools.py:22:11: R1708: Do not raise StopIteration in generator, use return statement instead (stop-iteration-return)
      ************* Module partitionmanager.sql
      partitionmanager/sql.py:36:0: R0903: Too few public methods (1/2) (too-few-public-methods)
      ************* Module partitionmanager.stats
      partitionmanager/stats.py:12:0: R0903: Too few public methods (0/2) (too-few-public-methods)
      partitionmanager/stats.py:65:0: R0912: Too many branches (14/12) (too-many-branches)
      ************* Module partitionmanager.table_append_partition
      partitionmanager/table_append_partition.py:98:0: R0914: Too many local variables (16/15) (too-many-locals)
      partitionmanager/table_append_partition.py:306:0: R0914: Too many local variables (23/15) (too-many-locals)
      ------------------------------------------------------------------
      Your code has been rated at 9.92/10 (previous run: 9.91/10, +0.01)

* Better logging on partition
* Never adjust the active_partition

  MariaDB has a limitation on editing the active partition, particularly: `ERROR 1520 (HY000): Reorganize of range partitions cannot change total ranges except for last partition where it can extend the range` so we can't edit the active partition, either.

* Never edit positions on empty partitions

  Like the previous commit, MariaDB has a limitation on editing any partition's offset: `ERROR 1520 (HY000): Reorganize of range partitions cannot change total ranges except for last partition where it can extend the range` So the positions field should never be edited for existing partitions, only their names.

* Consolidate logic to use partition names as start-of-fill dates
* stderr is not so useful from the Subprocess Database Command, let's dump it
* Bugfix: get_current_positions needs to query the latest of each column

  Before, get_current_positions returned each column for the entry with the largest ID from the first column, while for partitioning purposes we actually want to always be strictly increasing. This does make such tables less space-efficient, but that's a matter for partition design.

* Add "bootstrap" methods to prepare partitioned tables

  Tables whose partitions don't contain datestamps of the p_YYYYMMDD form don't provide partman enough info to derive rates of change, so these bootstrap routines will save a YAML file somewhere with point-in-time data that can be reloaded to derive a rate-of-change. This is only intended to be used for the initial partitioning of a table, or when a table has no empty partitions.

  In a subsequent commit I'll tie this into cli.py, ensuring to add alerts that these ALTERs cannot be expected to complete quickly, that likely the database will hold locks for substantial amounts of time for each of the ALTER commands, and the tool will simply be printing potential ALTER commands to console for an operator to analyze and run in the manner they find best.

* Wire up Bootstrap to the CLI
* Rework CLI to print yaml-like but stringified output
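
The get_current_positions bugfix described above can be illustrated with an in-memory stand-in for the table. This is a sketch of one reading of that description (taking the maximum of each column independently), not the project's query code; the function names, column names, and sample rows are all hypothetical.

```python
from typing import Dict, List

# Hypothetical sample rows: the row with the largest "id" does not hold
# the largest "serial".
rows: List[Dict[str, int]] = [
    {"id": 10, "serial": 500},
    {"id": 11, "serial": 480},
]


def positions_from_latest_row(rows: List[Dict[str, int]], columns: List[str]) -> List[int]:
    """Old behaviour as described: every column is read from the single row
    that has the largest value in the first column."""
    latest = max(rows, key=lambda row: row[columns[0]])
    return [latest[column] for column in columns]


def positions_per_column(rows: List[Dict[str, int]], columns: List[str]) -> List[int]:
    """Assumed fixed behaviour: take the largest value of each column on its
    own, so no reported position is lower than a value already stored."""
    return [max(row[column] for row in rows) for column in columns]


print(positions_from_latest_row(rows, ["id", "serial"]))
print(positions_per_column(rows, ["id", "serial"]))
```

With these rows the first function reports 480 for `serial` even though 500 is already stored, which is the kind of non-increasing position the commit message warns about.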

tgeoghegan

So I see where inserts and updates to the old table will be replicated to the new table with triggers. However I don't understand how the rows from the old table get copied to the new table. Or is that intentionally not done, the idea being that only the new partitions are created on the new table?

partitionmanager/cli.py

partitionmanager/bootstrap.py

jcjones · 2021-08-19T17:38:30Z

> So I see where inserts and updates to the old table will be replicated to the new table with triggers. However I don't understand how the rows from the old table get copied to the new table. Or is that intentionally not done, the idea being that only the new partitions are created on the new table?

Right. For our purposes we don't backfill the data; instead, we rely on ongoing copies until we atomically shift traffic over. Doing data backfills takes too much CPU time.
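
To make the "ongoing copies" idea concrete, here is a small sketch that builds trigger statements mirroring new writes from the old table to the new one without touching existing rows. It is an illustration under assumptions, not the statements the tool emits: the function name, trigger names, and the choice to match updates on key columns are all hypothetical.

```python
from typing import List


def copy_trigger_statements(
    old: str, new: str, columns: List[str], key_columns: List[str]
) -> List[str]:
    """Build (but do not run) triggers that mirror writes from `old` to `new`.

    Existing rows are never copied, so only rows written after the triggers
    exist will appear in the new table.
    """
    assignments = ", ".join(f"`{c}` = NEW.`{c}`" for c in columns)
    match = " AND ".join(f"`{c}` = OLD.`{c}`" for c in key_columns)
    return [
        f"CREATE TRIGGER `copy_inserts_{old}` AFTER INSERT ON `{old}` "
        f"FOR EACH ROW INSERT INTO `{new}` SET {assignments};",
        # An update to a row that was never copied matches nothing in the new
        # table, consistent with not backfilling old data.
        f"CREATE TRIGGER `copy_updates_{old}` AFTER UPDATE ON `{old}` "
        f"FOR EACH ROW UPDATE `{new}` SET {assignments} WHERE {match};",
    ]


for statement in copy_trigger_statements("t", "t_new", ["id", "v"], ["id"]):
    print(statement)
```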

Commands:

    pushd /tmp; git clone https://github.com/letsencrypt/mariadb-sequential-partition-manager-py.git; popd
    pushd /tmp/mariadb-sequential-partition-manager-py; git checkout -b pr-branch origin/pr-branch; popd
    git checkout main; git fetch origin; git reset --hard main
    cp -a /tmp/mariadb-sequential-partition-manager-py/* .
    git commit -a

Add a --assume-partitioned-on flag to bootstrap, to facilitate operation on tables that do not yet have a partition map. One needs to supply `--assume-partitioned-on COLUMN_NAME` as many times as needed to identify all the columns which will be part of the partition expression.

The 'bootstrap' command now emits table-copy instructions

The original plan for 'bootstrap' was to perform live alterations that would lock only what they needed; however, InnoDB likes to lock everything. So instead we need to always assume that "bootstrapping" will be a live table clone, and that at its conclusion the team should perform an atomic rename.

Fix unit test to not depend on time of day (oops)

Catch an arithmetic error

Emit output lines with MySQL comment characters

jcjones added 30 commits January 22, 2021 12:35

Initial functionality

51e3483

Fix cli test to expect the current date

6ef61fa

Add more CircleCI tests - flake8, pylint

06fb3c6

Initial PEP249 MySQL connector support

6d1fe73

Remove db option

d12cac3

Use structures instead of line parsing for all DB queries

e711c94

- This moves from the -E to -X XML-based CLI and parses it

Functioning via DB connect for single-value partitioned tables

e66dd3f

Test for duplicates, add a few tests of reorganize_partition

aeb2f7c

Catch truncated xml results from a subprocess

58026a8

Dwarf tables

b3145b8

Rename to sequential partition manager, and change CLI tool to partition-manager

ca8fc3e

Typo fix in README

8287a0a

Confirm partitioned status before operating on any supplied table

ea9adbf

Removes the auto_increment mechanisms.

XML parser improved tests and assertions

3576716

Move partition definitions from tuples to explicit classes

bd07788

Show query debugging

c5fb095

Add basic YAML configuration processing.

171b952

This adds the necessary retention options for later command processing, which is not included in this commit.

Rename add_partition to just add

4d0f951

Add a lifespan configuration value and only partition when needed

86db05f

Have full time resolution for partition decisions. Some logging improvements too.

bbd6810

Add basic statistics command.

15e7f77

Print table compatibility issues for all tables before exiting.

74d5704

Bugfix: Tables with multiple columns use BY RANGE COLUMNS

3af7ee4

There are other BY RANGE options too, but let's just start with COLUMNS

Export Prometheus-style statistics for stats command, if configured

2aaa6f7

Improve tests for the prometheus stats

41014d5

Add a time_since_oldest gauge, and rename the gauges a bit

de442b1

Emit stats on an 'add' command too

744f8d6

Fix prometheus quoting of the labels

e4925fe

Remove the optional timestamp, as our version of Prometheus doesn't like it

4591242
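
The "Bugfix: Tables with multiple columns use BY RANGE COLUMNS" commit in the list above concerns which partitioning clause to emit. A minimal sketch of that choice follows, assuming plain `RANGE` is kept for a single column and `RANGE COLUMNS` is used for several; the function name is hypothetical and this is not the project's code.

```python
from typing import List


def partition_by_clause(columns: List[str]) -> str:
    """Pick RANGE for one column and RANGE COLUMNS for several (assumed rule)."""
    if not columns:
        raise ValueError("at least one partition column is required")
    quoted = ", ".join(f"`{column}`" for column in columns)
    if len(columns) == 1:
        return f"PARTITION BY RANGE ({quoted})"
    return f"PARTITION BY RANGE COLUMNS ({quoted})"


print(partition_by_clause(["id"]))
print(partition_by_clause(["id", "serial"]))
```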

jcjones and others added 12 commits March 9, 2021 21:40

Per-table partition durations

01de102

v0.1.0: Minimal features complete

e0a0271

v0.1.1, Bugfix: yaml dburls weren't preprocessed

cfbc920

Emit a statistic for table alteration time

011f836

Pre-commit: Make Black run before other tools

a45cce8

Spelling fix: partition_name_now

bba56d3

Rename partition_duration to partition_period

52a41a3

Update README, bugfix that --table and --in/--out were all mutually exclusive

bcdc3e9

More documentation for the Types

69f7542

Describe that this function checks for the existance a table too (#6)

bdf4703

Correct spelling in table_information_schema_is_compatible (#7)

9b707e1

* Revert "Describe that this function checks for the existance a table too (#6)" This reverts commit bdf4703. * Update table_append_partition.py

jcjones mentioned this pull request Jul 27, 2021

Let bootstrap work on tables that aren't partitioned yet. #10

Closed

tgeoghegan reviewed Aug 5, 2021

partitionmanager/cli.py Outdated

partitionmanager/bootstrap.py Outdated

partitionmanager/bootstrap.py Outdated

jcjones force-pushed the bootstrap_tablecopy branch from 411e288 to ee018d7 Compare August 19, 2021 18:16

jcjones requested a review from tgeoghegan August 19, 2021 18:17

jcjones added 3 commits September 20, 2021 10:20

Fix review comments from Tim

afc8400

This corrects a bug in MaxValuePartition where its value string wasn't paren-surrounded if there are actually multiple columns.
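
The kind of value string this fix is about can be sketched as follows. This is an illustration of the described behaviour only, not the real `MaxValuePartition` class: the helper names and the partition name `p_future` are hypothetical, and leaving a single-column value bare is an assumption.

```python
def max_value_clause(num_columns: int) -> str:
    """Render the MAXVALUE bound, parenthesised when there are several columns."""
    if num_columns < 1:
        raise ValueError("a partition needs at least one column")
    if num_columns == 1:
        return "MAXVALUE"
    return "(" + ", ".join(["MAXVALUE"] * num_columns) + ")"


def tail_partition_definition(name: str, num_columns: int) -> str:
    """Build the definition of the final catch-all partition."""
    return f"PARTITION `{name}` VALUES LESS THAN {max_value_clause(num_columns)}"


print(tail_partition_definition("p_future", 1))
print(tail_partition_definition("p_future", 2))
```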

jcjones closed this Sep 20, 2021

jcjones force-pushed the bootstrap_tablecopy branch from ee018d7 to afc8400 Compare September 20, 2021 19:50
