Refactor data migration docs (pingcap#7202)
hfxsd authored Dec 31, 2021
1 parent 8b028e6 commit 60106df
Showing 23 changed files with 2,269 additions and 263 deletions.
24 changes: 15 additions & 9 deletions TOC.md
@@ -40,15 +40,21 @@
- [Test TiDB Using TPC-C](/benchmark/benchmark-tidb-using-tpcc.md)
- Migrate
- [Overview](/migration-overview.md)
- Migrate from MySQL
- [Migrate from Amazon Aurora MySQL Using TiDB Lightning](/migrate-from-aurora-using-lightning.md)
- [Migrate from MySQL SQL Files Using TiDB Lightning](/migrate-from-mysql-dumpling-files.md)
- [Migrate from Amazon Aurora MySQL Using DM](/migrate-from-aurora-mysql-database.md)
- Migrate from CSV Files
- [Use TiDB Lightning](/tidb-lightning/migrate-from-csv-using-tidb-lightning.md)
- [Use `LOAD DATA` Statement](/sql-statements/sql-statement-load-data.md)
- [Migrate from SQL Files](/migrate-from-mysql-dumpling-files.md)
- [Replicate Incremental Data between TiDB Clusters in Real Time](/incremental-replication-between-clusters.md)
- [Migration Tools](/migration-tools.md)
- Migration Scenarios
- [Migrate from Aurora](/migrate-aurora-to-tidb.md)
- [Migrate MySQL of Small Datasets](/migrate-small-mysql-to-tidb.md)
- [Migrate MySQL of Large Datasets](/migrate-large-mysql-to-tidb.md)
- [Migrate and Merge MySQL Shards of Small Datasets](/migrate-small-mysql-shards-to-tidb.md)
- [Migrate and Merge MySQL Shards of Large Datasets](/migrate-large-mysql-shards-to-tidb.md)
- [Migrate from CSV Files](/migrate-from-csv-files-to-tidb.md)
- [Migrate from SQL Files](/migrate-from-sql-files-to-tidb.md)
- [Replicate Incremental Data between TiDB Clusters](/incremental-replication-between-clusters.md)
- Advanced Migration
- [Continuous Replication with gh-ost or pt-osc](/migrate-with-pt-ghost.md)
- [Filter Binlog Events](/filter-binlog-event.md)
- [Filter DML Events Using SQL Expressions](/filter-dml-event.md)
- [Migrate to a Downstream Table with More Columns](/migrate-with-more-columns-downstream.md)
- Maintain
- Upgrade
- [Use TiUP (Recommended)](/upgrade-tidb-using-tiup.md)
7 changes: 3 additions & 4 deletions _index.md
@@ -49,10 +49,9 @@ Designed for the cloud, TiDB provides flexible scalability, reliability and secu
<ColumnTitle>Migrate Data</ColumnTitle>

- [Migration Overview](/migration-overview.md)
- [Migrate full data from Aurora](/migrate-from-aurora-using-lightning.md)
- [Migrate continuously from Aurora/MySQL Database](/migrate-from-aurora-mysql-database.md)
- [Migrate from CSV Files](/tidb-lightning/migrate-from-csv-using-tidb-lightning.md)
- [Migrate from MySQL SQL Files](/migrate-from-mysql-dumpling-files.md)
- [Migrate Data from CSV Files to TiDB](/migrate-from-csv-files-to-tidb.md)
- [Migrate Data from SQL Files to TiDB](/migrate-from-sql-files-to-tidb.md)
- [Migrate Data from Amazon Aurora to TiDB](/migrate-aurora-to-tidb.md)

</NavColumn>

124 changes: 124 additions & 0 deletions filter-binlog-event.md
@@ -0,0 +1,124 @@
---
title: Filter Binlog Events
summary: Learn how to filter binlog events when migrating data.
---

# Filter Binlog Events

This document describes how to filter binlog events when you use DM to perform continuous incremental data replication. For detailed replication instructions in different scenarios, refer to the following documents:

- [Migrate MySQL of Small Datasets to TiDB](/migrate-small-mysql-to-tidb.md)
- [Migrate MySQL of Large Datasets to TiDB](/migrate-large-mysql-to-tidb.md)
- [Migrate and Merge MySQL Shards of Small Datasets to TiDB](/migrate-small-mysql-shards-to-tidb.md)
- [Migrate and Merge MySQL Shards of Large Datasets to TiDB](/migrate-large-mysql-shards-to-tidb.md)

## Configuration

To use the binlog event filter, add a filter rule to the DM task configuration file, as shown below:

```yaml
filters:
  rule-1:
    schema-pattern: "test_*"
    table-pattern: "t_*"
    events: ["truncate table", "drop table"]
    sql-pattern: ["^DROP\\s+PROCEDURE", "^CREATE\\s+PROCEDURE"]
    action: Ignore
```

- `schema-pattern`/`table-pattern`: Filters matching schemas or tables
- `events`: Filters binlog events. Supported events are listed in the table below:

| Event | Category | Description |
| --------------- | ---- | --------------------------|
| all | | Includes all events |
| all dml | | Includes all DML events |
| all ddl | | Includes all DDL events |
| none | | Includes no event |
| none ddl | | Excludes all DDL events |
| none dml | | Excludes all DML events |
| insert | DML | Insert DML event |
| update | DML | Update DML event |
| delete | DML | Delete DML event |
| create database | DDL | Create database event |
| drop database | DDL | Drop database event |
| create table | DDL | Create table event |
| create index | DDL | Create index event |
| drop table | DDL | Drop table event |
| truncate table | DDL | Truncate table event |
| rename table | DDL | Rename table event |
| drop index | DDL | Drop index event |
| alter table | DDL | Alter table event |

- `sql-pattern`: Filters specified DDL SQL statements. The matching rule supports regular expressions.
- `action`: `Do` or `Ignore`

- `Do`: the allow list. A binlog event is replicated if it meets either of the following two conditions:

    - The event matches the rule setting.
    - `sql-pattern` has been specified and the SQL statement of the event matches any of the `sql-pattern` options.

- `Ignore`: the block list. A binlog event is filtered out if it meets either of the following two conditions:

    - The event matches the rule setting.
    - `sql-pattern` has been specified and the SQL statement of the event matches any of the `sql-pattern` options.

If both `Do` and `Ignore` are configured, `Ignore` takes precedence over `Do`. That is, an event that satisfies both the `Ignore` and `Do` conditions is filtered out.
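
For example, the following minimal sketch (the rule names `do-all-dml` and `ignore-delete` are made up for illustration; adapt the patterns to your own schemas) combines a `Do` rule and an `Ignore` rule on the same tables. A `DELETE` event on a matching table satisfies both rules, so it is filtered out, while `INSERT` and `UPDATE` events are replicated:

```yaml
filters:
  # Allow all DML events on matching tables.
  do-all-dml:
    schema-pattern: "test_*"
    table-pattern: "t_*"
    events: ["all dml"]
    action: Do
  # But drop DELETE events; Ignore takes precedence over Do.
  ignore-delete:
    schema-pattern: "test_*"
    table-pattern: "t_*"
    events: ["delete"]
    action: Ignore
```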

## Application scenarios

This section describes the application scenarios of binlog event filter.

### Filter out all sharding deletion operations

To filter out all deletion operations, configure a `filter-table-rule` and a `filter-schema-rule`, as shown below:

```yaml
filters:
  filter-table-rule:
    schema-pattern: "test_*"
    table-pattern: "t_*"
    events: ["truncate table", "drop table", "delete"]
    action: Ignore
  filter-schema-rule:
    schema-pattern: "test_*"
    events: ["drop database"]
    action: Ignore
```

### Migrate only DML operations of sharded schemas and tables

To replicate only DML statements, configure two binlog event filter rules, as shown below:

```yaml
filters:
  do-table-rule:
    schema-pattern: "test_*"
    table-pattern: "t_*"
    events: ["create table", "all dml"]
    action: Do
  do-schema-rule:
    schema-pattern: "test_*"
    events: ["create database"]
    action: Do
```

### Filter out SQL statements not supported by TiDB

To filter out SQL statements not supported by TiDB, configure a `filter-procedure-rule`, as shown below:

```yaml
filters:
  filter-procedure-rule:
    schema-pattern: "*"
    sql-pattern: [".*\\s+DROP\\s+PROCEDURE", ".*\\s+CREATE\\s+PROCEDURE", "ALTER\\s+TABLE[\\s\\S]*ADD\\s+PARTITION", "ALTER\\s+TABLE[\\s\\S]*DROP\\s+PARTITION"]
    action: Ignore
```

> **Warning:**
>
> To avoid filtering out data that needs to be migrated, configure the global filtering rule as strictly as possible.

## See also

[Filter Binlog Events Using SQL Expressions](/filter-dml-event.md)
80 changes: 80 additions & 0 deletions filter-dml-event.md
@@ -0,0 +1,80 @@
---
title: Filter DML Events Using SQL Expressions
summary: Learn how to filter DML events using SQL expressions.
---

# Filter DML Events Using SQL Expressions

This document introduces how to filter binlog events using SQL expressions when you use DM to perform continuous incremental data replication. For detailed replication instructions, refer to the following documents:

- [Migrate MySQL of Small Datasets to TiDB](/migrate-small-mysql-to-tidb.md)
- [Migrate MySQL of Large Datasets to TiDB](/migrate-large-mysql-to-tidb.md)
- [Migrate and Merge MySQL Shards of Small Datasets to TiDB](/migrate-small-mysql-shards-to-tidb.md)
- [Migrate and Merge MySQL Shards of Large Datasets to TiDB](/migrate-large-mysql-shards-to-tidb.md)

When performing incremental data replication, you can use the [Binlog Event Filter](/filter-binlog-event.md) to filter certain types of binlog events. For example, you can choose not to replicate `DELETE` events to the downstream for purposes such as archiving and auditing. However, the Binlog Event Filter cannot decide at a finer granularity whether to filter out the `DELETE` event of a specific row.

To address this issue, since v2.0.5, DM supports filtering data with the binlog value filter during incremental data replication. In the `ROW`-formatted binlog supported by DM, binlog events carry the values of all columns, and you can configure SQL expressions based on these values. If an expression evaluates a row change to `TRUE`, DM does not replicate that row change to the downstream.

Similar to [Binlog Event Filter](/filter-binlog-event.md), you need to configure `binlog value filter` in the task configuration file. For details, see the following configuration example. For the advanced task configuration and the description, refer to [DM advanced task configuration file](https://docs.pingcap.com/tidb-data-migration/stable/task-configuration-file-full#task-configuration-file-template-advanced).

```yaml
name: test
task-mode: all

mysql-instances:
  - source-id: "mysql-replica-01"
    expression-filters: ["even_c"]

expression-filter:
  even_c:
    schema: "expr_filter"
    table: "tbl"
    insert-value-expr: "c % 2 = 0"
```

In the above configuration example, the `even_c` rule is configured and referenced by the data source `mysql-replica-01`. According to this rule, for the `tbl` table in the `expr_filter` schema, when an even number is inserted into the `c` column (`c % 2 = 0`), this `INSERT` statement is not replicated to the downstream. The following example shows the effect of this rule.

Incrementally insert the following data in the upstream data source:

```sql
INSERT INTO tbl(id, c) VALUES (1, 1), (2, 2), (3, 3), (4, 4);
```

Then query the `tbl` table in the downstream. You can see that only the rows with odd numbers in the `c` column are replicated:

```sql
MySQL [test]> select * from tbl;
+------+------+
| id | c |
+------+------+
| 1 | 1 |
| 3 | 3 |
+------+------+
2 rows in set (0.001 sec)
```

## Configuration parameters and description

- `schema`: The name of the upstream schema to match. Wildcard matching or regular matching is not supported.
- `table`: The name of the upstream table to match. Wildcard matching or regular matching is not supported.
- `insert-value-expr`: Configures an expression that takes effect on values carried by the `INSERT` type binlog events (WRITE_ROWS_EVENT). You cannot use this expression together with `update-old-value-expr`, `update-new-value-expr` or `delete-value-expr` in the same configuration item.
- `update-old-value-expr`: Configures an expression that takes effect on the old values carried by the `UPDATE` type binlog events (UPDATE_ROWS_EVENT). You cannot use this expression together with `insert-value-expr` or `delete-value-expr` in the same configuration item.
- `update-new-value-expr`: Configures an expression that takes effect on the new values carried by the `UPDATE` type binlog events (UPDATE_ROWS_EVENT). You cannot use this expression together with `insert-value-expr` or `delete-value-expr` in the same configuration item.
- `delete-value-expr`: Configures an expression that takes effect on values carried by the `DELETE` type binlog events (DELETE_ROWS_EVENT). You cannot use this expression together with `insert-value-expr`, `update-old-value-expr` or `update-new-value-expr`.

> **Note:**
>
> - You can configure `update-old-value-expr` and `update-new-value-expr` together.
> - When `update-old-value-expr` and `update-new-value-expr` are configured together, the rows whose "update + old values" meet `update-old-value-expr` **and** whose "update + new values" meet `update-new-value-expr` are filtered.
> - When one of `update-old-value-expr` and `update-new-value-expr` is configured, the configured expression determines whether to filter the **entire row change**, which means that the deletion of old values and the insertion of new values are filtered as a whole.
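
As an illustration of the note above, the following minimal sketch configures both expressions in one rule (the `shop` schema, the `orders` table, and the `status` column are hypothetical names, not taken from this document). An `UPDATE` is filtered out only when the old value of `status` is `pending` **and** the new value is `cancelled`; all other updates are replicated:

```yaml
mysql-instances:
  - source-id: "mysql-replica-01"
    expression-filters: ["skip-cancel-updates"]

expression-filter:
  skip-cancel-updates:
    # Hypothetical schema, table, and column names, for illustration only.
    schema: "shop"
    table: "orders"
    update-old-value-expr: "status = 'pending'"
    update-new-value-expr: "status = 'cancelled'"
```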

You can use the SQL expression on one column or on multiple columns. You can also use the SQL functions supported by TiDB, such as `c % 2 = 0`, `a*a + b*b = c*c`, and `ts > NOW()`.

For `TIMESTAMP` columns, the default time zone is the one specified in the task configuration file; if it is not specified, the time zone of the downstream is used. You can also specify the time zone explicitly, for example, `c_timestamp = '2021-01-01 12:34:56.5678+08:00'`.

You can configure multiple filtering rules under the `expression-filter` configuration item. The upstream data source references the required rules in `expression-filters` to make them effective. When multiple rules are used, if **any** one of the rules is matched, the entire row change is filtered.
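
For example, the following sketch extends the earlier `even_c` example with a second, hypothetical rule (`future_ts`, which assumes `tbl` also has a `ts` column; this column is not part of the example above). A row change is filtered out if either rule matches:

```yaml
mysql-instances:
  - source-id: "mysql-replica-01"
    # A row change is filtered out if it matches any of the referenced rules.
    expression-filters: ["even_c", "future_ts"]

expression-filter:
  even_c:
    schema: "expr_filter"
    table: "tbl"
    insert-value-expr: "c % 2 = 0"
  future_ts:
    # Hypothetical rule; assumes a `ts` column that is not in the original example.
    schema: "expr_filter"
    table: "tbl"
    insert-value-expr: "ts > NOW()"
```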

> **Note:**
>
> Configuring too many expression filtering rules increases the calculation overhead of DM and slows down the data replication.
Binary file added media/migrate-shard-tables-within-1tb-en.png
Binary file added media/shard-merge-using-lightning-en.png