Update DM TOC and remove redundant files (pingcap#7493)
shichun-0415 authored Feb 9, 2022
1 parent 3bf91a5 commit 52d5110
Showing 16 changed files with 63 additions and 1,230 deletions.
16 changes: 4 additions & 12 deletions TOC.md
@@ -52,9 +52,9 @@
- [Replicate Incremental Data between TiDB Clusters](/incremental-replication-between-clusters.md)
- Advanced Migration
- [Continuous Replication with gh-ost or pt-osc](/migrate-with-pt-ghost.md)
- [Migrate to a Downstream Table with More Columns](/migrate-with-more-columns-downstream.md)
- [Filter Binlog Events](/filter-binlog-event.md)
- [Filter DML Events Using SQL Expressions](/filter-dml-event.md)
- [Migrate to a Downstream Table with More Columns](/migrate-with-more-columns-downstream.md)
- Maintain
- Upgrade
- [Use TiUP (Recommended)](/upgrade-tidb-using-tiup.md)
@@ -250,19 +250,14 @@
- [Pessimistic Mode](/dm/feature-shard-merge-pessimistic.md)
- [Optimistic Mode](/dm/feature-shard-merge-optimistic.md)
- [Migrate from MySQL Databases that Use GH-ost/PT-osc](/dm/feature-online-ddl.md)
- [Filter Certain Row Changes Using SQL Expressions](/dm/feature-expression-filter.md)
- [Filter DMLs Using SQL Expressions](/dm/feature-expression-filter.md)
- [DM Architecture](/dm/dm-arch.md)
- [Benchmarks](/dm/dm-benchmark-v5.4.0.md)
- Quick Start
- [Quick Start](/dm/quick-start-with-dm.md)
- [Deploy a DM cluster Using TiUP](/dm/deploy-a-dm-cluster-using-tiup.md)
- [Create a Data Source](/dm/quick-start-create-source.md)
- Data Migration Scenarios
- [Data Migration Scenario Overview](/dm/quick-create-migration-task.md)
- [Migrate Data from Multiple Data Sources to TiDB](/dm/usage-scenario-simple-migration.md)
- [Migrate Sharded Schemas and Tables to TiDB](/dm/usage-scenario-shard-merge.md)
- [Migrate Incremental Data to TiDB](/dm/usage-scenario-incremental-migration.md)
- [Migrate Tables when There Are More Columns Downstream](/dm/usage-scenario-downstream-more-columns.md)
- [Data Migration Scenarios](/dm/quick-create-migration-task.md)
- Deploy
- [Software and Hardware Requirements](/dm/dm-hardware-and-software-requirements.md)
- Deploy a DM Cluster
@@ -291,13 +286,10 @@
- [Export and Import Data Sources and Task Configuration of Clusters](/dm/dm-export-import-config.md)
- [Handle Failed DDL Statements](/dm/handle-failed-ddl-statements.md)
- [Manually Handle Sharding DDL Lock](/dm/manually-handling-sharding-ddl-locks.md)
- [Switch the MySQL Instance to Be Migrated](/dm/usage-scenario-master-slave-switch.md)
- [Manage Schemas of Tables to be Migrated](/dm/dm-manage-schema.md)
- [Handle Alerts](/dm/dm-handle-alerts.md)
- [Daily Check](/dm/dm-daily-check.md)
- Usage Scenarios
- [Migrate from Aurora to TiDB](/dm/migrate-from-mysql-aurora.md)
- [Migrate when TiDB Tables Have More Columns](/dm/usage-scenario-downstream-more-columns.md)
- [Switch the MySQL Instance to Be Migrated](/dm/usage-scenario-master-slave-switch.md)
- Troubleshoot
- [Handle Errors](/dm/dm-error-handling.md)
- [Handle Performance Issues](/dm/dm-handle-performance-issues.md)
89 changes: 6 additions & 83 deletions dm/feature-expression-filter.md
@@ -1,91 +1,14 @@
---
title: Filter Certain Row Changes Using SQL Expressions
title: Filter DMLs Using SQL Expressions
aliases: ['tidb/dev/feature-expression-filter/']
---

# Filter Certain Row Changes Using SQL Expressions
# Filter DMLs Using SQL Expressions

## Overview

In the process of data migration, DM provides the [Binlog Event Filter](/dm/dm-key-features.md#binlog-event-filter) feature to filter certain types of binlog events. For example, for archiving or auditing purposes, `DELETE` events might be filtered out when data is migrated to the downstream. However, Binlog Event Filter cannot decide at a finer granularity whether the `DELETE` event of a specific row should be filtered.
In the process of incremental data migration, you can filter certain types of binlog events using the [Filter Binlog Events](/filter-binlog-event.md) feature. For example, for archiving or auditing purposes, you can filter out `DELETE` events when migrating data to the downstream. However, Binlog Event Filter cannot decide at a finer granularity whether to filter out the `DELETE` event of a specific row.

To solve the above issue, DM supports filtering certain row changes using SQL expressions. The binlog in the `ROW` format supported by DM has the values of all columns in binlog events. You can configure SQL expressions according to these values. If the SQL expressions evaluate a row change as `TRUE`, DM will not migrate the row change downstream.
To solve the above issue, DM supports filtering data during incremental migration using `binlog value filter` since v2.0.5. The binlog in the `ROW` format supported by DM has the values of all columns in binlog events. You can configure SQL expressions according to these values. If the SQL expressions evaluate a row change as `TRUE`, DM will not migrate the row change downstream.

> **Note:**
>
> This feature only takes effect in the phase of incremental replication, not in the phase of full migration.
## Configuration example

Similar to [Binlog Event Filter](/dm/dm-key-features.md#binlog-event-filter), you also need to configure the expression-filter feature in the configuration file of the data migration task, as shown below. For the complete configuration and its descriptions, refer to [DM Advanced Task Configuration File](/dm/task-configuration-file-full.md#task-configuration-file-template-advanced).

```yml
name: test
task-mode: all

target-database:
  host: "127.0.0.1"
  port: 4000
  user: "root"
  password: ""

mysql-instances:
  - source-id: "mysql-replica-01"
    expression-filters: ["even_c"]

expression-filter:
  even_c:
    schema: "expr_filter"
    table: "tbl"
    insert-value-expr: "c % 2 = 0"
```
The above example configures the `even_c` rule and allows the data source whose ID is `mysql-replica-01` to reference this rule. The `even_c` rule means:

For the `tbl` table in the `expr_filter` schema, when the inserted value of `c` is even (`c % 2 = 0`), the INSERT statement is not migrated downstream.

The usage result of this rule is shown below.

Insert the following data in the upstream data source:

```sql
INSERT INTO tbl(id, c) VALUES (1, 1), (2, 2), (3, 3), (4, 4);
```

Then query the `tbl` table downstream and you can find that only rows with an odd value of `c` are migrated downstream:

```sql
MySQL [test]> select * from tbl;
+------+------+
| id   | c    |
+------+------+
|    1 |    1 |
|    3 |    3 |
+------+------+
2 rows in set (0.001 sec)
```
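To make the rule concrete, the following Python sketch replays the example above. It is only an illustration under stated assumptions: SQLite (through the standard `sqlite3` module) stands in for the TiDB expression engine that DM actually uses, and `rows_to_migrate` is a hypothetical helper, not part of DM. Following the rule that only a TRUE result filters a row change, a row whose expression evaluates to FALSE or NULL is kept.

```python
import sqlite3

# Expression taken from the even_c rule in the configuration above.
INSERT_VALUE_EXPR = "c % 2 = 0"


def rows_to_migrate(rows, expr):
    """Hypothetical helper: return the inserted rows that would still be migrated.

    SQLite is an assumed stand-in for TiDB's expression evaluation.
    A row is dropped only when `expr` evaluates to TRUE for its column values.
    """
    conn = sqlite3.connect(":memory:")
    try:
        kept = []
        for row_id, c in rows:
            # Bind the row's column values by name so the expression can refer to them.
            (filtered,) = conn.execute(
                f"SELECT {expr} FROM (SELECT :id AS id, :c AS c)",
                {"id": row_id, "c": c},
            ).fetchone()
            # FALSE (0) and NULL (None) both mean "not filtered".
            if not filtered:
                kept.append((row_id, c))
        return kept
    finally:
        conn.close()


upstream_rows = [(1, 1), (2, 2), (3, 3), (4, 4)]
print(rows_to_migrate(upstream_rows, INSERT_VALUE_EXPR))  # [(1, 1), (3, 3)]
```

The output matches the downstream query result shown above: only rows with an odd `c` survive.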

## Configuration parameters and rule descriptions

- `schema`: The name of the upstream database to be matched. Wildcard and regular-expression matching are not supported.
- `table`: The name of the upstream table to be matched. Wildcard and regular-expression matching are not supported.
- `insert-value-expr`: Specifies an expression that takes effect on the values of INSERT binlog events (WRITE_ROWS_EVENT). Do not use it with `update-old-value-expr`, `update-new-value-expr`, or `delete-value-expr` in the same configuration item.
- `update-old-value-expr`: Specifies an expression that takes effect on the old values of UPDATE binlog events (UPDATE_ROWS_EVENT). Do not use it with `insert-value-expr` or `delete-value-expr` in the same configuration item.
- `update-new-value-expr`: Specifies an expression that takes effect on the new values of UPDATE binlog events (UPDATE_ROWS_EVENT). Do not use it with `insert-value-expr` or `delete-value-expr` in the same configuration item.
- `delete-value-expr`: Specifies an expression that takes effect on the values of DELETE binlog events (DELETE_ROWS_EVENT). Do not use it with `insert-value-expr`, `update-old-value-expr`, or `update-new-value-expr` in the same configuration item.
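The constraints in the list above amount to this: a single rule may set expressions for only one event type (INSERT, UPDATE, or DELETE), while the two UPDATE expressions may appear together. The Python sketch below is a hypothetical validator that encodes exactly that, with a rule written as a plain dict; it is not how DM validates its configuration.

```python
# Hypothetical grouping of expression keys by the binlog event type they apply to.
EXPR_GROUPS = {
    "INSERT": {"insert-value-expr"},
    "UPDATE": {"update-old-value-expr", "update-new-value-expr"},
    "DELETE": {"delete-value-expr"},
}


def check_rule(rule):
    """Return True if the rule sets expressions for at most one event type.

    Raises ValueError when, for example, insert-value-expr and
    delete-value-expr appear in the same rule.
    """
    used_keys = set(rule)
    event_types = [name for name, keys in EXPR_GROUPS.items() if keys & used_keys]
    if len(event_types) > 1:
        raise ValueError(f"rule mixes expressions for {', '.join(event_types)} events")
    return True


even_c = {"schema": "expr_filter", "table": "tbl", "insert-value-expr": "c % 2 = 0"}
print(check_rule(even_c))  # True
```

Grouping both UPDATE keys together is what lets `update-old-value-expr` and `update-new-value-expr` coexist in one rule.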

> **Note:**
>
> You can configure `update-old-value-expr` and `update-new-value-expr` at the same time.
>
> - When you configure both `update-old-value-expr` and `update-new-value-expr`, a row change is filtered out only if its old value matches `update-old-value-expr` **and** its new value matches `update-new-value-expr`.
> - When you configure only one of them, that expression alone decides whether to filter **the whole row change**. That is, the delete event of the old value and the insert event of the new value are filtered out together.
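The decision described in this note for an UPDATE row change can be sketched as a small Python function. `filter_update` is a hypothetical name; each argument is assumed to be the result of evaluating the corresponding expression on the row change, or `None` when that parameter is not configured.

```python
def filter_update(old_matches, new_matches):
    """Decide whether a whole UPDATE row change is filtered out.

    old_matches: result of update-old-value-expr, or None if not configured.
    new_matches: result of update-new-value-expr, or None if not configured.
    """
    configured = [m for m in (old_matches, new_matches) if m is not None]
    # Both configured: both must match. Only one configured: it alone decides.
    # Neither configured: there is nothing to filter on, so keep the change.
    return bool(configured) and all(configured)


print(filter_update(True, False))  # False: the new value does not match
print(filter_update(True, None))   # True: only the old-value expression is set
```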

SQL expressions can involve one or more columns and can use the SQL functions that TiDB supports. Examples include `c % 2 = 0`, `a*a + b*b = c*c`, and `ts > NOW()`.

The timezone of TIMESTAMP is UTC by default. You can use `c_timestamp = '2021-01-01 12:34:56.5678+08:00'` to specify the timezone explicitly.

You can define multiple filter rules under the configuration item `expression-filter`. A rule takes effect only after you reference it in the `expression-filters` configuration item of an upstream data source. When multiple rules take effect, matching **any** of the rules causes a row change to be filtered.
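The "any rule matches" behavior can be sketched in Python as follows. The rules here are hypothetical Python predicates that mirror two of the SQL expressions mentioned above (`c % 2 = 0` and `a*a + b*b = c*c`); DM itself evaluates SQL expressions, not Python functions.

```python
# Hypothetical Python equivalents of two SQL expressions from the text above.
def even_c(row):
    return row["c"] % 2 == 0  # c % 2 = 0


def pythagorean(row):
    return row["a"] * row["a"] + row["b"] * row["b"] == row["c"] * row["c"]  # a*a + b*b = c*c


def is_filtered(row, rules):
    """A row change is filtered if ANY rule referenced by the data source matches it."""
    return any(rule(row) for rule in rules)


active_rules = [even_c, pythagorean]
print(is_filtered({"a": 3, "b": 4, "c": 5}, active_rules))  # True: 9 + 16 == 25
print(is_filtered({"a": 1, "b": 1, "c": 3}, active_rules))  # False: c is odd and 1 + 1 != 9
```

Because a single match is enough, each extra rule adds another evaluation per row change, which is consistent with the overhead warning below.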

> **Note:**
>
> Setting too many expression filters for a table increases the computing overhead of DM, which might impede data migration.
For detailed operation and implementation, see [Filter DML Events Using SQL Expressions](/filter-dml-event.md).
61 changes: 2 additions & 59 deletions dm/feature-online-ddl.md
@@ -6,66 +6,9 @@ aliases: ['/docs/tidb-data-migration/dev/online-ddl-scheme/','tidb-data-migratio

# Migrate from Databases that Use GH-ost/PT-osc

This document introduces the `online-ddl` feature of DM when DM is used to migrate data from MySQL to TiDB and how online DDL tools perform during the data migration process.
In production scenarios, table locking during DDL execution can block the reads from or writes to the database to a certain extent. Therefore, online DDL tools are often used to execute DDLs to minimize the impact on reads and writes. Common DDL tools are [gh-ost](https://github.com/github/gh-ost) and [pt-osc](https://www.percona.com/doc/percona-toolkit/3.0/pt-online-schema-change.html).

## Overview

DDL statements are commonly used in database applications. MySQL 5.6 and later versions support the `online-ddl` feature, but it has limitations. For example, DDL statements still need to acquire the metadata lock (MDL), and some DDL statements still need to copy the table. In production scenarios, the table lock during DDL execution can block reads from and writes to the database to a certain extent.

Therefore, online DDL tools are often used to execute DDLs to reduce the impact on reads and writes. Common DDL tools are [gh-ost](https://github.com/github/gh-ost) and [pt-osc](https://www.percona.com/doc/percona-toolkit/3.0/pt-online-schema-change.html).

Generally, these tools work by the following steps:

1. Create a ghost table based on the schema of the real table that the DDL targets.
2. Apply the DDLs to the ghost table.
3. Copy the data of the real table to the ghost table.
4. After the data in the two tables is consistent, use a `RENAME` statement to replace the real table with the ghost table.
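The four steps can be reproduced on a toy table with Python's built-in `sqlite3` module. This is a simplified sketch, not how gh-ost or pt-osc are implemented. The table names `_tbl_gho` and `_tbl_del` are hypothetical. The real tools also keep applying ongoing writes to the ghost table during step 3 (gh-ost from the binlog, pt-osc through triggers), and MySQL can swap both tables in one atomic `RENAME TABLE` statement, whereas SQLite needs two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The real table that the DDL targets, with some existing rows.
cur.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, c INTEGER)")
cur.executemany("INSERT INTO tbl VALUES (?, ?)", [(1, 1), (2, 2)])

# 1. Create a ghost table with the real table's schema (hypothetical name).
cur.execute("CREATE TABLE _tbl_gho (id INTEGER PRIMARY KEY, c INTEGER)")
# 2. Apply the DDL to the ghost table only; the real table stays untouched.
cur.execute("ALTER TABLE _tbl_gho ADD COLUMN d INTEGER DEFAULT 0")
# 3. Copy the data of the real table into the ghost table.
cur.execute("INSERT INTO _tbl_gho (id, c) SELECT id, c FROM tbl")
# 4. Swap the tables by renaming once the data is consistent.
cur.execute("ALTER TABLE tbl RENAME TO _tbl_del")
cur.execute("ALTER TABLE _tbl_gho RENAME TO tbl")
conn.commit()

result = cur.execute("SELECT id, c, d FROM tbl ORDER BY id").fetchall()
print(result)  # [(1, 1, 0), (2, 2, 0)]
```

From the point of view of a binlog consumer, the DDL in this sequence appears only on the ghost table, and the real table name reappears only at the final rename.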

![DM online-ddl](/media/dm/dm-online-ddl-2.png)

When you migrate data from MySQL to TiDB using DM, the `online-ddl` feature of DM can identify the DDLs in step 2 above and apply them downstream in step 4, which reduces the replication workload for the ghost table.

## `online-ddl` Configuration

It is generally recommended to enable `online-ddl`, which has the following effects:

![DM online-ddl](/media/dm/dm-online-ddl.png)

- The downstream TiDB does not need to create and replicate the ghost table, which saves storage space and network transmission overhead.
- When you merge and migrate data from sharded tables, the RENAME operation is ignored for each sharded ghost table to ensure the correctness of the replication.
- Currently, when DM applies a DDL operation to the downstream TiDB, the DMLs in this task are blocked until the DDL operation finishes. This limitation will be removed later.

> **Note:**
>
> If you disable `online-ddl`, note the following effects:
>
> - The downstream TiDB replicates the behavior of online DDL tools such as gh-ost and pt-osc.
> - You need to manually add the temporary tables and ghost tables generated by the online DDL tools to the allow list in the task configuration.
> - You cannot merge and migrate data from sharded tables.
## Configuration

In the task configuration file, `online-ddl` is at the same level as `name`. For example:

```yml
# ----------- Global configuration -----------
## ********* Basic configuration *********
name: test # The name of the task. Should be globally unique.
task-mode: all # The task mode. Can be set to `full`/`incremental`/`all`.
shard-mode: "pessimistic" # The shard merge mode. Optional modes are ""/"pessimistic"/"optimistic". The "" mode is used by default which means sharding DDL merge is disabled. If the task is a shard merge task, set it to the "pessimistic" mode. After understanding the principles and restrictions of the "optimistic" mode, you can set it to the "optimistic" mode.
meta-schema: "dm_meta" # The downstream database that stores the `meta` information.
online-ddl: true # Supports automatic processing of "gh-ost" and "pt" for the upstream database.
online-ddl-scheme: "gh-ost" # `online-ddl-scheme` will be deprecated in the future, so it is recommended to use `online-ddl`.
target-database: # Configuration of the downstream database instance.
  host: "192.168.0.1"
  port: 4000
  user: "root"
  password: "" # It is recommended to use password encrypted with dmctl if the password is not empty.
```
For the advanced configuration and the description of each configuration parameter, refer to [DM advanced task configuration file template](/dm/task-configuration-file-full.md#task-configuration-file-template-advanced).
When you merge and migrate data from sharded tables, you need to coordinate the DDL of each sharded table, and the DML before and after the DDL. DM supports two different modes: pessimistic mode and optimistic mode. For the differences and scenarios between the two modes, refer to [Merge and Migrate Data from Sharded Tables](/dm/feature-shard-merge.md).
When using DM to migrate data from MySQL to TiDB, you can enable online-ddl so that DM works together with gh-ost or pt-osc. For details about how to enable online-ddl and the workflow after enabling this option, see [Continuous Replication with gh-ost or pt-osc](/migrate-with-pt-ghost.md). This document focuses on the details of how DM works with online DDL tools.

## Working details for DM with online DDL tools
