- Supports listening to MySQL, MongoDB, and Aliyun DRDS (https://www.aliyun.com/product/drds)
- If Syncer fails to connect to an input data source, it aborts
- MySQL master source filter:
  - Schema filter (named `repos`), supports regex
    - If a table matches multiple schema and table filters (because of regex), an error message is logged and Syncer uses any one of the matches to filter columns
  - Table name filter
  - Column name filter
    - If a change event passes the column filter and only the primary key is left:
      - If the change event type is `UPDATE`, the event is discarded, because updating the id is not supported yet
      - Any other change event type is kept
  - In an `UPDATE`, all columns of interest are received even if they did not change (different from `MongoDB`)
  - Automatic primary key detection; the key is set into `SyncData#id`
  - Supports reading from a binlog file to recover data in case of data loss (`input[x].file`)
  - Supports specifying the binlog file/position to start reading from (`input[x].connection.syncMeta[]`)
- MongoDB master source filter:
  - Version: 3.x, 4.0
    - Only 4.0 supports detecting and syncing removed fields (because of ES/MySQL limitations, a removal always means setting the field to null in the output target, which may not be what you want)
  - Database filter (named `repos`), supports regex
    - If a change event matches multiple schema and table filters, the first match (in config file order) is used to filter/output, i.e. the specific `repo` config overrides the regex `repo` config
  - Collection name filter
  - Column name filter
    - If a change event passes the column filter and only the primary key is left:
      - If the change event type is `UPDATE`, the event is discarded, because updating the id is not supported yet
      - Any other change event type is kept
  - In an `UPDATE`, only changed columns are received (different from `MySQL`)
  - Automatic `_id` detection; the value is set into `SyncData#id`
  - If you configure user/password for Mongo auth, that user needs the `[listDatabases, find]` permissions
  - Only first-level fields can be listened to (MongoDB stores JSON, which may have multiple levels)
- DRDS:
  - Same config as MySQL, but you need to connect directly to the RDS MySQL instances, because DRDS does not support binlog dump
  - Remember to fetch the partition key in `fields`
- Remembers where it left off by writing the file/position of the binlog/oplog, and resumes from there to avoid data loss
  - More than once (at-least-once): only at-least-once semantics are guaranteed for now, so make sure your output channel (the `consumer` of Syncer output) is idempotent and your destination can handle repeated events without duplicates. Counterexample: a table without a primary key cannot handle them and will end up with duplicate data sooner or later.
- Multiple consumers can share a common connection to the same data source, i.e. MySQL/MongoDB, to reduce the load on the remote master
- Automatically skips already-synced items for consumers according to registration info
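The column filter rule listed for both MySQL and MongoDB (discard an `UPDATE` when only the primary key survives the filter, keep every other event type) can be sketched as follows. This is a minimal illustration, not Syncer's code: `ColumnFilterSketch`, `EventType`, and `filter` are hypothetical names, and the sketch assumes the primary key column is always kept.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of the column filter rule; none of these names come from Syncer.
class ColumnFilterSketch {
    enum EventType { WRITE, UPDATE, DELETE }

    /**
     * Keeps the primary key plus the columns of interest.
     * Returns Optional.empty() when the event should be discarded.
     */
    static Optional<Map<String, Object>> filter(EventType type,
                                                Map<String, Object> row,
                                                Set<String> interestedColumns,
                                                String primaryKey) {
        Map<String, Object> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Object> column : row.entrySet()) {
            String name = column.getKey();
            // Assumption: the primary key always passes the filter.
            if (name.equals(primaryKey) || interestedColumns.contains(name)) {
                kept.put(name, column.getValue());
            }
        }
        boolean onlyPrimaryKeyLeft = kept.size() == 1 && kept.containsKey(primaryKey);
        if (type == EventType.UPDATE && onlyPrimaryKeyLeft) {
            // Only the id would be updated, which is not supported, so drop the event.
            return Optional.empty();
        }
        return Optional.of(kept);
    }
}
```

With a row `{id, name}` and `title` as the only column of interest, an `UPDATE` yields an empty result, while a `DELETE` on the same row is still passed on with just the id.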
After data items come out of the `Input` module, they are converted to `SyncData`(s), the abstraction of a single data change. In other words, a single binlog item may contain multiple row changes and be converted to multiple `SyncData`s.

Manipulate `SyncData` via:

- `sourcePath`: write a Java class to handle `SyncData` (for more details, see the filter part of Consumer Pipeline Config)
- or skip it if no action is needed
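To make the "one binlog item, many `SyncData`s" point concrete, here is a small sketch of the fan-out. The `Change` class is a hypothetical stand-in for `SyncData`, and `toChanges` is an invented helper; the real conversion inside Syncer may look quite different.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one multi-row event becomes one change object per row.
class SyncDataFanOutSketch {
    /** Stand-in for SyncData: a single row change with its detected id. */
    static final class Change {
        final String table;
        final Object id;
        final Map<String, Object> fields;

        Change(String table, Object id, Map<String, Object> fields) {
            this.table = table;
            this.id = id;
            this.fields = fields;
        }
    }

    /** Converts every row carried by one event into its own Change. */
    static List<Change> toChanges(String table, String primaryKey,
                                  List<Map<String, Object>> rows) {
        List<Change> changes = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            // Copy the row so later edits to one Change do not leak into the source event.
            changes.add(new Change(table, row.get(primaryKey), new LinkedHashMap<>(row)));
        }
        return changes;
    }
}
```

A filter class would then receive these objects one at a time, so it never needs to know how many rows the original event carried.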
- If an output channel meets too many failures/errors (exceeding `countLimit`), it aborts and changes its health to `red`
- If Syncer fails to connect to an output channel, it retries every 2**n seconds
- Elasticsearch
  - Version: 5.x
  - Bulk operation
  - Update/delete documents by `UpdateByQuery` or `DeleteByQuery`
  - Join/merge documents from different sources when pushing to ES [1]
    - ExtraQuery: run an extra query to fetch additional needed info
    - One-to-many relationships (parent-child relationship in ES) for documents in different indices
    - Self-referential relationship handling
  - `upsert` support, which fixes `DocumentMissingException`; it can be used in the following two scenarios
    - Initial data load, by creating the index manually and updating synced fields to ES (only supports `MySQL` input)
    - Fixing unexpected config/sync errors
  - No other code is needed to prepare data for search
- MySQL
  - Version: 5.5, 5.6, 5.7, 8.0
  - Bulk operation
  - Simple nested SQL: `insert into select`
  - Ignores `DuplicateKeyException`; it is not counted as a failure
  - Low latency
- Kafka
  - Version: 0.10.0 or later
  - Bulk operation
  - Uses the `id` of the data source as the `key` of the record, so that records with the same key keep their order
  - Uses `SyncResult` as the message `data`
  - JSON serializer/deserializer
  - Notice: the Kafka message consumer has to handle change events idempotently
  - Notice: records may arrive out of order if an error happens
  - Easy to re-consume and rebuild without affecting the business DB
- HBase

[1]: Be careful with this feature; it may affect your performance
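The two output failure rules above (retry a failed connection every 2**n seconds, and turn health to `red` once failures exceed `countLimit`) could be modelled like this. It is only a sketch under assumptions: `n` is taken to be a retry counter starting at 0, the upper bound on the delay is added here and is not mentioned in the source, and all class and method names are hypothetical.

```java
// Hypothetical sketch of the retry delay and failure limit; not Syncer's implementation.
class OutputRetrySketch {
    /**
     * Delay before retry number n (assumed to start at 0): 2**n seconds.
     * The cap is an assumption of this sketch to keep the delay bounded.
     */
    static long retryDelaySeconds(int attempt, long maxSeconds) {
        if (attempt < 0) {
            throw new IllegalArgumentException("attempt must be >= 0");
        }
        if (attempt >= 62) {
            return maxSeconds; // avoid overflowing the shift below
        }
        return Math.min(1L << attempt, maxSeconds);
    }

    /** Counts failures and reports when they exceed countLimit. */
    static final class FailureCounter {
        private final int countLimit;
        private int failures;

        FailureCounter(int countLimit) {
            this.countLimit = countLimit;
        }

        /** Returns true once the channel should abort and report red health. */
        boolean recordFailure() {
            failures++;
            return failures > countLimit;
        }
    }
}
```

With a cap of 60 seconds the first retries wait 1, 2, 4, and 8 seconds, and every retry from the seventh on waits 60 seconds.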
- Http Endpoints
  - Port decision:
    - If no port is configured, `Syncer` tries ports in `[40000, 40010)`
    - If the port is configured via the command line, the env var `port`, or `port` in `config.yml`, Syncer uses that port
    - If the port is configured in multiple locations (command line, env var, and config file), the precedence is
      - command line option
      - env var
      - file config
  - `http://ip:port/health`: reports `Syncer` status dynamically
- JMX Endpoints
  - Use `jconsole` to connect to `Syncer`; you can change the logging level dynamically (or set the log level with the `--debug` option at startup)
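The port decision described for the HTTP endpoints amounts to a precedence chain followed by a range scan. The sketch below assumes that "tries ports" means taking the first port in `[40000, 40010)` that can be bound; `PortDecisionSketch`, `choosePort`, and `canBind` are hypothetical names, not part of Syncer.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.function.IntPredicate;

// Hypothetical sketch of the port precedence: command line > env var > config file > scan.
class PortDecisionSketch {
    static final int RANGE_START = 40000; // inclusive
    static final int RANGE_END = 40010;   // exclusive

    /** Pass null for a source that did not configure a port. */
    static int choosePort(Integer commandLine, Integer envVar, Integer fileConfig,
                          IntPredicate isFree) {
        if (commandLine != null) return commandLine;
        if (envVar != null) return envVar;
        if (fileConfig != null) return fileConfig;
        // Assumption: pick the first free port in the default range.
        for (int port = RANGE_START; port < RANGE_END; port++) {
            if (isFree.test(port)) return port;
        }
        throw new IllegalStateException("no free port in [40000, 40010)");
    }

    /** One possible isFree check: try to bind the port and release it again. */
    static boolean canBind(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

A caller would use it as `PortDecisionSketch.choosePort(cli, env, file, PortDecisionSketch::canBind)`; passing the availability check as a parameter keeps the precedence logic testable without opening sockets.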
- MySQL:
  - Don't change the numeric suffix naming of binlog files, or the binlog voting will fail
  - Supported versions: depends on the binlog connector lib
  - Composite primary keys are not supported
  - Updating the primary key is not supported
  - If you use extraQuery, only update/delete by querying an exact value is supported, i.e. querying an analyzed field (`text` query on update) is not supported
  - Data of numeric types (tinyint, etc.) is always returned signed, regardless of whether the column definition includes the "unsigned" keyword. You may need to convert to unsigned if necessary.
    - If your output is MySQL, Syncer handles this for you with the new binlog connector

        Byte.toUnsignedInt((byte)(int) fields['xx']) // or SyncUtil.unsignedByte(sync, "xx");

  - Data of `*text`/`*blob` types is always returned as a byte array (for `var*` this will be true in a future version). You may need to convert to string if necessary.
    - If your output is MySQL, Syncer handles this for you.

        new String(fields['xx']) // or SyncUtil.toStr(sync, "xx");
- Mongo:
  - Fields are not deleted from ES when syncing to ES
  - Driver client compatibility
  - For version 4.0 and later (uses change streams):
    - Storage engine: WiredTiger
    - Replica set protocol version: replica sets and sharded clusters must use replica set protocol version 1 (pv1)
    - Read concern "majority" enabled
- ES
  - Don't update/delete through `syncer` and another way (REST API or Java API) at the same time; it may cause a version conflict and fail the change
  - Update/delete-by-query is executed immediately, i.e. it is not buffered or batched
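The two MySQL conversion notes in the limitations above (signed `tinyint` values and `*text`/`*blob` byte arrays) can be tried out in plain Java. `Byte.toUnsignedInt` is the standard JDK method used in the original snippet; the wrapper class `MysqlValueSketch` and its method names are hypothetical, and decoding as UTF-8 is an assumption of this sketch (the original `new String(...)` call uses the platform default charset).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helpers mirroring the two conversions from the limitations list.
class MysqlValueSketch {
    /** An unsigned tinyint such as 200 arrives as the signed value -56; this restores 200. */
    static int unsignedTinyint(int signedValue) {
        return Byte.toUnsignedInt((byte) signedValue);
    }

    /** Turns a *text/*blob byte array into a String, assuming UTF-8 content. */
    static String textToString(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }
}
```

Values that already fit the signed range, such as 5, pass through `unsignedTinyint` unchanged, so applying it to every `tinyint unsigned` column is safe.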