[mdcr] connect second cluster to the existing one (with data) #1205

Open · posledov opened this issue Jan 10, 2022 · 1 comment

posledov commented Jan 10, 2022

Hello.

I need to connect an empty cluster in another DC to an existing cluster that already has data.
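
For context, the sequence I understand from the LeoFS documentation is roughly the following (the host names below are placeholders, and cluster-status / recover-cluster are the sub-command names as I read them in the docs, so they may not match 1.4.3 exactly): join-cluster links the two clusters, cluster-status should then list the remote cluster, and recover-cluster should push the objects that already exist in DC1 towards the joined cluster (assuming it takes the remote cluster id, leofs_2 here).

# /usr/local/bin/leofs-adm join-cluster <dc2-manager-master>:13075 <dc2-manager-slave>:13076
# /usr/local/bin/leofs-adm cluster-status
# /usr/local/bin/leofs-adm recover-cluster leofs_2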

DC1 cluster info

# /usr/local/bin/leofs-adm status
 [System Confiuration]
-----------------------------------+----------
 Item                              | Value
-----------------------------------+----------
 Basic/Consistency level
-----------------------------------+----------
                    system version | 1.4.3
                        cluster Id | leofs_1
                             DC Id | dc_1
                    Total replicas | 1
          number of successes of R | 1
          number of successes of W | 1
          number of successes of D | 1
 number of rack-awareness replicas | 0
                         ring size | 2^128
-----------------------------------+----------
 Multi DC replication settings
-----------------------------------+----------
 [mdcr] max number of joinable DCs | 2
 [mdcr] total replicas per a DC    | 1
 [mdcr] number of successes of R   | 1
 [mdcr] number of successes of W   | 1
 [mdcr] number of successes of D   | 1
-----------------------------------+----------
 Manager RING hash
-----------------------------------+----------
                 current ring-hash | 80897912
                previous ring-hash | 80897912
-----------------------------------+----------

 [State of Node(s)]
-------+------------------------------+--------------+---------+----------------+----------------+----------------------------
 type  |             node             |    state     | rack id |  current ring  |   prev ring    |          updated at
-------+------------------------------+--------------+---------+----------------+----------------+----------------------------
  S    | [email protected]      | running      |         | 80897912       | 80897912       | 2022-01-10 14:39:27 +0200
  G    | [email protected]      | running      |         | 80897912       | 80897912       | 2022-01-10 11:34:47 +0200
-------+------------------------------+--------------+---------+----------------+----------------+----------------------------


# /usr/local/bin/leofs-adm du [email protected]
 active number of objects: 9786
  total number of objects: 9802
   active size of objects: 1222737081
    total size of objects: 1262702978
     ratio of active size: 96.83%
    last compaction start: 2022-01-10 16:07:23 +0200
      last compaction end: 2022-01-10 16:07:29 +0200

DC2 cluster info

# /usr/local/bin/leofs-adm status
 [System Confiuration]
-----------------------------------+----------
 Item                              | Value
-----------------------------------+----------
 Basic/Consistency level
-----------------------------------+----------
                    system version | 1.4.3
                        cluster Id | leofs_2
                             DC Id | dc_2
                    Total replicas | 2
          number of successes of R | 1
          number of successes of W | 1
          number of successes of D | 1
 number of rack-awareness replicas | 2
                         ring size | 2^128
-----------------------------------+----------
 Multi DC replication settings
-----------------------------------+----------
 [mdcr] max number of joinable DCs | 2
 [mdcr] total replicas per a DC    | 1
 [mdcr] number of successes of R   | 1
 [mdcr] number of successes of W   | 1
 [mdcr] number of successes of D   | 1
-----------------------------------+----------
 Manager RING hash
-----------------------------------+----------
                 current ring-hash | 84eb107d
                previous ring-hash | 84eb107d
-----------------------------------+----------

 [State of Node(s)]
-------+------------------------------+--------------+-----------+----------------+----------------+----------------------------
 type  |             node             |    state     |  rack id  |  current ring  |   prev ring    |          updated at
-------+------------------------------+--------------+-----------+----------------+----------------+----------------------------
  S    | [email protected]      | running      | R7        | 84eb107d       | 84eb107d       | 2022-01-10 15:53:13 +0200
  S    | [email protected]      | running      | R8        | 84eb107d       | 84eb107d       | 2022-01-10 15:54:15 +0200
  G    | [email protected]      | running      |           | 84eb107d       | 84eb107d       | 2022-01-10 12:18:13 +0200
  G    | [email protected]      | running      |           | 84eb107d       | 84eb107d       | 2022-01-10 12:18:20 +0200
-------+------------------------------+--------------+-----------+----------------+----------------+----------------------------

join-cluster

# /usr/local/bin/leofs-adm join-cluster [email protected]:13075 [email protected]:13076
OK

/usr/local/leofs/1.4.3/leo_storage/log/app/crash.log

After the join-cluster command, errors like the following appear in crash.log on [email protected]:

{module,"leo_backend_db_eleveldb"},{function,"prefix_search/3"},{line,227},{body,{timeout,{gen_server,call,[leo_object_storage_read_0_1,{head,{29834374738833832619322004778813394310,<<"b2b-cache/cities/04/noparams.xml">>},7007419},30000]}}}
2022-01-10 16:37:12 =ERROR REPORT====
{module,"leo_backend_db_eleveldb"},{function,"prefix_search/3"},{line,227},{body,{timeout,{gen_server,call,[leo_object_storage_read_1_2,{head,{49036670905450747481373050418517138571,<<"b2b-cache/cities/05/noparams.xml">>},7037644},30000]}}}
2022-01-10 16:37:12 =ERROR REPORT====
{module,"leo_backend_db_eleveldb"},{function,"prefix_search/3"},{line,227},{body,{timeout,{gen_server,call,[leo_object_storage_read_0_1,{head,{29834374738833832619322004778813394310,<<"b2b-cache/cities/04/noparams.xml">>},7037659},30000]}}}
2022-01-10 16:37:42 =ERROR REPORT====
{module,"leo_backend_db_eleveldb"},{function,"prefix_search/3"},{line,227},{body,{timeout,{gen_server,call,[leo_object_storage_read_1_2,{head,{49036670905450747481373050418517138571,<<"b2b-cache/cities/05/noparams.xml">>},7067928},30000]}}}
2022-01-10 16:37:42 =ERROR REPORT====
{module,"leo_backend_db_eleveldb"},{function,"prefix_search/3"},{line,227},{body,{timeout,{gen_server,call,[leo_object_storage_read_2_1,{head,{200780644758158633844541877214146315548,<<"b2b-cache/cities/07/noparams.json">>},7067944},30000]}}}
2022-01-10 16:37:42 =ERROR REPORT====
{module,"leo_backend_db_eleveldb"},{function,"prefix_search/3"},{line,227},{body,{timeout,{gen_server,call,[leo_object_storage_read_0_1,{head,{29834374738833832619322004778813394310,<<"b2b-cache/cities/04/noparams.xml">>},7067952},30000]}}}

and the DC1 cluster becomes very slow:

 # s3cmd --config=/opt/s3cmd/b2b.cfg ls s3://b2b-cache/
WARNING: Retrying failed request: /?delimiter=%2F (500 (InternalError): We encountered an internal error. Please try again.)
WARNING: Waiting 3 sec...
WARNING: Retrying failed request: /?delimiter=%2F (500 (InternalError): We encountered an internal error. Please try again.)
WARNING: Waiting 6 sec...
…
…
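
Each of those crash.log entries is a 30-second (30000 ms) gen_server call timeout against one of the leo_object_storage_read_* workers during prefix_search, which, as far as I understand, is what backs bucket/prefix listing, so the slow listing above looks consistent with them. A plain grep gives a rough count of how often it happens (nothing LeoFS-specific here):

# grep -c 'prefix_search/3' /usr/local/leofs/1.4.3/leo_storage/log/app/crash.log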

mq-stats

# /usr/local/bin/leofs-adm mq-stats [email protected]
              id                |       state       | number of msgs | batch of msgs  |    interval    |                                 description
--------------------------------+-------------------+----------------|----------------|----------------|-------------------------------------------------------------------------
 leo_async_deletion_queue       |      idling       | 0              | 1600           | 500            | requests of removing objects asynchronously
 leo_comp_meta_with_dc_queue    |      idling       | 0              | 1600           | 500            | requests of comparing metadata w/remote-node
 leo_delete_dir_queue_1         |      idling       | 0              | 1600           | 500            | requests of removing buckets #1
 leo_delete_dir_queue_2         |      idling       | 0              | 1600           | 500            | requests of removing buckets #2
 leo_delete_dir_queue_3         |      idling       | 0              | 1600           | 500            | requests of removing buckets #3
 leo_delete_dir_queue_4         |      idling       | 0              | 1600           | 500            | requests of removing buckets #4
 leo_delete_dir_queue_5         |      idling       | 0              | 1600           | 500            | requests of removing buckets #5
 leo_delete_dir_queue_6         |      idling       | 0              | 1600           | 500            | requests of removing buckets #6
 leo_delete_dir_queue_7         |      idling       | 0              | 1600           | 500            | requests of removing buckets #7
 leo_delete_dir_queue_8         |      idling       | 0              | 1600           | 500            | requests of removing buckets #8
 leo_per_object_queue           |      idling       | 0              | 1600           | 500            | requests of fixing inconsistency of objects
 leo_rebalance_queue            |      idling       | 0              | 1600           | 500            | requests of relocating objects
 leo_recovery_node_queue        |      idling       | 0              | 1600           | 500            | requests of recovering objects of the node (incl. recover-consistency)
 leo_req_delete_dir_queue       |      idling       | 0              | 1600           | 500            | requests of removing directories
 leo_sync_by_vnode_id_queue     |      idling       | 0              | 1600           | 500            | requests of synchronizing objects by vnode-id
 leo_sync_obj_with_dc_queue     |      idling       | 0              | 1600           | 500            | requests of synchronizing objects w/remote-node

I can see the same users and buckets on DC2, but the endpoints and the bucket data from DC1 do not replicate to DC2, and there are no warnings or errors in the DC2 log files.
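
To see whether any MDC replication work ever gets queued (leo_sync_obj_with_dc_queue and leo_comp_meta_with_dc_queue above stay at 0 messages), the same command can simply be kept under watch:

# watch -n 10 /usr/local/bin/leofs-adm mq-stats [email protected]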

After systemctl restart leofs-storage.service on [email protected], the DC1 cluster returns to normal speed:

# systemctl restart leofs-storage.service

# time s3cmd --config=/opt/s3cmd/b2b.cfg ls s3://b2b-cache/
                          DIR  s3://b2b-cache/01/
                          DIR  s3://b2b-cache/cities/
                          DIR  s3://b2b-cache/warehouses/

real	0m0.256s
user	0m0.161s
sys	0m0.050s

There is no firewalling of any kind between DC1 and DC2.
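
For completeness, a simple way to double-check that the DC2 manager ports used in join-cluster are reachable from DC1 (host names are placeholders, not the real ones):

# nc -vz <dc2-manager-master-host> 13075
# nc -vz <dc2-manager-slave-host> 13076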

Please help!

posledov (Author) commented

@yosukehara Can you help, please?
