[mdcr] connect second cluster to the existing one (with data) #1205

Open · posledov opened this issue Jan 10, 2022 · 1 comment

posledov commented Jan 10, 2022

Hello.

I need to connect an empty cluster in another DC to an existing cluster that already has data.
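
For context, the sequence I understand from the LeoFS documentation is roughly the following (the host names below are placeholders, and cluster-status / recover-cluster are the sub-command names as I read them in the docs, so they may not match 1.4.3 exactly): join-cluster links the two clusters, cluster-status should then list the remote cluster, and recover-cluster should push the objects that already exist in DC1 towards the joined cluster (assuming it takes the remote cluster id, leofs_2 here).

# /usr/local/bin/leofs-adm join-cluster <dc2-manager-master>:13075 <dc2-manager-slave>:13076
# /usr/local/bin/leofs-adm cluster-status
# /usr/local/bin/leofs-adm recover-cluster leofs_2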

DC1 cluster info

# /usr/local/bin/leofs-adm status
 [System Confiuration]
-----------------------------------+----------
 Item                              | Value
-----------------------------------+----------
 Basic/Consistency level
-----------------------------------+----------
                    system version | 1.4.3
                        cluster Id | leofs_1
                             DC Id | dc_1
                    Total replicas | 1
          number of successes of R | 1
          number of successes of W | 1
          number of successes of D | 1
 number of rack-awareness replicas | 0
                         ring size | 2^128
-----------------------------------+----------
 Multi DC replication settings
-----------------------------------+----------
 [mdcr] max number of joinable DCs | 2
 [mdcr] total replicas per a DC    | 1
 [mdcr] number of successes of R   | 1
 [mdcr] number of successes of W   | 1
 [mdcr] number of successes of D   | 1
-----------------------------------+----------
 Manager RING hash
-----------------------------------+----------
                 current ring-hash | 80897912
                previous ring-hash | 80897912
-----------------------------------+----------

 [State of Node(s)]
-------+------------------------------+--------------+---------+----------------+----------------+----------------------------
 type  |             node             |    state     | rack id |  current ring  |   prev ring    |          updated at
-------+------------------------------+--------------+---------+----------------+----------------+----------------------------
  S    | [email protected]      | running      |         | 80897912       | 80897912       | 2022-01-10 14:39:27 +0200
  G    | [email protected]      | running      |         | 80897912       | 80897912       | 2022-01-10 11:34:47 +0200
-------+------------------------------+--------------+---------+----------------+----------------+----------------------------


# /usr/local/bin/leofs-adm du [email protected]
 active number of objects: 9786
  total number of objects: 9802
   active size of objects: 1222737081
    total size of objects: 1262702978
     ratio of active size: 96.83%
    last compaction start: 2022-01-10 16:07:23 +0200
      last compaction end: 2022-01-10 16:07:29 +0200

DC2 cluster info

# /usr/local/bin/leofs-adm status
 [System Confiuration]
-----------------------------------+----------
 Item                              | Value
-----------------------------------+----------
 Basic/Consistency level
-----------------------------------+----------
                    system version | 1.4.3
                        cluster Id | leofs_2
                             DC Id | dc_2
                    Total replicas | 2
          number of successes of R | 1
          number of successes of W | 1
          number of successes of D | 1
 number of rack-awareness replicas | 2
                         ring size | 2^128
-----------------------------------+----------
 Multi DC replication settings
-----------------------------------+----------
 [mdcr] max number of joinable DCs | 2
 [mdcr] total replicas per a DC    | 1
 [mdcr] number of successes of R   | 1
 [mdcr] number of successes of W   | 1
 [mdcr] number of successes of D   | 1
-----------------------------------+----------
 Manager RING hash
-----------------------------------+----------
                 current ring-hash | 84eb107d
                previous ring-hash | 84eb107d
-----------------------------------+----------

 [State of Node(s)]
-------+------------------------------+--------------+-----------+----------------+----------------+----------------------------
 type  |             node             |    state     |  rack id  |  current ring  |   prev ring    |          updated at
-------+------------------------------+--------------+-----------+----------------+----------------+----------------------------
  S    | [email protected]      | running      | R7        | 84eb107d       | 84eb107d       | 2022-01-10 15:53:13 +0200
  S    | [email protected]      | running      | R8        | 84eb107d       | 84eb107d       | 2022-01-10 15:54:15 +0200
  G    | [email protected]      | running      |           | 84eb107d       | 84eb107d       | 2022-01-10 12:18:13 +0200
  G    | [email protected]      | running      |           | 84eb107d       | 84eb107d       | 2022-01-10 12:18:20 +0200
-------+------------------------------+--------------+-----------+----------------+----------------+----------------------------

join-cluster

# /usr/local/bin/leofs-adm join-cluster [email protected]:13075 [email protected]:13076
OK

/usr/local/leofs/1.4.3/leo_storage/log/app/crash.log

After the join-cluster command, errors like the following appear in crash.log on [email protected]:

{module,"leo_backend_db_eleveldb"},{function,"prefix_search/3"},{line,227},{body,{timeout,{gen_server,call,[leo_object_storage_read_0_1,{head,{29834374738833832619322004778813394310,<<"b2b-cache/cities/04/noparams.xml">>},7007419},30000]}}}
2022-01-10 16:37:12 =ERROR REPORT====
{module,"leo_backend_db_eleveldb"},{function,"prefix_search/3"},{line,227},{body,{timeout,{gen_server,call,[leo_object_storage_read_1_2,{head,{49036670905450747481373050418517138571,<<"b2b-cache/cities/05/noparams.xml">>},7037644},30000]}}}
2022-01-10 16:37:12 =ERROR REPORT====
{module,"leo_backend_db_eleveldb"},{function,"prefix_search/3"},{line,227},{body,{timeout,{gen_server,call,[leo_object_storage_read_0_1,{head,{29834374738833832619322004778813394310,<<"b2b-cache/cities/04/noparams.xml">>},7037659},30000]}}}
2022-01-10 16:37:42 =ERROR REPORT====
{module,"leo_backend_db_eleveldb"},{function,"prefix_search/3"},{line,227},{body,{timeout,{gen_server,call,[leo_object_storage_read_1_2,{head,{49036670905450747481373050418517138571,<<"b2b-cache/cities/05/noparams.xml">>},7067928},30000]}}}
2022-01-10 16:37:42 =ERROR REPORT====
{module,"leo_backend_db_eleveldb"},{function,"prefix_search/3"},{line,227},{body,{timeout,{gen_server,call,[leo_object_storage_read_2_1,{head,{200780644758158633844541877214146315548,<<"b2b-cache/cities/07/noparams.json">>},7067944},30000]}}}
2022-01-10 16:37:42 =ERROR REPORT====
{module,"leo_backend_db_eleveldb"},{function,"prefix_search/3"},{line,227},{body,{timeout,{gen_server,call,[leo_object_storage_read_0_1,{head,{29834374738833832619322004778813394310,<<"b2b-cache/cities/04/noparams.xml">>},7067952},30000]}}}

and the DC1 cluster becomes very slow:

 # s3cmd --config=/opt/s3cmd/b2b.cfg ls s3://b2b-cache/
WARNING: Retrying failed request: /?delimiter=%2F (500 (InternalError): We encountered an internal error. Please try again.)
WARNING: Waiting 3 sec...
WARNING: Retrying failed request: /?delimiter=%2F (500 (InternalError): We encountered an internal error. Please try again.)
WARNING: Waiting 6 sec...
…
…
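
Each of those crash.log entries is a 30-second (30000 ms) gen_server call timeout against one of the leo_object_storage_read_* workers during prefix_search, which, as far as I understand, is what backs bucket/prefix listing, so the slow listing above looks consistent with them. A plain grep gives a rough count of how often it happens (nothing LeoFS-specific here):

# grep -c 'prefix_search/3' /usr/local/leofs/1.4.3/leo_storage/log/app/crash.log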

mq-stats

# /usr/local/bin/leofs-adm mq-stats [email protected]
              id                |       state       | number of msgs | batch of msgs  |    interval    |                                 description
--------------------------------+-------------------+----------------|----------------|----------------|-------------------------------------------------------------------------
 leo_async_deletion_queue       |      idling       | 0              | 1600           | 500            | requests of removing objects asynchronously
 leo_comp_meta_with_dc_queue    |      idling       | 0              | 1600           | 500            | requests of comparing metadata w/remote-node
 leo_delete_dir_queue_1         |      idling       | 0              | 1600           | 500            | requests of removing buckets #1
 leo_delete_dir_queue_2         |      idling       | 0              | 1600           | 500            | requests of removing buckets #2
 leo_delete_dir_queue_3         |      idling       | 0              | 1600           | 500            | requests of removing buckets #3
 leo_delete_dir_queue_4         |      idling       | 0              | 1600           | 500            | requests of removing buckets #4
 leo_delete_dir_queue_5         |      idling       | 0              | 1600           | 500            | requests of removing buckets #5
 leo_delete_dir_queue_6         |      idling       | 0              | 1600           | 500            | requests of removing buckets #6
 leo_delete_dir_queue_7         |      idling       | 0              | 1600           | 500            | requests of removing buckets #7
 leo_delete_dir_queue_8         |      idling       | 0              | 1600           | 500            | requests of removing buckets #8
 leo_per_object_queue           |      idling       | 0              | 1600           | 500            | requests of fixing inconsistency of objects
 leo_rebalance_queue            |      idling       | 0              | 1600           | 500            | requests of relocating objects
 leo_recovery_node_queue        |      idling       | 0              | 1600           | 500            | requests of recovering objects of the node (incl. recover-consistency)
 leo_req_delete_dir_queue       |      idling       | 0              | 1600           | 500            | requests of removing directories
 leo_sync_by_vnode_id_queue     |      idling       | 0              | 1600           | 500            | requests of synchronizing objects by vnode-id
 leo_sync_obj_with_dc_queue     |      idling       | 0              | 1600           | 500            | requests of synchronizing objects w/remote-node

I can see the same users and buckets on DC2, but the endpoints and the bucket data from DC1 do not replicate to DC2, and there are no warnings or errors in the DC2 log files.
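
To see whether any MDC replication work ever gets queued (leo_sync_obj_with_dc_queue and leo_comp_meta_with_dc_queue above stay at 0 messages), the same command can simply be kept under watch:

# watch -n 10 /usr/local/bin/leofs-adm mq-stats [email protected]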

After systemctl restart leofs-storage.service on [email protected], the DC1 cluster returns to normal speed:

# systemctl restart leofs-storage.service

# time s3cmd --config=/opt/s3cmd/b2b.cfg ls s3://b2b-cache/
                          DIR  s3://b2b-cache/01/
                          DIR  s3://b2b-cache/cities/
                          DIR  s3://b2b-cache/warehouses/

real	0m0.256s
user	0m0.161s
sys	0m0.050s

There is no firewalling of any kind between DC1 and DC2.
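
For completeness, a simple way to double-check that the DC2 manager ports used in join-cluster are reachable from DC1 (host names are placeholders, not the real ones):

# nc -vz <dc2-manager-master-host> 13075
# nc -vz <dc2-manager-slave-host> 13076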

Please help!

posledov (Author) commented

@yosukehara Can you help, please?
