This repository has been archived by the owner on Feb 6, 2024. It is now read-only.

Shard version mismatch #219

Open
ZuLiangWang opened this issue Jul 19, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@ZuLiangWang
Contributor

Describe this problem
After the cluster runs for a long time, the ShardVersion of some CeresDB nodes is inconsistent with the ShardVersion of CeresMeta.

Steps to reproduce
No reliable reproduction is known. The mismatch appeared only after the cluster had been running for a long time.

Expected behavior
The shard versions held by CeresDB and CeresMeta should be consistent.

Additional Information

2023-07-19T14:46:32.114+0800    error   procedure/manager_impl.go:161   procedure start failed  {"clusterName": "defaultCluster", "error": "send eventPrepare: dispatch create table on shard: create table on shard: (#500)event dispatch failed, cause:create table on shard, addr:11.39.8.171:8831, request:{{{46 1 1111} 1110} {109314 MMM_2198193666_INFLUENCE_PRE_SANDBOX_OUTPUT_TABLE 0 public {<nil>}} [0 10 10 10 4 116 115 105 100 16 5 32 1 10 12 10 6 112 101 114 105 111 100 16 1 32 2 10 23 10 13 103 114 111 117 112 98 121 73 110 100 101 120 48 16 4 24 1 32 3 40 1 10 17 10 9 108 111 103 83 97 109 112 108 101 16 4 24 1 32 4 10 17 10 7 95 114 101 115 117 108 116 16 4 24 1 32 5 40 1 10 16 10 6 115 101 114 118 101 114 16 4 24 1 32 6 40 1 10 13 10 3 105 100 99 16 4 24 1 32 7 40 1 10 13 10 3 108 100 99 16 4 24 1 32 8 40 1 10 21 10 11 97 112 112 108 105 99 97 116 105 111 110 16 4 24 1 32 9 40 1 16 1 24 2 34 2 1 2] Analytic false map[enable_ttl:true ttl:3d update_mode:APPEND write_buffer_size:33554432]}, err:fail to create table on shard in cluster, req:CreateTableOnShardRequest { update_shard_info: Some(UpdateShardInfo { curr_shard_info: Some(ShardInfo { id: 46, role: Leader, version: 1111 }), prev_version: 1110 }), table_info: Some(TableInfo { id: 109314, name: \"MMM_2198193666_INFLUENCE_PRE_SANDBOX_OUTPUT_TABLE\", schema_id: 0, schema_name: \"public\", partition_info: None }), encoded_schema: [0, 10, 10, 10, 4, 116, 115, 105, 100, 16, 5, 32, 1, 10, 12, 10, 6, 112, 101, 114, 105, 111, 100, 16, 1, 32, 2, 10, 23, 10, 13, 103, 114, 111, 117, 112, 98, 121, 73, 110, 100, 101, 120, 48, 16, 4, 24, 1, 32, 3, 40, 1, 10, 17, 10, 9, 108, 111, 103, 83, 97, 109, 112, 108, 101, 16, 4, 24, 1, 32, 4, 10, 17, 10, 7, 95, 114, 101, 115, 117, 108, 116, 16, 4, 24, 1, 32, 5, 40, 1, 10, 16, 10, 6, 115, 101, 114, 118, 101, 114, 16, 4, 24, 1, 32, 6, 40, 1, 10, 13, 10, 3, 105, 100, 99, 16, 4, 24, 1, 32, 7, 40, 1, 10, 13, 10, 3, 108, 100, 99, 16, 4, 24, 1, 32, 8, 40, 1, 10, 21, 10, 11, 97, 112, 112, 
108, 105, 99, 97, 116, 105, 111, 110, 16, 4, 24, 1, 32, 9, 40, 1, 16, 1, 24, 2, 34, 2, 1, 2], engine: \"Analytic\", create_if_not_exist: false, options: {\"enable_ttl\": \"true\", \"write_buffer_size\": \"33554432\", \"update_mode\": \"APPEND\", \"ttl\": \"3d\"} }. Caused by: Shard version mismatch, shard_info:ShardInfo { id: 46, role: Leader, version: 1111 }, expect version:1110.\ngithub.com/CeresDB/ceresmeta/pkg/coderr.(*codeError).WithCausef\n\t/Users/zulliangwang/code/ceres/ceresmeta/pkg/coderr/error.go:73\ngithub.com/CeresDB/ceresmeta/server/coordinator/eventdispatch.(*DispatchImpl).CreateTableOnShard\n\t/Users/zulliangwang/code/ceres/ceresmeta/server/coordinator/eventdispatch/dispatch_impl.go:71\ngithub.com/CeresDB/ceresmeta/server/coordinator/procedure/ddl.CreateTableOnShard\n\t/Users/zulliangwang/code/ceres/ceresmeta/server/coordinator/procedure/ddl/common_util.go:60\ngithub.com/CeresDB/ceresmeta/server/coordinator/procedure/ddl/createtable.prepareCallback\n\t/Users/zulliangwang/code/ceres/ceresmeta/server/coordinator/procedure/ddl/createtable/create_table.go:90\ngithub.com/looplab/fsm.(*FSM).afterEventCallbacks\n\t/Users/zulliangwang/go/pkg/mod/github.com/looplab/[email protected]/fsm.go:435\ngithub.com/looplab/fsm.(*FSM).Event.func1\n\t/Users/zulliangwang/go/pkg/mod/github.com/looplab/[email protected]/fsm.go:330\ngithub.com/looplab/fsm.transitionerStruct.transition\n\t/Users/zulliangwang/go/pkg/mod/github.com/looplab/[email protected]/fsm.go:375\ngithub.com/looplab/fsm.(*FSM).doTransition\n\t/Users/zulliangwang/go/pkg/mod/github.com/looplab/[email protected]/fsm.go:360\ngithub.com/looplab/fsm.(*FSM).Event\n\t/Users/zulliangwang/go/pkg/mod/github.com/looplab/[email 
protected]/fsm.go:343\ngithub.com/CeresDB/ceresmeta/server/coordinator/procedure/ddl/createtable.(*Procedure).Start\n\t/Users/zulliangwang/code/ceres/ceresmeta/server/coordinator/procedure/ddl/createtable/create_table.go:213\ngithub.com/CeresDB/ceresmeta/server/coordinator/procedure.(*ManagerImpl).startProcedureWorker.func1\n\t/Users/zulliangwang/code/ceres/ceresmeta/server/coordinator/procedure/manager_impl.go:159\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594\ngithub.com/CeresDB/ceresmeta/pkg/coderr.(*codeError).WithCausef\n\t/Users/zulliangwang/code/ceres/ceresmeta/pkg/coderr/error.go:73\ngithub.com/CeresDB/ceresmeta/server/coordinator/eventdispatch.(*DispatchImpl).CreateTableOnShard\n\t/Users/zulliangwang/code/ceres/ceresmeta/server/coordinator/eventdispatch/dispatch_impl.go:71\ngithub.com/CeresDB/ceresmeta/server/coordinator/procedure/ddl.CreateTableOnShard\n\t/Users/zulliangwang/code/ceres/ceresmeta/server/coordinator/procedure/ddl/common_util.go:60\ngithub.com/CeresDB/ceresmeta/server/coordinator/procedure/ddl/createtable.prepareCallback\n\t/Users/zulliangwang/code/ceres/ceresmeta/server/coordinator/procedure/ddl/createtable/create_table.go:90\ngithub.com/looplab/fsm.(*FSM).afterEventCallbacks\n\t/Users/zulliangwang/go/pkg/mod/github.com/looplab/[email protected]/fsm.go:435\ngithub.com/looplab/fsm.(*FSM).Event.func1\n\t/Users/zulliangwang/go/pkg/mod/github.com/looplab/[email protected]/fsm.go:330\ngithub.com/looplab/fsm.transitionerStruct.transition\n\t/Users/zulliangwang/go/pkg/mod/github.com/looplab/[email protected]/fsm.go:375\ngithub.com/looplab/fsm.(*FSM).doTransition\n\t/Users/zulliangwang/go/pkg/mod/github.com/looplab/[email protected]/fsm.go:360\ngithub.com/looplab/fsm.(*FSM).Event\n\t/Users/zulliangwang/go/pkg/mod/github.com/looplab/[email 
protected]/fsm.go:343\ngithub.com/CeresDB/ceresmeta/server/coordinator/procedure/ddl/createtable.(*Procedure).Start\n\t/Users/zulliangwang/code/ceres/ceresmeta/server/coordinator/procedure/ddl/createtable/create_table.go:213\ngithub.com/CeresDB/ceresmeta/server/coordinator/procedure.(*ManagerImpl).startProcedureWorker.func1\n\t/Users/zulliangwang/code/ceres/ceresmeta/server/coordinator/procedure/manager_impl.go:159\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594", "errorVerbose": "(#500)event dispatch failed, cause:create table on shard, addr:11.39.8.171:8831, request:{{{46 1 1111} 1110} {109314 MMM_2198193666_INFLUENCE_PRE_SANDBOX_OUTPUT_TABLE 0 public {<nil>}} [0 10 10 10 4 116 115 105 100 16 5 32 1 10 12 10 6 112 101 114 105 111 100 16 1 32 2 10 23 10 13 103 114 111 117 112 98 121 73 110 100 101 120 48 16 4 24 1 32 3 40 1 10 17 10 9 108 111 103 83 97 109 112 108 101 16 4 24 1 32 4 10 17 10 7 95 114 101 115 117 108 116 16 4 24 1 32 5 40 1 10 16 10 6 115 101 114 118 101 114 16 4 24 1 32 6 40 1 10 13 10 3 105 100 99 16 4 24 1 32 7 40 1 10 13 10 3 108 100 99 16 4 24 1 32 8 40 1 10 21 10 11 97 112 112 108 105 99 97 116 105 111 110 16 4 24 1 32 9 40 1 16 1 24 2 34 2 1 2] Analytic false map[enable_ttl:true ttl:3d update_mode:APPEND write_buffer_size:33554432]}, err:fail to create table on shard in cluster, req:CreateTableOnShardRequest { update_shard_info: Some(UpdateShardInfo { curr_shard_info: Some(ShardInfo { id: 46, role: Leader, version: 1111 }), prev_version: 1110 }), table_info: Some(TableInfo { id: 109314, name: \"MMM_2198193666_INFLUENCE_PRE_SANDBOX_OUTPUT_TABLE\", schema_id: 0, schema_name: \"public\", partition_info: None }), encoded_schema: [0, 10, 10, 10, 4, 116, 115, 105, 100, 16, 5, 32, 1, 10, 12, 10, 6, 112, 101, 114, 105, 111, 100, 16, 1, 32, 2, 10, 23, 10, 13, 103, 114, 111, 117, 112, 98, 121, 73, 110, 100, 101, 120, 48, 16, 4, 24, 1, 32, 3, 40, 1, 10, 17, 10, 9, 108, 111, 103, 83, 97, 109, 112, 108, 101, 16, 4, 24, 1, 32, 4, 10, 17, 10, 
7, 95, 114, 101, 115, 117, 108, 116, 16, 4, 24, 1, 32, 5, 40, 1, 10, 16, 10, 6, 115, 101, 114, 118, 101, 114, 16, 4, 24, 1, 32, 6, 40, 1, 10, 13, 10, 3, 105, 100, 99, 16, 4, 24, 1, 32, 7, 40, 1, 10, 13, 10, 3, 108, 100, 99, 16, 4, 24, 1, 32, 8, 40, 1, 10, 21, 10, 11, 97, 112, 112, 108, 105, 99, 97, 116, 105, 111, 110, 16, 4, 24, 1, 32, 9, 40, 1, 16, 1, 24, 2, 34, 2, 1, 2], engine: \"Analytic\", create_if_not_exist: false, options: {\"enable_ttl\": \"true\", \"write_buffer_size\": \"33554432\", \"update_mode\": \"APPEND\", \"ttl\": \"3d\"} }. Caused by: Shard version mismatch, shard_info:ShardInfo { id: 46, role: Leader, version: 1111 }, expect version:1110.\ngithub.com/CeresDB/ceresmeta/pkg/coderr.(*codeError).WithCausef\n\t/Users/zulliangwang/code/ceres/ceresmeta/pkg/coderr/error.go:73\ngithub.com/CeresDB/ceresmeta/server/coordinator/eventdispatch.(*DispatchImpl).CreateTableOnShard\n\t/Users/zulliangwang/code/ceres/ceresmeta/server/coordinator/eventdispatch/dispatch_impl.go:71\ngithub.com/CeresDB/ceresmeta/server/coordinator/procedure/ddl.CreateTableOnShard\n\t/Users/zulliangwang/code/ceres/ceresmeta/server/coordinator/procedure/ddl/common_util.go:60\ngithub.com/CeresDB/ceresmeta/server/coordinator/procedure/ddl/createtable.prepareCallback\n\t/Users/zulliangwang/code/ceres/ceresmeta/server/coordinator/procedure/ddl/createtable/create_table.go:90\ngithub.com/looplab/fsm.(*FSM).afterEventCallbacks\n\t/Users/zulliangwang/go/pkg/mod/github.com/looplab/[email protected]/fsm.go:435\ngithub.com/looplab/fsm.(*FSM).Event.func1\n\t/Users/zulliangwang/go/pkg/mod/github.com/looplab/[email protected]/fsm.go:330\ngithub.com/looplab/fsm.transitionerStruct.transition\n\t/Users/zulliangwang/go/pkg/mod/github.com/looplab/[email protected]/fsm.go:375\ngithub.com/looplab/fsm.(*FSM).doTransition\n\t/Users/zulliangwang/go/pkg/mod/github.com/looplab/[email protected]/fsm.go:360\ngithub.com/looplab/fsm.(*FSM).Event\n\t/Users/zulliangwang/go/pkg/mod/github.com/looplab/[email 
protected]/fsm.go:343\ngithub.com/CeresDB/ceresmeta/server/coordinator/procedure/ddl/createtable.(*Procedure).Start\n\t/Users/zulliangwang/code/ceres/ceresmeta/server/coordinator/procedure/ddl/createtable/create_table.go:213\ngithub.com/CeresDB/ceresmeta/server/coordinator/procedure.(*ManagerImpl).startProcedureWorker.func1\n\t/Users/zulliangwang/code/ceres/ceresmeta/server/coordinator/procedure/manager_impl.go:159\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594\ngithub.com/CeresDB/ceresmeta/pkg/coderr.(*codeError).WithCausef\n\t/Users/zulliangwang/code/ceres/ceresmeta/pkg/coderr/error.go:73\ngithub.com/CeresDB/ceresmeta/server/coordinator/eventdispatch.(*DispatchImpl).CreateTableOnShard\n\t/Users/zulliangwang/code/ceres/ceresmeta/server/coordinator/eventdispatch/dispatch_impl.go:71\ngithub.com/CeresDB/ceresmeta/server/coordinator/procedure/ddl.CreateTableOnShard\n\t/Users/zulliangwang/code/ceres/ceresmeta/server/coordinator/procedure/ddl/common_util.go:60\ngithub.com/CeresDB/ceresmeta/server/coordinator/procedure/ddl/createtable.prepareCallback\n\t/Users/zulliangwang/code/ceres/ceresmeta/server/coordinator/procedure/ddl/createtable/create_table.go:90\ngithub.com/looplab/fsm.(*FSM).afterEventCallbacks\n\t/Users/zulliangwang/go/pkg/mod/github.com/looplab/[email protected]/fsm.go:435\ngithub.com/looplab/fsm.(*FSM).Event.func1\n\t/Users/zulliangwang/go/pkg/mod/github.com/looplab/[email protected]/fsm.go:330\ngithub.com/looplab/fsm.transitionerStruct.transition\n\t/Users/zulliangwang/go/pkg/mod/github.com/looplab/[email protected]/fsm.go:375\ngithub.com/looplab/fsm.(*FSM).doTransition\n\t/Users/zulliangwang/go/pkg/mod/github.com/looplab/[email protected]/fsm.go:360\ngithub.com/looplab/fsm.(*FSM).Event\n\t/Users/zulliangwang/go/pkg/mod/github.com/looplab/[email 
protected]/fsm.go:343\ngithub.com/CeresDB/ceresmeta/server/coordinator/procedure/ddl/createtable.(*Procedure).Start\n\t/Users/zulliangwang/code/ceres/ceresmeta/server/coordinator/procedure/ddl/createtable/create_table.go:213\ngithub.com/CeresDB/ceresmeta/server/coordinator/procedure.(*ManagerImpl).startProcedureWorker.func1\n\t/Users/zulliangwang/code/ceres/ceresmeta/server/coordinator/procedure/manager_impl.go:159\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594\ncreate table on shard\ndispatch create table on shard\nsend eventPrepare"}
@ZuLiangWang added the bug label on Jul 19, 2023
@ShiKaiWi
Member

This problem arises because the shard version is determined by ceresmeta, whereas it should be decided by ceresdb. My proposal is to upgrade the meta_event_service protocol so that ceresdb decides the shard version.

2 participants