-
Notifications
You must be signed in to change notification settings - Fork 209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multiple writer for the same sst caused by close shard
#990
Comments
After #998, the updates following the closing shard will be forbidden. However, some ssts may be still being written when close the shard, while these ssts may share the same ids with the new ssts created by the new node, leading to the multiple writers on the same sst. Let's fix this problem in another PR. @baojinri |
## Rationale Part of #990. Some background jobs are still allowed to execute, and it will lead to data corrupted when a table is migrated between different nodes because of multiple writers for the same table. ## Detailed Changes Introduce a flag called `invalid` in the table data to denote whether the serial executor is valid, and this flag is protected with the `TableOpSerialExecutor` in table data, and the `TableOpSerialExecutor` won't be acquired if the flag is set, that is to say, any table operation including updating manifest, altering table and so on, can't be executed after the flag is set because these operations require the `TableOpSerialExecutor`. Finally, the flag will be set when the table is closed.
Part of apache#990. Some background jobs are still allowed to execute, and it will lead to data corrupted when a table is migrated between different nodes because of multiple writers for the same table. Introduce a flag called `invalid` in the table data to denote whether the serial executor is valid, and this flag is protected with the `TableOpSerialExecutor` in table data, and the `TableOpSerialExecutor` won't be acquired if the flag is set, that is to say, any table operation including updating manifest, altering table and so on, can't be executed after the flag is set because these operations require the `TableOpSerialExecutor`. Finally, the flag will be set when the table is closed.
## Rationale part of #990 ## Detailed Changes - Abstract a `IdAllocator` in `common_util` library - Use `IdAllocator` to alloc sst id - Persist `IdAllocator`'s `max id` to manifest ## Test Plan - Add new unit test for `IdAllocator` - Manual compatibility test: make ssts generated with old ceresdb-server, and deploy a new ceresdb-server with this changeset. Write some new data into it to make new ssts generated, and check whether the sst id is correct.
#1009 has fixed the problem. However, #998 actually didn't achieve the goal to prevent updates after table is closed. And #998 has been reverted, I guess I'll submit another change set to make all things work. |
## Rationale Part of apache#990. Some background jobs are still allowed to execute, and it will lead to data corrupted when a table is migrated between different nodes because of multiple writers for the same table. ## Detailed Changes Introduce a flag called `invalid` in the table data to denote whether the serial executor is valid, and this flag is protected with the `TableOpSerialExecutor` in table data, and the `TableOpSerialExecutor` won't be acquired if the flag is set, that is to say, any table operation including updating manifest, altering table and so on, can't be executed after the flag is set because these operations require the `TableOpSerialExecutor`. Finally, the flag will be set when the table is closed.
…pache#1034) This reverts commit 85eb0b7. ## Rationale The changes introduced by apache#998 are not reasonable. Another fix will address apache#990. ## Detailed Changes Revert apache#998
Describe this problem
Shard will be moved from nodes when process panic because if for any reason, all operations related to such a shard should be stopped before moving(especially the
write operations
).However background works(flush, compaction, all of them are
writes
) will not be stoppend rightly before now. That caused a serious bug : multiple writers for one sst.Server version
CeresDB Server
Version: 1.2.2
Git commit: 2e20665
Git branch: main
Opt level: 3
Rustc version: 1.69.0-nightly
Target: aarch64-apple-darwin
Build date: 2023-06-12T13:01:03.592984000Z
Steps to reproduce
Hard to reproduce, if must do this, steps following may can work:
Expected behavior
No response
Additional Information
No response
The text was updated successfully, but these errors were encountered: