HDFS-17496. DataNode supports more fine-grained dataset lock based on blockid. #6764
base: trunk
Conversation
💔 -1 overall
This message was automatically generated.
@BsoBird Sir, thanks for noticing this PR. This PR aims to further reduce contention on the two-level dataset lock. It is internal to the DataNode; we do not have access to HDFS filesystem inode information here. Hope to receive your response~
@Hexiaoqiao @ZanderXu @haiyang1987 @tomscut Sirs, what are your opinions? Hope to receive your responses~ Thanks a lot.
Force-pushed 1892c8d to 89ca2cd
@hfutatzhanghb Thanks for your work and for involving me here. Please check the report from Yetus and fix it first; I will review it later. Thanks again.
Sir, thanks a lot for the reminder. I have fixed the failed UTs; waiting for Yetus to finish.
@Hexiaoqiao Sir, it seems something went wrong when building this PR. Could you please check it when you have free time? Thanks a lot~
🎊 +1 overall
This message was automatically generated.
Thanks @hfutatzhanghb for your work. Leaving some comments inline, PFYI.
...ect/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataSetLockManager.java
 * @param blockId blockId
 * @return The two-level subdir name
 */
public static String idToBlockDirSuffixName(long blockId) {
What's the difference between idToBlockDir and idToBlockDirSuffixName?
The idToBlockDirSuffixName method is used to generate the SubDir lock name, just as the BlockPool id or storage uuid are used as lock names at the other levels.
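To illustrate the distinction, here is a hypothetical side-by-side sketch (class and helper names are illustrative, not the exact patch code): idToBlockDir resolves a block to its on-disk directory, while idToBlockDirSuffixName produces only the "subdirX/subdirY" string used as the DIR lock name.

```java
import java.io.File;

// Illustrative sketch mirroring the two-level subdir scheme discussed in
// this PR (32 x 32 subdirs selected from bits of the block id).
class BlockDirSketch {
  static final String BLOCK_SUBDIR_PREFIX = "subdir";

  // Suffix string only, e.g. "subdir20/subdir22" -- used as the DIR lock name.
  static String idToBlockDirSuffixName(long blockId) {
    int d1 = (int) ((blockId >> 16) & 0x1F);  // first-level subdir index
    int d2 = (int) ((blockId >> 8) & 0x1F);   // second-level subdir index
    return BLOCK_SUBDIR_PREFIX + d1 + "/" + BLOCK_SUBDIR_PREFIX + d2;
  }

  // Actual directory under a finalized root -- used for block placement.
  static File idToBlockDir(File root, long blockId) {
    return new File(root, idToBlockDirSuffixName(blockId));
  }
}
```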
DataStorage.BLOCK_SUBDIR_PREFIX + d2;
}

public static List<String> getAllSubDirNameForDataSetLock() {
I didn't get what this method is meant to do, why there is a double-layer loop (including the invoking caller), and why 0x1F is used as the end condition.
This method is used to generate all subdir lock names. We use 0x1F as the end condition because of the DataNode layout: each blockpool dir under a volume has two layers of subdirs, like subdir1/subdir31.
When the DataNode starts, we initialize all dataset subdir locks in advance; every volume has its own subdir dataset locks. In the current layout of the DataNode's storage directory, each volume has 32 × 32 subdirs.
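A minimal sketch of that pre-generation step, assuming the 32 × 32 layout described above (names are illustrative, not the exact patch code):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of pre-generating every subdir lock name at startup.
// 0x1F (31) is the loop bound because the storage layout has 32 first-level
// and 32 second-level subdirs per volume, i.e. 32 * 32 = 1024 lock names.
class SubDirLockNames {
  static List<String> getAllSubDirNameForDataSetLock() {
    List<String> names = new ArrayList<>();
    for (int d1 = 0; d1 <= 0x1F; d1++) {      // first-level subdir
      for (int d2 = 0; d2 <= 0x1F; d2++) {    // second-level subdir
        names.add("subdir" + d1 + "/subdir" + d2);
      }
    }
    return names;
  }
}
```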
I don't think it is a good idea to follow the current directory layout to construct locks based on blockid, since:
a. We would have to keep the directory layout logic and the block/subdir lock logic in sync, right? However, they are implemented in different places without any constraint enforcing that, which will add cost to future improvements that change the directory layout.
b. I think we could decouple the block/subdir lock from the directory layout and define a separate rule for it based on the blockid only (such as id mod 1000; splitting one volume lock into 1000 locks is enough to improve performance), which I think makes sense for the block/subdir lock.
What do you think? Thanks.
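The decoupled rule proposed here could look roughly like the following sketch (the class name and method signatures are hypothetical, chosen to match the blockIdToSubLock call that appears later in this thread):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical strategy that derives the sub-lock purely from the block id,
// independent of the on-disk directory layout.
class ModDatasetSubLockStrategy {
  private final long modFactor;

  ModDatasetSubLockStrategy(long modFactor) {
    this.modFactor = modFactor;
  }

  // E.g. with modFactor = 1000, block 123456 maps to lock "456".
  String blockIdToSubLock(long blockId) {
    // floorMod keeps the result non-negative even for negative ids.
    return String.valueOf(Math.floorMod(blockId, modFactor));
  }

  // All lock names, so they can be initialized up front at startup.
  List<String> getAllSubLockNames() {
    List<String> names = new ArrayList<>();
    for (long i = 0; i < modFactor; i++) {
      names.add(String.valueOf(i));
    }
    return names;
  }
}
```

With this rule, changing the directory layout no longer touches the lock-naming code, at the cost of the lock name no longer matching an on-disk path.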
bpid)) {
  try (AutoCloseableLock lock = lockManager.readLock(LockLevel.DIR,
      bpid, getReplicaInfo(bpid, blkid).getStorageUuid(),
      datasetSubLockStrategy.blockIdToSubLock(blkid))) {
Hi! Sir, obtaining a 'POOL' lock should be sufficient for this query op; why change it to a 'DIR' lock?
@huangzhaobo99 Sir, thanks for noticing. The DIR lock is used here to maintain the same mutual-exclusion semantics as before; we need to protect the block operations correctly.
@hfutatzhanghb Thanks, got it.
Force-pushed 0c8fe9f to 9eb153c
Hi @hfutatzhanghb, I left one comment at #6764 (comment); what do you think about it? Thanks.
Sir, thanks for reviewing; very good suggestions, even though we don't often change the DataNode layout. I have fixed the code according to your advice. Please help review StrategyInterface.
Description of PR
Refer to HDFS-17496.
This PR has two benefits:
1. It improves the DataNode's throughput when using faster disks such as SSDs.
2. It improves the DataNode's throughput when the DataNode has a large disk capacity; this PR mitigates volume lock contention when disk I/O utilization (ioutil) is consistently high.
We used NVMe SSDs as volumes in DataNodes and performed some stress tests.
We found that NVMe SSD and HDD disks achieved similar performance when creating many small files, such as 10 KB files.
This phenomenon is counterintuitive. After analyzing the metric monitoring, we found that the fsdataset lock became the bottleneck in high-concurrency scenarios.
Currently, we have two lock levels: BLOCK_POOL and VOLUME. We can further split the volume lock into DIR locks.
A DIR lock is defined as follows: given a blockid, we can determine which subdir under the finalized dir the block will be placed in. We use subdir[0-31]/subdir[0-31] as the name of the DIR lock.
For more details, please refer to the method DatanodeUtil#idToBlockDir.
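That mapping can be sketched as follows (a simplified rendering of the idea, not the exact upstream code): bits 16–20 of the block id pick the first-level subdir and bits 8–12 pick the second-level subdir, so every block falls into one of 32 × 32 = 1024 directories, and with this PR, under the DIR lock of the same name.

```java
// Simplified, self-contained demo of the block-id-to-subdir mapping
// described above (constant and class names are illustrative).
class IdToBlockDirDemo {
  static String dirLockNameFor(long blockId) {
    int d1 = (int) ((blockId >> 16) & 0x1F);  // first-level subdir, 0-31
    int d2 = (int) ((blockId >> 8) & 0x1F);   // second-level subdir, 0-31
    return "subdir" + d1 + "/subdir" + d2;
  }

  public static void main(String[] args) {
    // Blocks with different ids can land in different subdirs, and
    // therefore contend on different DIR locks.
    System.out.println(dirLockNameFor(0x12345678L)); // subdir20/subdir22
    System.out.println(dirLockNameFor(0L));          // subdir0/subdir0
  }
}
```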
The performance comparison is as below.
Experimental setup:
3 DataNodes, each with a single disk.
10 clients that concurrently write files and delete them after writing.
550 threads per client.
Note: we do not modify evictBlocks, onCompleteLazyPersist, or updateReplicaUnderRecovery.