-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDFS-17210. Optimize AvailableSpaceBlockPlacementPolicy. #6113
HDFS-17210. Optimize AvailableSpaceBlockPlacementPolicy. #6113
Conversation
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch here. Please fix the code style and keep the style as usual, Thanks.
cc @zhangshuyan0 Would you mind to give another review?
Thanks @Hexiaoqiao for your timely review ,will fix code style later~ |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you want to reduce the probability of nodes with extremely high storage usage being selected, which is very useful. But the current changes are confusing me a bit, so I would like to discuss it with you.
The meaning of toleranceLimit
is not very intuitive to me. The logic here couples the effect of balancedSpaceToleranceLimit
to the configuration value of balancedSpaceTolerance
. For example, if we configure balancedSpaceTolerance
to 10, nodeA's usage is 99%, and nodeB's usage is 90%, then even if balancedSpaceToleranceLimit
is configured to 93, nodeA will still be selected without any doubt. I think this is not what we expected.
I suggest that we can directly use nodes with greater usage to make judgments. At the same time, we can make the meaning of toleranceLimit
more intuitive and no longer negate it when using it. The changes are as follows:
line215:
boolean toleranceLimit = Math.max(a.getDfsUsedPercent(), b.getDfsUsedPercent()) < balancedSpaceToleranceLimit;
Line 217:
if (a.equals(b) || (toleranceLimit && Math.abs(a.getDfsUsedPercent() - b.getDfsUsedPercent()) < balancedSpaceTolerance) || ((isBalanceLocal && a.getDfsUsedPercent() < 50)))
Please feel free to disagree. Looking forward to your reply.
Thanks @zhangshuyan0 for you review, It seems using min(A,B) only cover when all node's usage are over balancedSpaceToleranceLimit, which does not fit for lower node usage and higher node usage, so your suggest changes makes sense for me. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
ping ~@Hexiaoqiao @zhangshuyan0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late response since I took a long vacation. Back to this PR, it mostly looks good to me. Will +1 when fix the checkstyle.
@@ -1240,6 +1240,12 @@ public class DFSConfigKeys extends CommonConfigurationKeys { | |||
public static final int | |||
DFS_NAMENODE_AVAILABLE_SPACE_BLOCK_PLACEMENT_POLICY_BALANCED_SPACE_TOLERANCE_DEFAULT = | |||
5; | |||
public static final String | |||
DFS_NAMENODE_AVAILABLE_SPACE_BLOCK_PLACEMENT_POLICY_BALANCED_SPACE_TOLERANCE_LIMIT_KEY = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Align as other configuration item.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @Hexiaoqiao ,i kept the Align as other configuration item and fixed the checkstyle items. but DFS_NAMENODE_AVAILABLE_SPACE_BLOCK_PLACEMENT_POLICY_BALANCED_SPACE_TOLERANCE_LIMIT_DEFAULT
is still a bit long(over 100 per line), do you have better ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is OK for me now. The root cause is name of this constant is too long, but I also have no other choice to be better. Maybe we could care that when name next time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, thanks @Hexiaoqiao for your review~
@@ -71,6 +76,11 @@ public void initialize(Configuration conf, FSClusterStats stats, | |||
DFS_NAMENODE_AVAILABLE_SPACE_BLOCK_PLACEMENT_POLICY_BALANCED_SPACE_TOLERANCE_KEY, | |||
DFS_NAMENODE_AVAILABLE_SPACE_BLOCK_PLACEMENT_POLICY_BALANCED_SPACE_TOLERANCE_DEFAULT); | |||
|
|||
balancedSpaceToleranceLimit = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest to check if this value is between 0 and 100. Otherwise, it could work abnormally.
Collections.addAll(allTolerateNodes, tolerateDataNodes); | ||
final BlockManager bm = namenode.getNamesystem().getBlockManager(); | ||
AvailableSpaceBlockPlacementPolicy toleratePlacementPolicy = | ||
(AvailableSpaceBlockPlacementPolicy)bm.getBlockPlacementPolicy(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest to keep the alignment as usual. You should add the 'checkstyle.xml' to your IDE or follow this guide docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Hexiaoqiao , will fix the checkstyle soon~
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. +1 from my side. The failed unit tests seem not related to this PR.
@GuoPhilipse Would you mind fixing checkstyle based on the CI result? It seems that there are three lines with a length exceeding 100. |
Thanks @zhangshuyan0 for your notice, It seems the variable name a bit long (over 100), so the checkstyle always complains. do you have better ideas for this case? |
I just didn't notice that the reason was that the variable name was too long. That's it then. |
Committed to trunk. Thanks for your contribution @GuoPhilipse @Hexiaoqiao. |
Thanks @zhangshuyan0 @Hexiaoqiao for your review~ |
…. Contributed by GuoPhilipse. Reviewed-by: He Xiaoqiao <[email protected]> Signed-off-by: Shuyan Zhang <[email protected]>
Description of PR
for now ,we may have many nodes usage over 85%, some nodes'usage have been over 90%, if we choose three nodes as nodeA-97%,nodeB-98%,nodeC-99%, actually i don't want nodeC(99%) be choosen, for nodeC usage is really high , even random chance ,nodeC will reach 100% soon, so we can directly choose the less usage node(nodeA) if all nodes's usage are over 95%(just a example)
How was this patch tested?
test case added
For code changes:
LICENSE
,LICENSE-binary
,NOTICE-binary
files?