-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDFS-16993. Datanode supports configure TopN DatanodeNetworkCounts #5597
base: trunk
Are you sure you want to change the base?
Conversation
💔 -1 overall
This message was automatically generated. |
43ecf3b
to
8f316af
Compare
force push to fix checkstyle problems. |
have triggered the build again, if the build is green will merge this today |
💔 -1 overall
This message was automatically generated. |
8f316af
to
e42f4b0
Compare
fixed failed UT |
e42f4b0
to
c55a3ee
Compare
|
||
<property> | ||
<name>dfs.datanode.networkerrors.display.topcount</name> | ||
<value>10</value> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by default it should be -1, and if it is -1 the behaviour should be like the previous one, like today
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks sir, have fixed.
c55a3ee
to
92ae96a
Compare
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dropped some comments.
extend a UT to cover the changes.
First case: the default case
Second: When some value is configured try with may be with two cases: 1 and 2
...-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
Outdated
Show resolved
Hide resolved
...-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
Outdated
Show resolved
Hide resolved
...-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
Outdated
Show resolved
Hide resolved
...-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
Show resolved
Hide resolved
hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
Outdated
Show resolved
Hide resolved
@ayushtkn , Sir, i am sorry for respond too late, i encounted a problem when writing unit test, because the datanode in local is all 127.0.0.1:xxxx. I will fix this soonly when i have time and could you give me some suggestions about this problems? Thanks sir. |
@hfutatzhanghb try writing a minimal test, even if it fails due to same ip, I can try help fixing it from there |
@ayushtkn Thanx sir. Will write an unit test soonly. |
50bc34d
to
87e61a4
Compare
@ayushtkn Hi, sir. I have updated two unit tests, please take a look when you have free time. Thanks a lot. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
a1cca0e
to
9008daf
Compare
💔 -1 overall
This message was automatically generated. |
@ayushtkn Sir, the failed unit tests seems relate to |
💔 -1 overall
This message was automatically generated. |
have updated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
@ayushtkn Sir, could you please help me review this PR again when have free time? Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of creating two separate classes for the unit test, we can just put the unit tests in TestDataNodeMetrics
.
@tomscut Sir, thanks a lot for your opinion, have try that way before, but met some problems. |
cf2e62d
to
be26bde
Compare
🎊 +1 overall
This message was automatically generated. |
@ayushtkn @Hexiaoqiao @tomscut Hi, sir. I think we should better push this PR forward. Because it waste datanode's memory. As the below screeshot show , it has 27K networkError key values!!! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this PR is very useful. Can it be merged into the main branch.
@@ -2630,6 +2631,28 @@ public int getActiveTransferThreadCount() { | |||
|
|||
@Override // DataNodeMXBean | |||
public Map<String, Map<String, Long>> getDatanodeNetworkCounts() { | |||
int maxDisplay = getConf().getInt(DFSConfigKeys.DFS_DATANODE_NETWORKERRORS_DISPLAY_TOPCOUNT, | |||
DFSConfigKeys.DFS_DATANODE_NETWORKERRORS_DISPLAY_TOPCOUNT_DEFAULT); | |||
if (maxDisplay >= 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we first determine the size of the map? If it is less than N, we can return it directly.
be26bde
to
ecd02f0
Compare
💔 -1 overall
This message was automatically generated. |
ecd02f0
to
41aaa00
Compare
💔 -1 overall
This message was automatically generated. |
76446a4
to
ff69bce
Compare
ff69bce
to
3a9ad91
Compare
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
Description of PR
In our prod environment, we try to collect datanode metrics every 15s through jmx_exporter. we found the datanodenetworkerror metric generates a lot.
for example, if we have a cluster with 1000 datanodes, every datanode may generate 999 or more(from client) datanodenetworkerror metrics, and overall datanodes will generate 1000 multiple 999 = 999000 metrics. This is a very expensive operation. In most scenarios, we only need the topN of it.
How was this patch tested?
no need extra UT, TestDataNodeMetrics#testTimeoutMetric has done it.