YARN-11578. Cache fs supports chmod in LogAggregationFileController. #6120

Merged 3 commits on Oct 2, 2023

tomicooler (Contributor) commented Sep 27, 2023

The check introduced in YARN-10901 avoids a warning message in the NameNode (NN) logs in certain situations (when /tmp/logs is not owned by the yarn user), but it adds 3 NameNode calls (create, setpermission, delete) during log aggregation, for every NodeManager (NM).
In other words, when a YARN job completes, this check runs in the log aggregation phase for every job, on every NodeManager.

In one cluster, this check accounted for 4.2% of all NameNode calls over a 30-minute window. "Write" calls also need a Namesystem writeLock, so the real impact is bigger.

Change-Id: I65468aa972860d3b62050fcb41b8b06e417ee8bb

Description of PR

Added a static concurrent cache that maps the <FS class type + Log Path> to the check result.

Assumptions:

  • the permissions won't change while the NMs are running
  • the number of <FS class type + Log Path> keys won't grow big
  • <FS class type + Log Path> is enough for the key. I don't want to keep a FileSystem object in the cache. However, if there is a use case where different FileSystem objects are created with the same class type and have different permissions for the same file path, then this key is not enough.

If these assumptions are not met, we might need to come up with a different idea.
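
A minimal sketch of the caching idea described above, not the code from the patch. The class name `ChmodSupportCache`, the method names, and the `"<class name>:<path>"` key format are hypothetical; the expensive create/setpermission/delete probe is stood in for by a `BooleanSupplier`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch; names are assumptions, not taken from the Hadoop patch. */
final class ChmodSupportCache {
  // Static, concurrent cache: "<FS class name>:<log path>" -> check result.
  private static final Map<String, Boolean> CACHE = new ConcurrentHashMap<>();

  private ChmodSupportCache() {
  }

  /** Builds the cache key from the FS class type and the log path. */
  static String key(Class<?> fsClass, String logPath) {
    return fsClass.getName() + ":" + logPath;
  }

  /**
   * Returns the cached result for the key, running the (expensive) probe
   * only when no result has been stored for that key yet.
   */
  static boolean supportsChmod(Class<?> fsClass, String logPath,
      BooleanSupplier probe) {
    return CACHE.computeIfAbsent(key(fsClass, logPath),
        k -> probe.getAsBoolean());
  }

  /** Clears the cache, e.g. between unit tests. */
  static void clear() {
    CACHE.clear();
  }
}
```

With a cache like this, the probe runs once per distinct key in a NodeManager process instead of once per aggregated job, which is only valid as long as the assumptions listed above hold.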

How was this patch tested?

Updated the unit tests to verify that caching works.

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'YARN-11578. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 50s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 52m 52s trunk passed
+1 💚 compile 0m 45s trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 compile 0m 40s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 checkstyle 0m 36s trunk passed
+1 💚 mvnsite 0m 46s trunk passed
+1 💚 javadoc 0m 54s trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 46s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 45s trunk passed
+1 💚 shadedclient 38m 40s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 36s the patch passed
+1 💚 compile 0m 40s the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javac 0m 40s the patch passed
+1 💚 compile 0m 34s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 javac 0m 34s the patch passed
+1 💚 blanks 0m 1s The patch has no blanks issues.
+1 💚 checkstyle 0m 26s the patch passed
+1 💚 mvnsite 0m 37s the patch passed
+1 💚 javadoc 0m 41s the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 40s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 42s the patch passed
+1 💚 shadedclient 38m 52s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 5m 22s hadoop-yarn-common in the patch passed.
+1 💚 asflicense 0m 34s The patch does not generate ASF License warnings.
152m 18s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6120/1/artifact/out/Dockerfile
GITHUB PR #6120
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 4298a1771681 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / ab182cd
Default Java Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6120/1/testReport/
Max. process+thread count 529 (vs. ulimit of 5500)
modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6120/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Change-Id: I0fc25b90662fbec29e8be056db55d1a1c970e4d4
@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 49s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 48m 57s trunk passed
+1 💚 compile 0m 45s trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 compile 0m 39s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 checkstyle 0m 36s trunk passed
+1 💚 mvnsite 0m 45s trunk passed
+1 💚 javadoc 0m 54s trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 46s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 46s trunk passed
+1 💚 shadedclient 38m 36s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 36s the patch passed
+1 💚 compile 0m 39s the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javac 0m 39s the patch passed
+1 💚 compile 0m 33s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 javac 0m 33s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 26s the patch passed
+1 💚 mvnsite 0m 36s the patch passed
+1 💚 javadoc 0m 41s the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 40s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 44s the patch passed
+1 💚 shadedclient 39m 10s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 5m 23s hadoop-yarn-common in the patch passed.
+1 💚 asflicense 0m 34s The patch does not generate ASF License warnings.
148m 21s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6120/2/artifact/out/Dockerfile
GITHUB PR #6120
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 29e059263727 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / c5f4c67
Default Java Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6120/2/testReport/
Max. process+thread count 594 (vs. ulimit of 5500)
modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6120/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

} catch (UnsupportedOperationException use) {
LOG.info("Unable to set permissions for configured filesystem since"
+ " it does not support this {}", remoteFS.getScheme());
fsSupportsChmod = false;
Contributor Author

Note for reviewers: prior to this change, fsSupportsChmod was only set to false when the UnsupportedOperationException was thrown; otherwise fsSupportsChmod was not modified (by default it was set to true).

With this change, fsSupportsChmod is always updated: it is set to true when everything succeeds and to false when anything fails. Alternatively, we could set it to false only when UnsupportedOperationException is thrown, set it to true only when everything succeeds, and otherwise keep it as-is.

Or keep the original behaviour.

Member

Based on the naming and the context I think we can go ahead with the modified behaviour: only set it to true when everything succeeds.
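
The agreed behaviour can be sketched roughly as follows. This is an illustration under assumptions, not the patch itself: `ChmodProbe`, `PermissionCheck`, and `probe` are hypothetical names, and the real create/setpermission/delete steps are represented by a single callback.

```java
import java.io.IOException;

/** Hypothetical sketch of "only true when everything succeeds". */
final class ChmodProbe {

  /** Stand-in for the create / setpermission / delete sequence. */
  @FunctionalInterface
  interface PermissionCheck {
    void run() throws IOException;
  }

  private ChmodProbe() {
  }

  static boolean probe(PermissionCheck check) {
    boolean fsSupportsChmod = false;
    try {
      check.run();
      // Reached only when every step of the check succeeded.
      fsSupportsChmod = true;
    } catch (UnsupportedOperationException e) {
      // The file system does not support permissions: stays false.
    } catch (IOException e) {
      // The check itself failed: stays false as well.
    }
    return fsSupportsChmod;
  }
}
```

Starting from `false` and flipping to `true` only on the success path means any failure, expected or not, leaves the flag unset.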

LOG.info("Unable to set permissions for configured filesystem since"
+ " it does not support this {}", remoteFS.getScheme());
} catch (IOException e) {
LOG.warn("Failed to check if FileSystem supports permissions on "
Contributor

In the log we should use {}.

Contributor Author

Thanks for the review. Fixed it.

Change-Id: I6a2852172b66b3b50f36765b0ba9dcb62fa97081
@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 29s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 33m 15s trunk passed
+1 💚 compile 0m 34s trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 compile 0m 32s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 checkstyle 0m 30s trunk passed
+1 💚 mvnsite 0m 36s trunk passed
+1 💚 javadoc 0m 42s trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 37s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 12s trunk passed
+1 💚 shadedclient 21m 31s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 26s the patch passed
+1 💚 compile 0m 28s the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javac 0m 28s the patch passed
+1 💚 compile 0m 25s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 javac 0m 25s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 18s the patch passed
+1 💚 mvnsite 0m 27s the patch passed
+1 💚 javadoc 0m 29s the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 29s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 1m 4s the patch passed
+1 💚 shadedclient 21m 6s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 4m 39s hadoop-yarn-common in the patch passed.
+1 💚 asflicense 0m 28s The patch does not generate ASF License warnings.
92m 26s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6120/3/artifact/out/Dockerfile
GITHUB PR #6120
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 9b2948fff24c 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 0e0b977
Default Java Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6120/3/testReport/
Max. process+thread count 684 (vs. ulimit of 5500)
modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6120/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@brumi1024
Member

brumi1024 commented Oct 2, 2023

Thanks @tomicooler for the patch, LGTM. @slfan1989 thanks for the review, do you have anything, or are you ok with this being merged?

@brumi1024 brumi1024 merged commit a04a9e1 into apache:trunk Oct 2, 2023
@brumi1024
Member

Additionally, @tomicooler can you please do the backports for 3.3 and 3.2 branches?

tomicooler added a commit to tomicooler/hadoop that referenced this pull request Oct 2, 2023
…pache#6120)

Change-Id: Ib02eb32aaca799a9a23ec7c38f0d5a0578fe5c80


# Conflicts:
#	hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/filecontroller/LogAggregationFileController.java
#	hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/filecontroller/TestLogAggregationFileController.java
tomicooler added a commit to tomicooler/hadoop that referenced this pull request Oct 2, 2023