TEZ-4569: SCATTER_GATHER + BROADCAST hangs on DAG Recovery #361

Merged
merged 5 commits into from
Dec 23, 2024

okumin
Contributor

@okumin okumin commented Jun 12, 2024

Let an AM correctly restore its state and restart tasks.
https://issues.apache.org/jira/browse/TEZ-4569

@tez-yetus

This comment was marked as outdated.

@tez-yetus

This comment was marked as outdated.

@okumin okumin changed the title from "[WIP] TEZ-4569: SCATTER_GATHER + BROADCAST hangs on DAG Recovery" to "TEZ-4569: SCATTER_GATHER + BROADCAST hangs on DAG Recovery" on Aug 23, 2024
@okumin
Contributor Author

okumin commented Dec 23, 2024

I rebased this branch and also added two cosmetic changes.

@abstractdog abstractdog self-requested a review December 23, 2024 11:26
Contributor

@abstractdog abstractdog left a comment

LGTM +1
pending tests

@tez-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|:----------|--------:|:--------|
| +0 🆗 | reexec | 28m 3s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| | | | _ master Compile Tests _ |
| +0 🆗 | mvndep | 2m 58s | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 10m 56s | master passed |
| +1 💚 | compile | 1m 24s | master passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu122.04 |
| +1 💚 | compile | 1m 9s | master passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu222.04-ga |
| +1 💚 | checkstyle | 1m 16s | master passed |
| +1 💚 | javadoc | 0m 51s | master passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu122.04 |
| +1 💚 | javadoc | 0m 41s | master passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu222.04-ga |
| +0 🆗 | spotbugs | 1m 3s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 💚 | findbugs | 2m 57s | master passed |
| | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 10s | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 0m 51s | the patch passed |
| +1 💚 | compile | 0m 55s | the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu122.04 |
| +1 💚 | javac | 0m 55s | the patch passed |
| +1 💚 | compile | 0m 48s | the patch passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu222.04-ga |
| +1 💚 | javac | 0m 48s | the patch passed |
| -0 ⚠️ | checkstyle | 0m 11s | tez-tests: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | javadoc | 0m 19s | the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu122.04 |
| +1 💚 | javadoc | 0m 19s | the patch passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu222.04-ga |
| +1 💚 | findbugs | 2m 0s | the patch passed |
| | | | _ Other Tests _ |
| -1 ❌ | unit | 5m 2s | tez-dag in the patch failed. |
| +1 💚 | unit | 43m 19s | tez-tests in the patch passed. |
| +1 💚 | asflicense | 0m 21s | The patch does not generate ASF License warnings. |
| | | 106m 23s | |

| Reason | Tests |
|:-------|:------|
| Failed junit tests | tez.dag.app.dag.impl.TestDAGRecovery |

| Subsystem | Report/Notes |
|:----------|:-------------|
| Docker | ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-361/3/artifact/out/Dockerfile |
| GITHUB PR | #361 |
| JIRA Issue | TEZ-4569 |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile |
| uname | Linux 1c2c562e42b5 5.15.0-124-generic #134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / 9efa6f1 |
| Default Java | Private Build-1.8.0_432-8u432-gaus1-0ubuntu222.04-ga |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu122.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_432-8u432-gaus1-0ubuntu222.04-ga |
| checkstyle | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-361/3/artifact/out/diff-checkstyle-tez-tests.txt |
| unit | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-361/3/artifact/out/patch-unit-tez-dag.txt |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-361/3/testReport/ |
| Max. process+thread count | 1134 (vs. ulimit of 5500) |
| modules | C: tez-dag tez-tests U: . |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-361/3/console |
| versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

@okumin
Contributor Author

okumin commented Dec 23, 2024

I'm checking why TestDAGRecovery failed

@abstractdog
Contributor

> I'm checking why TestDAGRecovery failed

Recovery tests are sometimes flaky, so I simply restarted the last precommit job.

@okumin
Contributor Author

okumin commented Dec 23, 2024

@abstractdog Sorry, my manual refactoring included a mistake: I copy-pasted the method names used in the original condition. I would appreciate it if you could double-check it.
bdece70

@abstractdog
Contributor

> @abstractdog Sorry, my manual refactoring included a mistake: I copy-pasted the method names used in the original condition. I would appreciate it if you could double-check it. bdece70

I missed it. I see it's fixed now; glad to see that the unit tests revealed the problem.
My +1 still holds if the tests pass.

@tez-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|:----------|--------:|:--------|
| +0 🆗 | reexec | 0m 15s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| | | | _ master Compile Tests _ |
| +0 🆗 | mvndep | 2m 48s | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 13m 59s | master passed |
| +1 💚 | compile | 1m 19s | master passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu122.04 |
| +1 💚 | compile | 1m 11s | master passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu222.04-ga |
| +1 💚 | checkstyle | 1m 31s | master passed |
| +1 💚 | javadoc | 0m 57s | master passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu122.04 |
| +1 💚 | javadoc | 0m 42s | master passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu222.04-ga |
| +0 🆗 | spotbugs | 0m 50s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 💚 | findbugs | 2m 37s | master passed |
| | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 10s | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 0m 48s | the patch passed |
| +1 💚 | compile | 0m 51s | the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu122.04 |
| +1 💚 | javac | 0m 51s | the patch passed |
| +1 💚 | compile | 0m 44s | the patch passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu222.04-ga |
| +1 💚 | javac | 0m 44s | the patch passed |
| -0 ⚠️ | checkstyle | 0m 12s | tez-tests: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | javadoc | 0m 19s | the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu122.04 |
| +1 💚 | javadoc | 0m 19s | the patch passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu222.04-ga |
| +1 💚 | findbugs | 2m 0s | the patch passed |
| | | | _ Other Tests _ |
| +1 💚 | unit | 5m 15s | tez-dag in the patch passed. |
| +1 💚 | unit | 46m 32s | tez-tests in the patch passed. |
| +1 💚 | asflicense | 0m 22s | The patch does not generate ASF License warnings. |
| | | 84m 40s | |

| Subsystem | Report/Notes |
|:----------|:-------------|
| Docker | ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-361/5/artifact/out/Dockerfile |
| GITHUB PR | #361 |
| JIRA Issue | TEZ-4569 |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile |
| uname | Linux 7e9468acddb5 5.15.0-124-generic #134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / 9efa6f1 |
| Default Java | Private Build-1.8.0_432-8u432-gaus1-0ubuntu222.04-ga |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu122.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_432-8u432-gaus1-0ubuntu222.04-ga |
| checkstyle | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-361/5/artifact/out/diff-checkstyle-tez-tests.txt |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-361/5/testReport/ |
| Max. process+thread count | 1187 (vs. ulimit of 5500) |
| modules | C: tez-dag tez-tests U: . |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-361/5/console |
| versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

@abstractdog abstractdog merged commit 44c4f1e into apache:master Dec 23, 2024
4 checks passed
@okumin okumin deleted the TEZ-4569-hang branch December 24, 2024 01:16
@okumin
Contributor Author

okumin commented Dec 24, 2024

Thanks!
