-
Notifications
You must be signed in to change notification settings - Fork 864
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
configury: harden IGNORE_TKR check #12681
configury: harden IGNORE_TKR check #12681
Conversation
NVIDIA HPC Compiler (e.g. nvfortran) incorrectly selected the GCC style pragmas to support IGNORE_TKR. Harden the test by mimicking exactly the mpi f08 bindings in order to fix that false positive. Thanks Chris Parrot for the report. Refs. open-mpi#11582 Signed-off-by: Gilles Gouaillardet <[email protected]>
@cparrott73 sorry it took so long for me to find the right way to approach this problem. |
@cparrott73 Can you confirm that this fixes the issue for you? |
I'll try to give this a go today or Monday and verify. Thanks for doing this! |
I pulled top-of-tree from git and tested it with our compilers. I confirm that it now works correctly. How easy would it be to patch Open MPI 4.1.x with this change? Our product management team is not ready to move forward to 5.x, but I'd like to see if it would be possible to backport this to the version we are currently using. If not, not the end of the world. |
thanks for testing it! the backport was almost straightforward. diff --git a/config/ompi_fortran_check_ignore_tkr.m4 b/config/ompi_fortran_check_ignore_tkr.m4
index d996537609..766e697478 100644
--- a/config/ompi_fortran_check_ignore_tkr.m4
+++ b/config/ompi_fortran_check_ignore_tkr.m4
@@ -189,8 +189,23 @@ AC_DEFUN([OMPI_FORTRAN_CHECK_IGNORE_TKR_SUB], [
call foo(a, count)
end subroutine force_assumed_shape
+ module check_ignore_tkr
+ interface foobar
+ subroutine foobar_x(buffer, count)
+ $1 buffer
+ $2, intent(in) :: buffer
+ integer, intent(in) :: count
+ end subroutine foobar_x
+ end interface
+ end module
+
+ subroutine bar(var)
+ use check_ignore_tkr
+ implicit none
+ real, intent(inout) :: var(:, :, :)
+
+ call foobar(var(1,1,1), 1)
! Autoconf puts "end" after the last line
- subroutine bogus
]]),
[msg=yes
ompi_fortran_ignore_tkr_predecl="$1" |
Thanks! I will use this in our builds. We really appreciate the fix. |
@ggouaillardet Want to merge / cherry-pick to v5.0.x / cherry-pick/port to v4.1.x? |
@jsquyres yes, I will do that from now! |
Apologies, I may have spoken a little too soon here. I tried applying the patch, regenerating ./configure, and running a build. I did not see any errors, and compiling a test program worked. When I went to integrate this into our builds, our scripts complained about some missing F08 files in the lib directory. I took a look, and none of the F08 libraries had built. So I did some deeper investigation. I think the missing F08 libraries were due to a bug in the version of autoconf we were using, which caused ./configure to be regenerated incorrectly, so I tried on another system. However, when I did that, I still did not see any difference in behavior from before. It turns out our compiler is complaining about the source file in question while running ./configure:
So I tried the previous version (as shipped with Open MPI 4.1.5) - and nvfortran throws the same error. I'm not sure if this is expected. gcc-14.1.0 also complains about this file for the same reasons. So I'm not sure this is actually working. Very sorry for not catching this sooner. Just in case - am I missing anything here? Let me know if you need more information from me. Thanks! I have attached the updated configure test as foo2.f90 to the top of this comment. |
@cparrott73 something does not check out here. is there any chance you applied the patch on top of a previous one? Anyway, I issued #12707 for the I can build the FWIW:
|
@ggouaillardet - thanks for the response. I do see that 4.1.x branch patch referenced in #12707 is different than the one I was using, which was a cut and paste from the issue page. I'll pull this file from #12707 and try again, and will let you know. Thanks! |
@ggouaillardet - I pulled top-of-tree from git, built with our compilers, and then ran some tests that were previously known to fail. I can confirm they pass now. I'm not sure why the ompi-4.x version isn't working here yet, but I will continue to run a few experiments on that front. |
@ggouaillardet - I pulled the diff from the pull request this time and applied it to our openmpi-4.x tree here. I can confirm it worked this time. I'm going to try again on our older build machine and see if I can reproduce the success there. I think my earlier failure was from when I tried to cut and paste the diff above into a patch. I think the actual commit is different? Anyway, I think this is good now. Thanks! |
@ggouaillardet - I hate to keep bumping this, but we have found an issue with this change and wanted to get your thoughts on it. The change works as expected for nvfortran. However, we have some developers internally who are also contributing to flang (specifically, flang-new present in LLVM 17 and later). We ran this through flang-new, and it is throwing the following error on the updated version of this code:
I asked our flang developers about this, and here is what they told me:
Thoughts? If I change Does this change what you are trying to test here in any way? If not, I would propose we replace Thanks in advance. |
@ggouaillardet I see you added this check in 8c601e1; I'm not enough of a Fortran expert to know why it should be |
I am not a Fortran expert, I know less fortran than @jsquyres , I have not read the Fortran standard, and if I did, I am not even sure I would understand it. That being said, the mentioned commit addresses the legit issue reported in #12506. The test does fail indeed with LLVM17, and this is what I want otherwise |
As reported by Chris Parrott in open-mpi#12681, the enhanced test was buggy. Pass the expected argument to make it Fortran compliant and LLVM 17 and above happy compilers Signed-off-by: Gilles Gouaillardet <[email protected]>
Never mind, it seems this PR indeed broke LLVM < 19. |
And I have now to take it back ... |
NVIDIA HPC Compiler (e.g. nvfortran) incorrectly selected the GCC style pragmas to support IGNORE_TKR. Harden the test by mimicking exactly the mpi f08 bindings in order to fix that false positive.
Thanks Chris Parrot for the report.
Refs. #11582