Makefile for HILE using cray toolchain, building for CPU. #1043

Open. ursg wants to merge 19 commits into base: dev

ursg (Contributor) commented Oct 21, 2024

Also includes building of boost program_options as part of build_libraries.sh

Not sure if that is the way we want to encourage people to build it, but I guess it is at least useful to have the commands for it in the libraries script.

ursg added 19 commits October 21, 2024 09:05
This commit is step 1 of getting testpackage runs on HILE. Mostly, this
restructures the Yaml file to more clearly separate
stuff-that-happens-on-carrington and stuff-that-happens-on-hile.

I kinda expect this to not quite work out of the box, but let's see!
No need to spam >80k lines of output there.
Also, make boost download less verbose.
Maybe that even makes things faster!
Otherwise, boost isn't found in LD_LIBRARY_PATH
Also, revert 134a082 to remove the ldd output again.

ursg (Contributor, Author) commented Nov 4, 2024

In testpackage evaluation, the vlsvdiffs of some tests (but not all?) segfault:

----------
running Flowthrough_x_inflow_y_outflow 
----------
--------------------------------
 ref-time | new-time | speedup |
--------------------------------
 3.385      3.956      0.855662
-------------------------------------------
 variable                                     | absolute diff | relative diff |
-------------------------------------------
Comparing file /wrk-vakka/group/spacephysics/proj/CI/hile-uganse/_work/vlasiator/vlasiator/testpackage/run_2024.11.04_09.38.38/Flowthrough_x_inflow_y_outflow/bulk.0000001.vlsv against reference
 proton/vg_rho_0                                3.49e-10        3.49e-16
 proton/vg_v_0                                  3.49e-10        6.48e-16
 proton/vg_v_1                                  1.16e-10        3.84e-16
 proton/vg_v_2                                  1.04e-11        0.94
 fg_b_0                                         0               0
 fg_b_1                                         0               0
 fg_b_2                                         0               0
 fg_e_0                                         5.42e-20        3.37e-16
 fg_e_1                                         0               0
 fg_e_2                                         8.13e-20        3.49e-16
VLSV file timestamps match.
----------
Comparing file /wrk-vakka/group/spacephysics/proj/CI/hile-uganse/_work/vlasiator/vlasiator/testpackage/run_2024.11.04_09.38.38/Flowthrough_x_inflow_y_outflow/bulk.0000002.vlsv against reference
srun: error: x3000c0s14b1n0: task 0: Segmentation fault (core dumped)
 proton/vg_rho_0                                                
srun: error: x3000c0s14b1n0: task 0: Segmentation fault (core dumped)
 proton/vg_v_0                                                  
srun: error: x3000c0s14b1n0: task 0: Segmentation fault (core dumped)
 proton/vg_v_1                                                  
srun: error: x3000c0s14b1n0: task 0: Segmentation fault (core dumped)
 proton/vg_v_2                                                  
srun: error: x3000c0s14b1n0: task 0: Segmentation fault (core dumped)
 fg_b_0                                                         
srun: error: x3000c0s14b1n0: task 0: Segmentation fault (core dumped)
 fg_b_1                                                         
srun: error: x3000c0s14b1n0: task 0: Segmentation fault (core dumped)
 fg_b_2                                                         
srun: error: x3000c0s14b1n0: task 0: Segmentation fault (core dumped)
 fg_e_0                                                         
srun: error: x3000c0s14b1n0: task 0: Segmentation fault (core dumped)
 fg_e_1                                                         
srun: error: x3000c0s14b1n0: task 0: Segmentation fault (core dumped)
 fg_e_2                                                         
awk: cmd. line:1: BEGIN{print (!= 0.0)?1:0}
awk: cmd. line:1:              ^ syntax error
awk: cmd. line:1: BEGIN{print (!= 0.0)?1:0}
awk: cmd. line:1:                     ^ syntax error
VLSV file timestamps match.
----------

Is this just a result of carrington-reference-data being compared with HILE results? Shouldn't our testpackage be resilient to that?
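
The awk syntax errors in the log above look like what happens when an empty string is spliced into the awk program text: once vlsvdiff has segfaulted there is no diff value, so the expression collapses to `(!= 0.0)`. A minimal sketch of a more defensive check follows, assuming the comparison is done from bash. The function name `is_nonzero` and the `NA` marker are hypothetical and not taken from the testpackage scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the actual testpackage scripts.
# Prints 1 if the argument is a nonzero number, 0 if it is zero,
# and "NA" (with a nonzero exit status) if it is empty or not numeric,
# for example because the diff tool crashed and produced no value.
is_nonzero() {
    local value="$1"
    # Accepts plain and scientific notation such as 0, 0.94 or 3.49e-10.
    local number_re='^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?$'

    # Reject empty or non-numeric input before awk ever sees it.
    if ! [[ "$value" =~ $number_re ]]; then
        echo "NA"
        return 1
    fi

    # Pass the value with -v instead of pasting it into the program text,
    # so a missing value can never turn into an awk syntax error.
    awk -v x="$value" 'BEGIN { print ((x != 0.0) ? 1 : 0) }'
}
```

With this sketch, `is_nonzero 3.49e-10` prints `1`, `is_nonzero 0` prints `0`, and `is_nonzero ""` prints `NA`, so a crashed comparison could be reported as a failed test instead of an awk error.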

ursg (Contributor, Author) commented Nov 4, 2024

Also, the testpackage is much slower (by a factor of ~4) than on carrington, and eventually runs into a timeout.
This smells like our old friend, the core placement bug.
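
One way to test the core placement suspicion is to have every task report which CPUs it is allowed to run on. The sketch below assumes a Linux node, where `/proc/self/status` contains a `Cpus_allowed_list` line, and a Slurm launch that sets `SLURM_PROCID`. The function name `print_cpu_affinity` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helper, not part of the testpackage.
# Prints the host name, the Slurm task rank (or "?" outside Slurm) and the
# list of CPUs the current process may run on, read from /proc (Linux only).
print_cpu_affinity() {
    local rank="${SLURM_PROCID:-?}"
    local cpus
    cpus="$(awk '/^Cpus_allowed_list:/ { print $2 }' /proc/self/status)"
    echo "host=${HOSTNAME:-unknown} rank=${rank} cpus=${cpus}"
}
```

Launched once per task, for example with `srun bash -c "$(declare -f print_cpu_affinity); print_cpu_affinity"`, identical or overlapping CPU lists across ranks on the same host would support the core placement theory. Slurm's own `--cpu-bind=verbose` option to `srun` reports similar information.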
