
esm_tools output in std out. #1220

Open
JanStreffing opened this issue Sep 17, 2024 · 8 comments · May be fixed by #1225

@JanStreffing
Contributor

For some time now I have noticed that esm_tools output is written to stdout at model startup. To me this makes debugging model crashes harder, as unrelated monitoring info is interspersed with the model output. I am under the impression that this output used to go somewhere else.

Here is an example:

  0:       file   : namelist.dyn open ok
  0:       file   : namelist.tra open ok
  0:  number of tracers will be changed from          100  to            2 !
  0:  total number of tracers is:            2
  0:  ____________________________________________________________
  0:   --> initialise ALE arrays + sparse SSH stiff matrix
  0:
  0:  ____________________________________________________________
  0:   --> initialise ssh operator using unperturbed ocean depth
  0:       > in stiff_mat_ale, reading /work/ab0246/a270092/input/fesom2//core2/dist_128/rpart.out
Started from: /work/ab0246/a270092/runtime/awicm3-v3.3/no_crash_16/scripts/
starting (jobtype): observe_compute
{'check': False, 'profile': None, 'update': False, 'update_filetypes': [], 'expid': 'no_crash_16', 'launcher_pid': '931519', 'current_date': '20000101', 'run_number': '1', 'jobtype': 'observe_compute', 'last_jobtype': 'prepcompute', 'verbose': True, 'inspect': None, 'use_venv': False, 'no_motd': False, 'ignore_config_warnings': False, 'original_command': 'awicm3-v3.3-levante-TCO95L91-CORE2.yaml -e no_crash_16 -t observe_compute -p 931519 -s 20000101 -r 1 -v --last-jobtype prepcompute --open-run', 'started_from': '/work/ab0246/a270092/runtime/awicm3-v3.3/no_crash_16/scripts/', 'scriptname': 'awicm3-v3.3-levante-TCO95L91-CORE2.yaml', 'runscript_abspath': '/work/ab0246/a270092/runtime/awicm3-v3.3/no_crash_16/scripts/awicm3-v3.3-levante-TCO95L91-CORE2.yaml'}
  0:       took:    6.2958537000000092E-002  sec
  0:
  0:  Arrays are set
  0:  FCT is initialized
  0:            2  tracers will be used in FESOM
  0:  tracer IDs are:            1           2
  0:  read Temperature climatology from:phc3.0_winter.nc
  0:  read Salinity    climatology from:phc3.0_winter.nc
@mandresm
Contributor

It is indeed weird; some time ago I coded a stream for observe that captures the output from observe_compute and only writes it when the simulation is finished. Somehow that's broken now?

Thanks for reporting.

@JanStreffing
Contributor Author

This has been like this for quite a while for me, maybe half a year or more. Does that fit timeline-wise?

@mandresm
Contributor

mandresm commented Sep 17, 2024

I don't know; other than the workflow manager, I can't think of anything else that could have changed that. You see this for all versions of FESOM, right?

@JanStreffing
Contributor Author

Not just FESOM; here is an example from a standalone oifs 48r1 simulation:

scale_max5_oifs_compute_20010101-20010131_12752740.log

@pgierz
Member

pgierz commented Sep 27, 2024

I can confirm this, and I think I also know where it is happening. We migrated logging to the loguru library a while ago, if I am not mistaken, and all logs are now dumped into sys.stdout:

https://github.com/esm-tools/esm_tools/blob/16d133daaeefee5151ae02df5b490ce3471d7cbd/src/esm_runscripts/cli.py#L283C10-L283C65

We need to reconsider this; in particular, I would like to overhaul the logging in general to make it cleaner across the various tools. I am not sure when @mandresm or I will have time for that, though.

@mandresm
Contributor

Good point @pgierz! The good news is that it's probably really easy to solve :)

Before using loguru we were using print commands and the standard logger, and we had a problem: the observe step runs at the same time as the models, and it was printing/logging while the models were writing their own output. This was a mess. I solved the problem a while ago by creating https://github.com/esm-tools/esm_tools/blob/release/src/esm_runscripts/logfiles.py

That took care of 1) writing all the stdout/stderr of observe into a stream, 2) writing it on the fly into the observe log, and 3) dumping the stream into the script's stdout once the simulation was finished.

My feeling is that we can remove https://github.com/esm-tools/esm_tools/blob/release/src/esm_runscripts/logfiles.py, but we will need to configure loguru so that, for observe (https://github.com/esm-tools/esm_tools/blob/release/src/esm_runscripts/observe.py), it writes both to the observe log and into a stream that is then put into the model stdout.

@pgierz, can you take this issue under your wing? If so, please have a look at https://github.com/esm-tools/esm_tools/blob/release/src/esm_runscripts/logfiles.py; maybe you can get some ideas from there on how to implement a similar behavior with loguru. Then we can get rid of logfiles.py.

@mandresm
Contributor

mandresm commented Sep 27, 2024

The entry point for all of this is SimulationSetup.__call__:

https://github.com/esm-tools/esm_tools/blob/release/src/esm_runscripts/sim_objects.py#L124

There, the initialize_logfiles function is called, which uses the RuntimeLogger, the object that takes care of handling the stream:

https://github.com/esm-tools/esm_tools/blob/16d133daaeefee5151ae02df5b490ce3471d7cbd/src/esm_runscripts/logfiles.py#L34C19-L34C32

The RuntimeLogger object is then used again to dump its stream into the general log file:

https://github.com/esm-tools/esm_tools/blob/release/src/esm_runscripts/sim_objects.py#L156
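The capture-then-dump pattern described above can be sketched in plain stdlib Python; the class and method names below are illustrative, not the actual RuntimeLogger code:

```python
import io

class RuntimeLoggerSketch:
    """Hypothetical stand-in for esm_runscripts' RuntimeLogger: collect
    log lines in an in-memory stream during the run, then dump the whole
    stream into the general log file afterwards."""

    def __init__(self) -> None:
        self.stream = io.StringIO()

    def write(self, message: str) -> None:
        # capture each message in the in-memory stream
        self.stream.write(message)

    def dump_to(self, path: str) -> None:
        # append everything captured so far to the general log file
        with open(path, "a") as log_file:
            log_file.write(self.stream.getvalue())

rt = RuntimeLoggerSketch()
rt.write("observe output line 1\n")
rt.dump_to("general.log")
```

Because the sketch exposes a `write` method, the same object could also be registered directly as a loguru sink, which is essentially what the migration discussed in this thread would need.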

@pgierz pgierz self-assigned this Sep 27, 2024
@pgierz
Member

pgierz commented Sep 27, 2024

I'll look into this...

@pgierz pgierz linked a pull request Sep 27, 2024 that will close this issue