Add single-variable time series file generation function from ADF #78

nusbaume · 2024-03-06T22:44:54Z

This should hopefully generalize the timeseries generation capability so that any CUPiD script or notebook can use it.
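For readers unfamiliar with the approach, here is a minimal sketch of what single-variable time series generation with NCO can look like. It only builds the ncrcat argument list and does not run it. The helper name, file names, and variable name are hypothetical and are not taken from cupid/timeseries.py; the only assumptions about NCO are the standard `-O` (overwrite) and `-v` (extract variable) options and the "inputs first, output last" argument order.

```python
from pathlib import Path


def build_ncrcat_command(hist_files, var_name, out_file):
    """Return the ncrcat argument list for one variable's time series.

    Hypothetical helper for illustration; not the actual CUPiD/ADF code.
    """
    if not hist_files:
        raise ValueError("no history files given")
    # Sort so the records are concatenated in chronological (name) order.
    inputs = [str(f) for f in sorted(hist_files)]
    # -O overwrites an existing output file; -v extracts only var_name.
    return ["ncrcat", "-O", "-v", var_name, *inputs, str(out_file)]


# Illustrative file names only (assumed, not from the PR).
cmd = build_ncrcat_command(
    [Path("case.cam.h0.0001-02.nc"), Path("case.cam.h0.0001-01.nc")],
    "TS",
    Path("case.cam.h0.TS.000101-000102.nc"),
)
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a system where NCO is available.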

cupid/timeseries.py

wwieder · 2024-03-06T23:12:03Z

cupid/timeseries.py

+     - case_names: list, str
+         name of simulation case
+     - hist_str: str
+         CESM history number, i.e. h0, h1, etc.


Is this generic enough for different history file types (e.g. patch or landunit level output from CLM, as opposed to grid cell averages)?

@wwieder Are you referring to the hist_str here? This is now an editable quantity in the config.yml file. Would that be sufficient for those other kinds of history file output? I'm not familiar with those kinds of output, so any clarification would be helpful.

Also mentioned in #86 ...

cupid/timeseries.py

TeaganKing · 2024-03-12T21:21:55Z

We should implement this by adding a call in run.py under the if time_series check.

Update branch to be up to date with CUPiD/main

dabail10 · 2024-03-13T16:08:16Z

Is this ready for testing?

TeaganKing · 2024-03-13T21:08:23Z

Hi @dabail10, sorry for the slow response; I'm in full-day training every day this week. I'm hoping to at least wrap this up before our meeting next week, and will remove the draft label once it's ready for testing/review.

TeaganKing · 2024-03-18T21:55:44Z

I've added the general infrastructure for components to use this timeseries generation. In config.yml, each component will need to adjust the variables it wants to generate (unless it is processing all variables) and the relevant history string (h0 is the current default). Additionally, in run.py, each component will need to update the height dimension it uses ('lev' is currently the default); this lives in run.py rather than the config file because I don't anticipate users needing to change it.
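As a rough illustration of the split described above (user-editable settings in config.yml, height dimension in run.py), here is a minimal sketch of how per-component settings might be resolved with the stated defaults. The dictionary layout, the key names `vars` and `hist_str` as written here, the `"process_all"` marker, and the function name are all assumptions made for illustration; only the defaults h0 and 'lev' come from the comment.

```python
# Hypothetical stand-in for the per-component timeseries block of config.yml.
timeseries_config = {
    "atm": {"vars": ["TS", "PSL"], "hist_str": "h1"},
    "lnd": {},  # nothing overridden: falls back to the defaults below
}

# Height dimensions kept in code (as in run.py) rather than exposed to users.
# The "ocn" entry is an assumed example value.
height_dims = {"ocn": "z_t"}


def resolve_settings(component, config, dims):
    """Merge one component's settings with the defaults from the comment."""
    block = config.get(component, {})
    return {
        "vars": block.get("vars", "process_all"),  # assumed "all vars" marker
        "hist_str": block.get("hist_str", "h0"),   # h0 is the stated default
        "height_dim": dims.get(component, "lev"),  # 'lev' is the stated default
    }


for name in ("atm", "lnd"):
    print(name, resolve_settings(name, timeseries_config, height_dims))
```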

TeaganKing · 2024-03-20T16:58:44Z

Thanks for those suggestions, @mnlevy1981! All of those comments have been addressed.

mnlevy1981

I'll run this branch before the 3:00 CUPiD meeting, but here are two more comments from just reading through the code.

examples/coupled_model/config.yml

cupid/timeseries.py

cupid/run.py

mnlevy1981

Could we update README.md to mention the ability to generate time series files? Relatedly, when I ran this myself I didn't have the nco module loaded, so I got this error:

FileNotFoundError: [Errno 2] No such file or directory: 'ncrcat'

Maybe a quick note in the README that time series generation requires NCO? Or can we install NCO via conda? If so, maybe we should just add it to conda-dev?
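Independent of the README note, one option would be to check for NCO up front so the failure is easier to diagnose than a bare FileNotFoundError. The following is a minimal sketch using the standard library's `shutil.which`; the function name `require_nco` and the message wording are hypothetical, and nothing like this is claimed to exist in the PR.

```python
import shutil


def require_nco(tool="ncrcat", which=shutil.which):
    """Return the path to an NCO tool, or raise a readable error.

    Hypothetical helper; `which` is injectable so the check can be tested.
    """
    path = which(tool)
    if path is None:
        raise RuntimeError(
            f"'{tool}' was not found on PATH. Time series generation requires "
            "NCO: load the nco module or install NCO (for example via conda)."
        )
    return path


try:
    print("Using", require_nco())
except RuntimeError as err:
    print(err)
```

Calling this once before any ncrcat subprocess is launched would surface the missing dependency immediately.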

kafitzgerald · 2024-03-20T20:33:14Z

You can certainly install it via conda, but I'm not sure if there are advantages to using the module(s) available on HPC. Potentially some performance / compatibility tradeoffs there.

TeaganKing · 2024-03-20T20:45:42Z

OK, I added another note to #86 to investigate any potential tradeoffs with NCO, and I also added a note to the README.

And thanks for the loop suggestion, Mike. Definitely a good idea!

TeaganKing · 2024-03-20T20:48:02Z

I also updated the project vision as I was looking at the README anyways...

rmshkv · 2024-03-20T22:22:17Z

To elaborate on something I mentioned in #88 - a potential suggestion (that we don't have to take, definitely backseat driving a bit here): the CUPiD/ploomber framework is already set up to support multiple phases of tasks that depend on each other, e.g. running timeseries generation first before diagnostics. I would encourage at some point moving the timeseries config block to be under compute_scripts in config.yml, which is already set up to be organized by components that can be turned on or off, and parameters specific to each component that can be passed in to a universal script. This keeps things more generalizable and less hard-coded. We could then use options in ploomber's tasks to specify that timeseries should be run first. Before that happens we need to document the script functionality (which I'll do soon :) ) and maybe build it out a little more, but after that I'd be happy to make a draft of that modification and see if it's something we want to go with.

TeaganKing · 2024-03-21T16:39:39Z

@rmshkv thanks for this suggestion! Just to clarify, are you suggesting creating a new compute_scripts block? I don't see a pre-existing one (although I do see compute_notebooks). Also, maybe it would be best to do this after #88 comes in? I think it would be a fairly simple change to move these and adjust how things are called from the config file in run.py, but I'm a bit hesitant to do that before #88 comes in.

rmshkv · 2024-03-21T17:38:07Z

The functionality to handle a compute_scripts block provided by config.yml already fully exists - there just isn't a compute_scripts block in the existing example in this repo. That's why I need to document it, it's kind of a secret feature right now :) And yeah agreed that this and #88 should be merged in first - I just wanted to get my thoughts written down somewhere so we can discuss.

TeaganKing · 2024-03-21T17:45:47Z

@rmshkv Got it, that makes sense now! Thank you for clarifying! It seems like your suggestion is probably the best way forward, but I summarized this and some other relevant discussions that Mike and I had in #86 (with a note that we should probably use compute_scripts) just so the information is all in one place.

mnlevy1981

This looks great! One more minor-ish suggestion, broken into two inline comments (and if we want to push it to #86 that's okay).

examples/coupled_model/config.yml

cupid/run.py

examples/coupled_model/config.yml

mnlevy1981

Looks good, great work!

TeaganKing · 2024-03-22T18:49:50Z

Thanks to everyone who commented for all your help and suggestions!

Add initial ADF timeseries generation function.

a2ec51a

TeaganKing reviewed Mar 6, 2024
