Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add single-variable time series file generation function from ADF #78

Merged
merged 30 commits into from
Mar 22, 2024

Conversation

nusbaume
Copy link
Collaborator

@nusbaume nusbaume commented Mar 6, 2024

This should hopefully generalize the timeseries generation capability so that any CUPiD script or notebook can use it.

cupid/timeseries.py Outdated Show resolved Hide resolved
cupid/timeseries.py Outdated Show resolved Hide resolved
- case_names: list, str
name of simulaton case
- hist_str: str
CESM history number, ie h0, h1, etc.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this generic enough for different history file types (e.g. patch or landunit level output from CLM, as opposed to grid cell averages)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wwieder Are you referring to the hist_str here? This is now an editable quantity in the config.yml file. Would that be sufficient for those other kinds of history file output? I'm not familiar with those kinds of output, so any clarification would be helpful.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also mentioned in #86 ...

cupid/timeseries.py Outdated Show resolved Hide resolved
cupid/timeseries.py Outdated Show resolved Hide resolved
cupid/timeseries.py Outdated Show resolved Hide resolved
cupid/timeseries.py Outdated Show resolved Hide resolved
@TeaganKing
Copy link
Collaborator

We should implement this by adding a call to run.py under if time_series.

@dabail10
Copy link
Collaborator

Is this ready for testing?

@TeaganKing
Copy link
Collaborator

Is this ready for testing?

Hi @dabail10 , sorry for the slow response, I'm in full-day training every day this week... I'm hoping to, at the least, wrap this up before our meeting next week, and will remove the draft label once it's ready for testing/review.

@TeaganKing TeaganKing self-assigned this Mar 18, 2024
@TeaganKing
Copy link
Collaborator

I've added the general infrastructure for components to use this timeseries generation. Each component will need to adjust the variables they want to generate (unless processing all vars) and the relevant history string (h0 is the current default) in config.yml. Additionally, in run.py, each component will need to update the height dimension they are using ('lev' is currently the default); this is in run.py instead of the config file since I don't anticipate it changing/needing to be modified by users.

@TeaganKing
Copy link
Collaborator

Thanks for those suggestions, @mnlevy1981 ! All of those comments have been addressed.

Copy link
Collaborator

@mnlevy1981 mnlevy1981 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll run this branch before the 3:00 CUPiD meeting, but these are two more comments from just looking through the code

examples/coupled_model/config.yml Show resolved Hide resolved
cupid/timeseries.py Outdated Show resolved Hide resolved
cupid/run.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@mnlevy1981 mnlevy1981 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we update README.md to mention the ability to generate time series files? Relatedly -- when I ran this myself, I didn't have the nco module loaded so I got an error

FileNotFoundError: [Errno 2] No such file or directory: 'ncrcat'

Maybe a quick note in the README that time series generation requires NCO? Or can we install NCO via conda? if so, maybe we should just add it to conda-dev?

@kafitzgerald
Copy link
Collaborator

You can certainly install it via conda, but I'm not sure if there are advantages to using the module(s) available on HPC. Potentially some performance / compatibility tradeoffs there.

@TeaganKing
Copy link
Collaborator

Ok, I added another note to #86 to investigate any potential tradeoffs with NCO, as well as made a note in the README.

And thanks for the loop suggestion Mike-- definitely a good idea!

@TeaganKing
Copy link
Collaborator

I also updated the project vision as I was looking at the README anyways...

@rmshkv
Copy link
Contributor

rmshkv commented Mar 20, 2024

To elaborate on something I mentioned in #88 - a potential suggestion (that we don't have to take, definitely backseat driving a bit here): the CUPiD/ploomber framework is already set up to support multiple phases of tasks that depend on each other, e.g. running timeseries generation first before diagnostics. I would encourage at some point moving the timeseries config block to be under compute_scripts in config.yml, which is already set up to be organized by components that can be turned on or off, and parameters specific to each component that can be passed in to a universal script. This keeps things more generalizable and less hard-coded. We could then use options in ploomber's tasks to specify that timeseries should be run first. Before that happens we need to document the script functionality (which I'll do soon :) ) and maybe build it out a little more, but after that I'd be happy to make a draft of that modification and see if it's something we want to go with.

@TeaganKing
Copy link
Collaborator

TeaganKing commented Mar 21, 2024

To elaborate on something I mentioned in #88 - a potential suggestion (that we don't have to take, definitely backseat driving a bit here): the CUPiD/ploomber framework is already set up to support multiple phases of tasks that depend on each other, e.g. running timeseries generation first before diagnostics. I would encourage at some point moving the timeseries config block to be under compute_scripts in config.yml, which is already set up to be organized by components that can be turned on or off, and parameters specific to each component that can be passed in to a universal script. This keeps things more generalizable and less hard-coded. We could then use options in ploomber's tasks to specify that timeseries should be run first. Before that happens we need to document the script functionality (which I'll do soon :) ) and maybe build it out a little more, but after that I'd be happy to make a draft of that modification and see if it's something we want to go with.

@rmshkv thanks for this suggestion! Just to clarify, are you suggesting creating a new compute_scripts block? I don't see a pre-existing one (although I do see compute_notebooks). Also, maybe it would be best to do this after #88 comes in? I think it would be a fairly simple change to move these and adjust how things are called from the config file in run.py, but I'm a bit hesitant to do that before #88 comes in.

@rmshkv
Copy link
Contributor

rmshkv commented Mar 21, 2024

To elaborate on something I mentioned in #88 - a potential suggestion (that we don't have to take, definitely backseat driving a bit here): the CUPiD/ploomber framework is already set up to support multiple phases of tasks that depend on each other, e.g. running timeseries generation first before diagnostics. I would encourage at some point moving the timeseries config block to be under compute_scripts in config.yml, which is already set up to be organized by components that can be turned on or off, and parameters specific to each component that can be passed in to a universal script. This keeps things more generalizable and less hard-coded. We could then use options in ploomber's tasks to specify that timeseries should be run first. Before that happens we need to document the script functionality (which I'll do soon :) ) and maybe build it out a little more, but after that I'd be happy to make a draft of that modification and see if it's something we want to go with.

@rmshkv thanks for this suggestion! Just to clarify, are you suggesting creating a new compute_scripts block? I don't see a pre-existing one (although I do see compute_notebooks). Also, maybe it would be best to do this after #88 comes in? I think it would be a fairly simple change to move these and adjust how things are called from the config file in run.py, but I'm a bit hesitant to do that before #88 comes in.

The functionality to handle a compute_scripts block provided by config.yml already fully exists - there just isn't a compute_scripts block in the existing example in this repo. That's why I need to document it, it's kind of a secret feature right now :) And yeah agreed that this and #88 should be merged in first - I just wanted to get my thoughts written down somewhere so we can discuss.

@TeaganKing
Copy link
Collaborator

@rmshkv Got it, that makes sense now! Thank you for clarifying! It seems like your suggestion is probably the best way forward, but I summarized this and some other relevant discussions that Mike and I had in #86 (with a note that we should probably use compute_scripts) just so the information is all in one place.

@TeaganKing TeaganKing requested a review from mnlevy1981 March 21, 2024 17:47
Copy link
Collaborator

@mnlevy1981 mnlevy1981 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! One more minor-ish suggestion, broken into two inline comments (and if we want to push it to #86 that's okay).

examples/coupled_model/config.yml Outdated Show resolved Hide resolved
cupid/run.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@mnlevy1981 mnlevy1981 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, great work!

@TeaganKing
Copy link
Collaborator

Thanks for all your help and suggestions from everyone who commented!

@TeaganKing TeaganKing merged commit dec3cef into NCAR:main Mar 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Abstract out ADF's time series generation functionality so other components can use it
7 participants