
Add the block-based SDA workflow. #3197

Merged: 62 commits merged into PecanProject:develop on Nov 24, 2023

Conversation

@DongchenZ (Contributor) commented Jul 13, 2023

Description

This PR adds a highly modularized block-based SDA workflow with comprehensive try-catch error handling, which will ease user-end debugging. The workflow is also highly parallelized, which substantially improves speed.

Motivation and Context

The multisite SDA workflow has been problematic because of the high-dimensional Bayesian sampling and its speed. The block-based SDA workflow lets users keep the multi-site structure while also preserving the covariance structure. The Wishart case of the MCMC sampling had been problematic due to changes in observations; this is now solved by employing a global aqq defined by the number of state variables.
Beyond that, this PR includes the following updates:

  1. Allow pre-existing aqq & bqq to be imported from outside.
  2. Allow a pre-specified prior of Q to be imported from outside.
  3. Enabled MCMC sampling with NA observations, to help estimate the process variance for a free run.
  4. Updated the qsub_parallel function and made it more robust.
  5. Fixed a bug in the edge case of a single site with a single observation.
  6. Fixed a bug in the prepare_pools function, where nc_open calls had no paired nc_close and on.exit().

Review Time Estimate

  • Immediately
  • Within one week
  • When possible

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation.
  • My name is in the list of CITATION.cff
  • I have updated the CHANGELOG.md.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@DongchenZ DongchenZ requested a review from mdietze July 13, 2023 18:43
Collaborator:

Shall we add "aqq.Init" and "bqq.Init" under the tag "<state.data.assimilation>"?

Contributor Author:

Fixed. You can now add those two arguments inside this script.

Profiling = FALSE,
OutlierDetection=FALSE,
parallel_qsub = TRUE,
free_run = FALSE,
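For context, these flags appear to be entries of a control list; a hedged sketch of how they might be passed to the multisite SDA entry point (function and argument names taken from elsewhere in this PR, exact signature not verified):

```r
# Sketch only: control flags passed to sda.enkf.multisite, together with
# the optional pre-existing enkf.params and ensemble.samples arguments
# documented later in this PR (both default to NULL).
sda.enkf.multisite(settings, obs.mean, obs.cov,
                   control = list(Profiling = FALSE,
                                  OutlierDetection = FALSE,
                                  parallel_qsub = TRUE,
                                  free_run = FALSE),
                   pre_enkf_params = NULL,
                   ensemble.samples = NULL)
```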
Collaborator:

So we probably need to remove "free_run" here? And "SDA_runner.R" should be updated too.

Contributor Author:

Fixed.

base/remote/R/qsub_parallel.R (review thread resolved)

* pre_enkf_params - (optional) Used for carrying out SDA with pre-existed enkf.params, in which the Pf, aqq, and bqq can be used for the analysis step. Defualt is NULL.

* ensemble.samples - (optional) Pass ensemble.samples from outside to avoid GitHub check issues. Defualt is NULL.
Member:

  1. check spelling on "default" throughout.
  2. explain what ensemble.samples can be used for productively! This argument doesn't just exist to avoid a github check. Indeed, there's a legitimate question about whether pre_enkf_params and ensemble.samples should just be part of the restart list rather than their own arguments (e.g. are there scenarios where you'd use them outside of a restart run?)

Contributor Author:

  1. Fixed.
  2. Fixed.

#### **sda.enkf.multisite.R Arguments**
* settings - (required) [State Data Assimilation Tags Example] settings object

* obs.mean - (required) List of sites named by site ids, which contains dataframe for observation means, named with observation datetime.
Member:

I don't know if it's documented somewhere else, but this explanation of how obs.mean and obs.cov have to be structured is clearly insufficient. Also, lets make sure the documentation here matches the ROxygen documentation of the function itself.

Contributor Author:

Added documentation about why we need observations, how to generate them, and what their basic formats are.
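For reference, a minimal sketch of the assumed obs.mean / obs.cov structure, with the nesting order and numbers taken from the example printed in this PR's documentation diff; the state-variable names AbvGrndWood and LAI are placeholders:

```r
# Sketch only (variable names are placeholders): observation lists
# nested as datetime -> site id, matching the printed example.
obs.mean <- list(
  "2010/12/31" = list(
    "1000000651" = data.frame(AbvGrndWood = 15.28, LAI = 0.51)
  )
)
obs.cov <- list(
  "2010/12/31" = list(
    "1000000651" = matrix(c(15.2821691, 0.1213583,    # column 1
                            0.513584319, 0.001162113), # column 2
                          nrow = 2)
  )
)
```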


* control - (optional) List of flags controlling the behaviour of the SDA. Default is as follows:
Member:

If the list below isn't documented here, point the reader to where they can find an explanation of what all the flags below do (e.g. make sure they're documented in the function's Roxygen

Contributor Author:

Fixed.

@@ -324,34 +363,124 @@ $`2010/12/31`$`1000000651`
[1,] 15.2821691 0.513584319
[2,] 0.1213583 0.001162113
```
An example of multi-settings pecan xml file also may look like below:
#### Analysis SDA workflow
Before running the SDA analysis functions, the ensemble forecast results have to be generated, and arguments such as H matrix, MCMC arguments, and multi-site Y and R (by `Construct.R` function) have to be generated as well. If you want to proceed the block-based SDA workflow for a free run (estimate the process error to the free run), you need to first specify the `state.data.assimilation` as follows. Beyond that, if any of your sites has zero observation, you should flag the `by.site` as TRUE.
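The XML block that "as follows" refers to did not survive extraction; a hedged sketch using the free.run and by.site tags discussed in this thread (values illustrative, other tags omitted):

```xml
<!-- Sketch only: flags for a free run under state.data.assimilation -->
<state.data.assimilation>
  <free.run>TRUE</free.run>
  <by.site>TRUE</by.site>
</state.data.assimilation>
```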
Member:

The sentence "If you want to proceed the block-based SDA workflow for a free run (estimate the process error to the free run)" is confusing since you haven't said what the block-block based SDA is, what a free run is, or what each of the tags below actually does. e.g. is by.site what turns on the block-based version? And if so, why isn't it called by.block and why would you want it turned on if sites have zero observations? What does turning on the free.run tag cause to happen?

Contributor Author:

Fixed.

##' @param pft.path physical path to the pft.csv file.
##' @param by criteria, it supports by variable, site, pft, all, and single Q.
##'
##' @description This function is an upgrade to the Construct.H.multisite function which provides the index by different criteria.
Member:

If it's an upgrade, why not just fix the previous function? The whole point of a function is that if the inputs and outputs are defined then anything internal should be malleable without affecting the rest of the code.

Contributor Author:

I haven't cleaned up this function. Only partial features work now (but this is enough for the current SDA workflow). I will make another PR to build the H matrix under different situations.

#I think the blocked nimble has to be implemented and used instead of a long vector sampling.
#1) due to the convergence of X.mod.
#2) temporal efficiency. MVN sampling over a large cov matrix can be time consuming.
#4) this data structure design allows us to implement the MCMC sampling parallely.
Member:

Is this model actually different or is it just called by block instead of overall? I'm not sure I 100% understand why we need a new nimble model.

Contributor Author:

Fixed.

@@ -19,6 +19,21 @@ SDA_OBS_Assembler <- function(settings){
#extract Obs_Prep object from settings.
Obs_Prep <- settings$state.data.assimilation$Obs_Prep

#check if we want to proceed the free run without any observations.
if (as.logical(settings$state.data.assimilation$free.run)) {
#calcualte time points.
Member:

calculate

Contributor Author:

Fixed.

@@ -362,7 +355,7 @@ sda.enkf.multisite <- function(settings,
)
})
###------------------------------------------------------------------------------------------------###
### loop over time ###
### w over time ###
Member:

not sure I'm following the change in comment

Contributor Author:

Fixed.

@@ -20,6 +20,8 @@ pool_ic_netcdf2list <- function(nc.path){
for(varname in names(vals)){
vals[[varname]] <- ncdf4::ncvar_get(IC.nc,varname)
}
ncdf4::nc_close(IC.nc)
on.exit()
Member:

These two lines should be combined into one line and moved up to right after the nc_open. That ensures that the nc file is closed when the function exits, even if the function exits with an error. See for example

on.exit(ncdf4::nc_close(nc), add = FALSE)
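Applied to the function in this diff, the suggested pattern might look like this (a sketch; the initialization of vals is abridged relative to the actual function):

```r
# Sketch of the suggested fix: register nc_close via on.exit right
# after nc_open, so the file is closed when the function exits,
# even if a later read throws an error.
pool_ic_netcdf2list <- function(nc.path){
  IC.nc <- ncdf4::nc_open(nc.path)
  on.exit(ncdf4::nc_close(IC.nc), add = FALSE)
  vals <- list()
  for (varname in names(IC.nc$var)) {
    vals[[varname]] <- ncdf4::ncvar_get(IC.nc, varname)
  }
  vals
}
```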

Contributor Author:

Fixed.

@mdietze mdietze added this pull request to the merge queue Nov 24, 2023
Merged via the queue into PecanProject:develop with commit 824bcfa Nov 24, 2023
10 of 12 checks passed
3 participants