Rationale for new CMOR tables #13
Comments
@taylor13, a very quick reply. The goal of the new tables is to weed out the logical inconsistencies in the existing CMIP6 tables (E*), and to fold all the tables currently in use (input4MIPs, obs4MIPs, ...) into a single, centralized entry that can be used across projects. This means that the evolution of the quantities and their organization can be managed centrally, ensuring that a tas is the same quantity across projects, with the same associated information, but customized for each project (à la project CVs).
I understood generally that purpose, but I think we need to explore the consequences by answering the specific questions posed.
First up, I'm answering mostly in the context of CMIP6Plus here rather than CMIP7. File naming and DRS structure should be a project-specific decision (consider CORDEX, with its different naming requirements and directory structures). CMIP6Plus is going to have to use CMOR 3.7 compatible tables in order for us to be able to do anything in the near future, so we can develop the underlying tables separately, provided that tools exist to export "legacy" table sets from MIP tables plus CVs.
We are very used to tables and have infrastructure built up to work with them. The main purpose of the new tables is to re-arrange variables such that we can logically add new variables to serve the needs of other projects.
Yes. When coming up with a variable list this is the simplest way to specify a variable.
Yes x 3 provided this is useful to the user, but again it depends what the project involved wants.
Yes. For me this isn't a dramatically difficult task, but the same may not be true of other groups. We refer to lookup tables anyway when constructing variable lists.
To be honest, I don't see users of the data struggling with this much. I would expect them to first search for the variable name and see what variables are defined (i.e. look for the appropriate spatial shape / frequency). When it comes to assigning a realm, my temptation would be to maintain consistency with other similar variables where possible. The realm is a search facet on ESGF, but I don't think it is one that I have used.
Yes, this could mean lots of tables. While I would prefer a smaller set, I don't necessarily see a problem with having a lot of tables as long as the logic is reasonably clear. I don't dig into the JSON tables themselves other than when debugging -- I tend to work with large searchable tables such as this when I want to investigate particular records.
This is something that needs looking at, but which can be done in slower time than initiating a CMIP6Plus mip era. In the first instance we could use a unit test framework to confirm consistency -- this is relatively simple to set up. Alternatively we could migrate data that is common to a separate document, which would look a lot like the MIPVariable entries in the CMIP6 DR, but again, until we start on a CMOR4 version that works off different table structures, I don't see that this needs finalising.
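As a rough illustration of the unit-test idea above, the sketch below flags variables whose shared attributes differ between tables. It assumes each table has been reduced to its `variable_entry` mapping (variable name to attribute dict); the function name `find_inconsistencies`, the choice of attributes to compare, and the sample data are all hypothetical and only there to show the shape of such a check.

```python
from collections import defaultdict

# Attributes expected to agree wherever the same variable name appears.
# This selection is an assumption for illustration, not an agreed list.
SHARED_ATTRS = ("standard_name", "units", "long_name")


def find_inconsistencies(tables, attrs=SHARED_ATTRS):
    """Return {variable: {attr: {value: [tables]}}} for attrs that differ.

    `tables` maps a table name to its "variable_entry" dict.
    """
    seen = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for table_name, entries in tables.items():
        for var_name, var_attrs in entries.items():
            for attr in attrs:
                seen[var_name][attr][var_attrs.get(attr)].append(table_name)

    problems = {}
    for var_name, by_attr in seen.items():
        # An attribute is inconsistent if more than one distinct value was seen.
        bad = {attr: dict(vals) for attr, vals in by_attr.items() if len(vals) > 1}
        if bad:
            problems[var_name] = bad
    return problems


# Made-up sample data: "tas" has deliberately mismatched units.
example_tables = {
    "Amon": {"tas": {"standard_name": "air_temperature", "units": "K"}},
    "day": {"tas": {"standard_name": "air_temperature", "units": "degC"}},
}
problems = find_inconsistencies(example_tables, attrs=("standard_name", "units"))
```

In a test suite this would reduce to asserting that `find_inconsistencies(...)` returns an empty dict for the real tables loaded from JSON.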
We could go back to CMIP6 tables, but then we are constrained in how we add new variables, and what do we do with other projects such as obs4MIPs/input4MIPs?
This is good information. Thanks, Matt, for your usual careful thinking on this. Given what you’ve said, I propose the following. (Sorry that the indentation isn't preserved from my original document.) My understanding is that the requirements are:
My understanding is that it is not important that the file names remain consistent with CMIP6. I don’t think we need to harmonize tables across projects, just variables.
If my understanding is correct, I suggest we proceed in two steps as follows: Near term:
Longer term: The longer-term changes will depend somewhat on which of the near-term options are adopted. If the unique branded variable labels are adopted, then future CMIP infrastructure can be built without consideration of how variables are grouped into tables. This will remove constraints on CMOR tables while ensuring consistency across tables (because all variables will be drawn from the same master list). A MIP might group variables however it pleases (drawing exclusively from the master MIP table of variables). Modeling groups may also, if they wish, group the variables differently from the MIP groupings. A simple script could be written that, given a list of branded variable labels, would produce a CMOR-readable table with all the appropriate attributes. It will be easy to implement the master list of branded variables because I have already done this, but not yet in the correct JSON file dictionary format. I estimate that someone familiar with Python could easily take my Excel spreadsheet, which has all the needed attributes defined, and construct the master list of MIP variables (which would only include the CMIP6 variables on a first pass).
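A minimal sketch of the kind of script described above, assuming the master list is available as a dict keyed by branded variable label. The label strings, the function name `build_table`, and the reduced `Header` are placeholders: the actual label format is not specified in this thread, and a table that CMOR will accept needs further header fields and axis definitions that are omitted here.

```python
import json


def build_table(master, labels, table_id="custom"):
    """Assemble a CMOR-style table dict from a master list of variables.

    `master` maps a branded variable label to its attribute dict;
    `labels` is the subset a MIP or modeling group wants in one table.
    """
    missing = [label for label in labels if label not in master]
    if missing:
        raise KeyError(f"labels not in master list: {missing}")
    return {
        # Only a skeleton header; real tables carry more metadata.
        "Header": {"table_id": f"Table {table_id}"},
        # Copy each entry so later edits to the table leave the master intact.
        "variable_entry": {label: dict(master[label]) for label in labels},
    }


# Placeholder master list; the labels below are invented for illustration.
master_list = {
    "tas_LABEL-A": {"out_name": "tas", "units": "K", "frequency": "mon"},
    "tas_LABEL-B": {"out_name": "tas", "units": "K", "frequency": "day"},
    "pr_LABEL-A": {"out_name": "pr", "units": "kg m-2 s-1", "frequency": "mon"},
}

table = build_table(master_list, ["tas_LABEL-A", "pr_LABEL-A"], table_id="mygroup")
table_json = json.dumps(table, indent=2)
```

Because every entry is drawn from the one master list, two groups can assemble different tables and still get identical attributes for the same label.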
Thanks @taylor13, as per usual your deep thinking on this gives pause for thought. Regarding the per-project search facets, this is what we have per project in the old COG configuration.
Moving forward, the Metagrid interface provides overview facets, which for the CMIP6 configuration lump identifiers together (e.g. Labels includes variant_label and grid_label):
I am not familiar with how flexible this interface is. It would be useful to add the configuration information to the discussion.
Dropping some links down here, as they are relevant for the discussion.
I believe this discussion and that in #26 would be useful to merge.
@durack1 @matthew-mizielinski - Before Dan gets into trying to clean up and implement a new set of CMOR tables, I think it must be made clear what the rationale for and consequences of doing this are. What was wrong with the CMIP6 tables and why will the new tables be better? The following questions come to mind, which I think should be addressed before proceeding:
a. Would table names continue to be used to uniquely label variables (root name + table name)?
b. Would table names be used as search facets? In file names? In directory structures?