Relax our license from CC BY-SA 4.0 to CC BY 4.0 #7

Closed
chengzhuzhang opened this issue Jun 17, 2022 · 30 comments

Comments

@chengzhuzhang
Contributor

According to @mccoy20, E3SM will relax our license from CC BY-SA 4.0 to CC BY 4.0.

Based on the e2c code, it looks like our custom metadata definition file would supersede what is registered in CMIP6_CVs for duplicated attributes. We will use the new license in the metadata starting with the v2 data publication.

i.e., replace the text in "license" with:

Creative Commons Attribution 4.0 International (CC BY 4.0; https://creativecommons.org/licenses/by/4.0/)

@chengzhuzhang
Contributor Author

I realize that we can remove the license attribute from the metadata definition .json files; the license information will be written out by CMOR by default.

@chengzhuzhang
Contributor Author

Based on the recommendation from @durack1
PCMDI/cmip6-cmor-tables#377 (comment)
We will need to add the license information back to these metadata definition files.

@TonyB9000, would you please add "license" back to all v2 materials?

"license": "CMIP6 model data produced by E3SM is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0; https://creativecommons.org/licenses/by/4.0/). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file) and at https:///pcmdi.llnl.gov/. The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law.",

Tagging Renata (@mccoy20) for this update.

@TonyB9000
Contributor

Will do.

@durack1

durack1 commented Jul 28, 2022

@chengzhuzhang good catch.

One tweak: the institution_id of all the E3SM contributions is E3SM-Project (see here, and for E3SM-2-0, here). Following that, I would suggest changing:

"CMIP6 model data produced by E3SM is licensed under a.." ->
"CMIP6 model data produced by E3SM-Project is licensed under a.."

@chengzhuzhang
Contributor Author

Thank you for reviewing @durack1! Yes, using E3SM-Project is consistent.

@TonyB9000
Contributor

Lol. I'll fix.

@TonyB9000
Contributor

Done - again :)

@TonyB9000
Contributor

Now, considering we still have many v1 CMIP6 sets to publish (and some to generate), should the v1 metadata files be updated to the new licensing spec?

@durack1

durack1 commented Jul 28, 2022

I would (update CC BY-SA 4.0 -> CC BY 4.0). We have captured the license relaxation in the registered info (e.g., here), but if new data is being written, it may as well carry the latest (correct) license in the file(s).

@TonyB9000
Copy link
Contributor

TonyB9000 commented Jul 28, 2022

The license info in the E3SM-1-0 user metadata files begins:

"CMIP6 model data produced by E3SM is licensed under a Creative Commons Attribution ShareAlike 4.0 ..."

The statement has no "BY" or "BY-SA" in these files. How should the replacement text read?

Also, I note that some projects have changed to the newer license but restate the old license in their "history" attribute. Is this necessary?

(I may try to employ Charlie Zender's "ncatted" to modify unpublished, pre-generated CMIP6 datasets.)

@durack1
Copy link

durack1 commented Jul 28, 2022

@TonyB9000 the template below (I have tweaked E3SM -> E3SM-Project) is what I would suggest for any NEW E3SM-1-x (or E3SM-2-x) data to be written:

"license": "CMIP6 model data produced by E3SM-Project is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0; https://creativecommons.org/licenses/by/4.0/). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file) and at https:///pcmdi.llnl.gov/. The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law.",

For existing, published data, these can be left as is. The registered information, centralized on the CMIP6_source_id_licenses.html page that serves the entire CMIP6 project, captures the latest registered license, removing the need to rewrite data files (and to unpublish and republish data across the ESGF CMIP6 federation).

@TonyB9000
Contributor

By "via the further_info_url (recorded as a global attribute in this file)", I assume "this file" is the above TermsOfUse page.

I will modify ALL of our existing user metadata templates to use this text, and modify our not-yet-published datasets to reflect the same.

Note a small typo: "and at https:///pcmdi.llnl.gov/" ("///").

@durack1

durack1 commented Jul 28, 2022

@TonyB9000 the actual template found here follows:

"license":"CMIP6 model data produced by <Your Institution; see CMIP6_institution_id.json> is licensed under a <Creative Commons; select and insert a license_id; see below> License (<insert the matching license_url; see below>). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file)[ and at <some URL maintained by modeling group>]. The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law.",

So I assume that for the E3SM-Project files, the <some URL maintained by modeling group> would most correctly be https://e3sm.org/model/ or similar, no?
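A minimal Python sketch of filling this template, assuming the caller supplies the institution_id, the full license name (the template appends the word "License" itself) and the matching license URL. The helper name `build_license` is hypothetical, and the optional `group_url` stands in for the bracketed "[ and at <some URL maintained by modeling group>]" clause; which URL E3SM should use there is still an open question at this point in the thread.

```python
def build_license(institution_id, license_name, license_url, group_url=None):
    """Fill the CMIP6 license template (hypothetical helper).

    license_name: e.g. "Creative Commons Attribution 4.0 International";
        the template itself appends the word "License".
    group_url: optional URL for the "[ and at <...>]" clause; omitted if None.
    """
    and_at = f" and at {group_url}" if group_url else ""
    return (
        f"CMIP6 model data produced by {institution_id} is licensed under a "
        f"{license_name} License ({license_url}). "
        "Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing "
        "CMIP6 output, including citation requirements and proper acknowledgment. "
        "Further information about this data, including some limitations, can be found "
        "via the further_info_url (recorded as a global attribute in this file)"
        f"{and_at}. "
        "The data producers and data providers make no warranty, either express or "
        "implied, including, but not limited to, warranties of merchantability and "
        "fitness for a particular purpose. All liabilities arising from the supply of "
        "the information (including any liability arising in negligence) are excluded "
        "to the fullest extent permitted by law."
    )
```

For example, `build_license("E3SM-Project", "Creative Commons Attribution 4.0 International", "https://creativecommons.org/licenses/by/4.0/")` produces the CC BY 4.0 wording without the "(CC BY 4.0; ...)" form, and building the string from one function avoids hand-copied typos such as "https:///".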

@TonyB9000
Contributor

I have no idea. Others more savvy than I will need to chime in on that. @chengzhuzhang ? @mccoy20 ?

Thanks for the heads-up. I'd like to get this correct from the start.

@durack1

durack1 commented Jul 28, 2022

> By "via the further_info_url (recorded as a global attribute in this file)", I assume "this file" is the above TermsOfUse page.

Sorry @TonyB9000, "this file" is the netCDF data file itself; further_info_url is one of its global attributes and takes this form:

(base) -bash-4.2$ ncdump -h /p/../CMIP6/CMIP/E3SM-Project/E3SM-1-1/historical/r1i1p1f1/Amon/tas/gr/v20191211/tas_Amon_E3SM-1-1_historical_r1i1p1f1_gr_185001-185912.nc | grep further
:further_info_url = "https://furtherinfo.es-doc.org/CMIP6.E3SM-Project.E3SM-1-1.historical.none.r1i1p1f1" ;
:license = "CMIP6 model data produced by E3SM is licensed under a Creative Commons Attribution ShareAlike 4.0 International License (https://creativecommons.org/licenses). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file) and at https:///pcmdi.llnl.gov/. The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law." ;
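To audit which already-written files still carry a given license, a short Python helper (hypothetical, not part of any E3SM tool) can pull global string attributes out of `ncdump -h` output like the above. It assumes each attribute is printed on a single line in the form `:name = "value" ;` and deliberately skips variable attributes such as `tas:units`; multi-line string attributes are not handled.

```python
import re

# One-line global string attribute as printed by `ncdump -h`, e.g.
#     :further_info_url = "https://..." ;
# Variable attributes (e.g. `tas:units = "K" ;`) have a variable name before
# the colon and are therefore not matched.
GLOBAL_STR_ATTR = re.compile(r'^\s*:(\w+) = "(.*)" ;\s*$')


def global_string_attrs(ncdump_header: str) -> dict:
    """Map global attribute name -> string value for single-line attributes."""
    attrs = {}
    for line in ncdump_header.splitlines():
        match = GLOBAL_STR_ATTR.match(line)
        if match:
            attrs[match.group(1)] = match.group(2)
    return attrs
```

A check such as `"ShareAlike" in global_string_attrs(header).get("license", "")` would then flag files that still carry the v1 license text.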

@TonyB9000
Contributor

OK, thanks Paul. I was not (originally) prepared for the datafile metadata to be an amalgam of metadata from various sources, and I am still unclear where all of it comes from. Some is in the native data, and some we add as we generate derivatives (climos, timeseries, and CMIP6 data). It would be nice to have a "map"...

@chengzhuzhang
Contributor Author

chengzhuzhang commented Jul 28, 2022

Hey @TonyB9000, I think we should revert the license change for v1 material. It looks like license info that does not match the registered v1 info triggers a validation error in CMOR, as shown below:

!!!!!!!!!!!!!!!!!!!!!!!!!
!
! Error: The attribute "license" could not be validated.
! The current input value is "CMIP6 model data produced by E3SM-Project is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0; https://creativecommons.org/licenses/by/4.0/). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file) and at https://pcmdi.llnl.gov/. The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law." which is not valid
! Valid values must match the regular expression:
!       ["^CMIP6 model data pr
!
!!!!!!!!!!!!!!!!!!!!!!!!!

@chengzhuzhang
Contributor Author

Sorry about the confusion. I'm still learning CMOR's expected behavior.

@TonyB9000
Contributor

No problem. I haven't merged (I think), and we can always un-merge in any case.

But between the CMIP6-Metadata repo, e2c, datasm, etc., we need to be clear on dependencies. When I make a change in datasm, I just "pip install" it and run the new code. But if it needs an updated e2c or alternate metadata, I'm not sure how to proceed.

We should be able to update the registration for v1, else we are forced to use an inappropriate license, right?

@mauzey1

mauzey1 commented Jul 29, 2022

#7 (comment)

I see. This was an issue I found in CMOR 3.6.1: the message created for the "attribute X could not be validated" error grew longer than the maximum string length. CMOR originally tried to build a regex string from all of the values listed for an attribute in the registered data, and with four very long license-template strings it easily exceeds the maximum length. I fixed this in the current nightly build of CMOR, and the fix will be in the stable CMOR 3.7.0 release.

The other part of the error could be due to using older licenses with the new license templates in CMIP6_CV.json.
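The truncation described here can be modeled in a few lines of Python. This is an illustrative model, not CMOR's C implementation: it assumes the whole error message (the input value plus every registered pattern) is assembled into a buffer capped at the 1023-character limit cited later in this thread, so a long license value pushes the pattern list past the cap. That is consistent with the cut-off `["^CMIP6 model data pr` in the output above. The function name and the placeholder pattern strings are hypothetical.

```python
BUFFER_CHARS = 1023  # per-string limit cited for CMOR 3.6.1 in this thread


def validation_message(attr, value, patterns, limit=BUFFER_CHARS):
    """Illustrative model only: build the error text, then keep at most
    `limit` characters, as a fixed-size buffer would.

    Returns (message_as_stored, untruncated_length).
    """
    regex_list = "[" + ", ".join(f'"{p}"' for p in patterns) + "]"
    full = (
        f'Error: The attribute "{attr}" could not be validated. '
        f'The current input value is "{value}" which is not valid. '
        f"Valid values must match the regular expression: {regex_list}"
    )
    return full[:limit], len(full)


# Placeholder stand-ins for four long license templates and a long license value.
templates = ["^CMIP6 model data produced by .*" + "x" * 400] * 4
msg, full_len = validation_message("license", "y" * 900, templates)
```

With these inputs the untruncated message is well over the cap, so the stored text ends partway through, before any complete pattern is listed.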

@TonyB9000
Contributor

@chengzhuzhang The e3sm_to_cmip code includes "e3sm_to_cmip/resources/default_metadata.json". I'm not sure where it is used; perhaps it is just a human aid. In any case, if we cannot get v1 re-registered with the new license, we may want to have both default_v1_metadata.json and default_v2_metadata.json.
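If the v1/v2 split is adopted, selecting the template per model version could look like this Python sketch. The file names are the ones suggested above and are hypothetical (this thread does not show them existing in e3sm_to_cmip), and the mapping assumes the E3SM-<major>-<minor> source_id naming used throughout this issue (E3SM-1-0, E3SM-1-1, E3SM-2-0).

```python
import re

# Hypothetical metadata files, keyed by E3SM major version.
METADATA_BY_MAJOR = {
    "1": "default_v1_metadata.json",  # E3SM-1-x: license as registered for v1
    "2": "default_v2_metadata.json",  # E3SM-2-x: CC BY 4.0 license text
}


def metadata_file_for(source_id: str) -> str:
    """Pick the metadata template for a source_id such as "E3SM-1-1"."""
    match = re.match(r"E3SM-(\d+)-\d+", source_id)
    if match is None or match.group(1) not in METADATA_BY_MAJOR:
        raise ValueError(f"no metadata template for source_id {source_id!r}")
    return METADATA_BY_MAJOR[match.group(1)]
```

Failing loudly on an unknown source_id avoids silently writing a v2 license into v1 data (or the reverse).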

@chengzhuzhang
Contributor Author

I think the default_metadata.json is just a template for testing.

@chengzhuzhang
Contributor Author

#7 (comment)

The other part of the error could be due to using older licenses with the new license templates in CMIP6_CV.json.

@mauzey1 Thank you for the clarification, Chris. Could you elaborate a bit on the second part? For E3SM-1-*, should we leave the license attribute in the CMOR input file unchanged? Otherwise, there will be a mismatch between the license and what has been registered for E3SM v1. Should only v2 have the new license, and is that what CMOR expects?

@durack1

durack1 commented Jul 29, 2022

Hi folks, just circling around. This issue is ultimately due to a current limitation in CMOR 3.6.1, in which variables have an upper limit of 1023 chars.

The changes made in PCMDI/cmip6-cmor-tables@71d1533 on 7 June led to the 1023-char limit being exceeded, causing PrePARE and CMOR runtime issues.

Just to link across repos, the related open issues are PCMDI/cmor#660, PCMDI/cmip6-cmor-tables#376.

A resolution has been proposed in PCMDI/cmip6-cmor-tables#376 (comment) (the input table files) which, once implemented, should restore stability. This will allow the E3SM-2-0 data to be written (using the very latest cmip6-cmor-tables, which include the registered E3SM-2-0) and PrePARE to be used.

Hopefully @mauzey1 can get that done soon.

@durack1

durack1 commented Jul 29, 2022

@chengzhuzhang @TonyB9000 changes in PCMDI/cmip6-cmor-tables#380 were just merged (thanks @mauzey1!) so please pull down the latest version and let us know if the issue is resolved

@chengzhuzhang
Copy link
Contributor Author

Thanks a lot for all the discussion and development. I checked out the latest cmor-tables and tried to write E3SM v2 data. The "buffer overflow detected" issue is gone, but I still got the error below:

!!!!!!!!!!!!!!!!!!!!!!!!!
!
! Error: The attribute "license" could not be validated.
! The current input value is "CMIP6 model data produced by E3SM-Project is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0; https://creativecommons.org/licenses/by/4.0/). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file) and at https://pcmdi.llnl.gov/. The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law." which is not valid
! Valid values must match the regular expression:
!       ["^CMIP6 model data pr
!

@mauzey1

mauzey1 commented Jul 29, 2022

@chengzhuzhang Would it be okay to change the license from the following:

Creative Commons Attribution 4.0 International (CC BY 4.0; https://creativecommons.org/licenses/by/4.0/)

to this?

Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/)

Was this license working with the previous license template, before the changes? I reverted my tables to before the license-template changes and still got the 'attribute "license" could not be validated' error. The previous template is below.

"license":[
    "^CMIP6 model data produced by .* is licensed under a Creative Commons Attribution.*ShareAlike 4.0 International License .https://creativecommons.org/licenses.* *Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment\\. *Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file).*\\. *The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose\\. *All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law\\.$"
 ]

On a side note, that error message getting cut off is another issue with our current string length limit.
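One way to see why the CC BY 4.0 text could not pass the previous template is to run its license clause through Python's `re` module. The clause below is copied from the template above up to the URL, unchanged; its unescaped `.` characters act as single-character wildcards (so `.https` accepts the literal `(`), and the clause requires the word ShareAlike, which no CC BY 4.0 wording contains. Python's `re` is used only for illustration; CMOR's own regex engine may differ in details, and the helper name is hypothetical.

```python
import re

# License clause of the previous template, copied verbatim up to the URL.
# Each unescaped "." matches any single character.
OLD_LICENSE_CLAUSE = re.compile(
    r"^CMIP6 model data produced by .* is licensed under a Creative Commons "
    r"Attribution.*ShareAlike 4.0 International License .https://creativecommons.org/licenses"
)


def old_template_accepts(license_text: str) -> bool:
    return OLD_LICENSE_CLAUSE.search(license_text) is not None


# v1 wording (as in the E3SM-1-1 file above) versus a CC BY 4.0 wording.
V1_TEXT = ("CMIP6 model data produced by E3SM is licensed under a Creative Commons "
           "Attribution ShareAlike 4.0 International License (https://creativecommons.org/licenses).")
V2_TEXT = ("CMIP6 model data produced by E3SM-Project is licensed under a Creative Commons "
           "Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).")
```

Here `old_template_accepts(V1_TEXT)` is true and `old_template_accepts(V2_TEXT)` is false, which fits the observation that reverting the tables did not make the new license validate.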

@chengzhuzhang
Contributor Author

Hi @mauzey1, thanks for looking it over. By using "Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/)" in the CMOR input file, I can now generate E3SM-2-0 data! This is exciting.
Thanks a lot @mauzey1 @durack1 @matthew-mizielinski for coming up with a quick fix!

@TonyB9000
Contributor

TonyB9000 commented Jul 29, 2022

Am I to understand that embedding the string "CC BY 4.0;" in the license string was the cause of a problem?

(It was the ONLY occurrence of a semicolon in the entire license string).

@mauzey1

mauzey1 commented Jul 29, 2022

Here's the relevant part of the new license template:

Creative Commons .* License (https://creativecommons\\.org/.*)\\.

Aside from needing to remove "CC BY 4.0; ", the string was also missing the word "License".
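Checking the three candidate strings against this fragment shows that both changes were needed. This Python sketch assumes the parentheses in the fragment are meant literally, so they are escaped for Python's `re` (where bare parentheses would form a group); the helper and variable names are hypothetical, and CMOR's own regex engine may differ in details.

```python
import re

# New-template fragment from above, with the parentheses escaped so Python's
# re treats them as literal characters rather than a capture group.
NEW_LICENSE_FRAGMENT = re.compile(
    r"Creative Commons .* License \(https://creativecommons\.org/.*\)\."
)


def new_template_accepts(license_text: str) -> bool:
    return NEW_LICENSE_FRAGMENT.search(license_text) is not None


URL = "https://creativecommons.org/licenses/by/4.0/"
CONSULT = " Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse"
CASES = [
    # Original wording: "CC BY 4.0; " present and the word "License" missing.
    f"Creative Commons Attribution 4.0 International (CC BY 4.0; {URL}).{CONSULT}",
    # "License" added, but "CC BY 4.0; " still sits between "(" and the URL.
    f"Creative Commons Attribution 4.0 International License (CC BY 4.0; {URL}).{CONSULT}",
    # The wording that validated.
    f"Creative Commons Attribution 4.0 International License ({URL}).{CONSULT}",
]
```

Only the last case matches: the fragment needs the literal " License (" immediately followed by the creativecommons.org URL, so the semicolon itself is not the problem; the "CC BY 4.0; " prefix is.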


4 participants