From 2574c99ddca6777f2c54408694acbe69e16fc49c Mon Sep 17 00:00:00 2001 From: Paola Petrelli Date: Thu, 30 May 2024 16:25:46 +1000 Subject: [PATCH 1/7] added publish-procedure fixed #83, #73 --- Governance/_toc.yml | 1 + Governance/concepts/other-conventions.md | 4 ++ Governance/introduction.md | 4 ++ Governance/publish/publish-csiro-dap.md | 5 -- Governance/publish/publish-options.md | 26 +++---- Governance/publish/publish-procedure.md | 91 ++++++++++++++++++++++++ Governance/tech/tech-intro.md | 3 +- requirements.txt | 2 +- 8 files changed, 116 insertions(+), 20 deletions(-) create mode 100644 Governance/publish/publish-procedure.md diff --git a/Governance/_toc.yml b/Governance/_toc.yml index e157e51..7515943 100644 --- a/Governance/_toc.yml +++ b/Governance/_toc.yml @@ -14,6 +14,7 @@ parts: - caption: Publishing climate data chapters: - file: publish/publish-intro + - file: publish/publish-procedure - file: publish/publish-options sections: - file: publish/publish-nci-geonetwork diff --git a/Governance/concepts/other-conventions.md b/Governance/concepts/other-conventions.md index dbb15ab..0470b8f 100644 --- a/Governance/concepts/other-conventions.md +++ b/Governance/concepts/other-conventions.md @@ -29,3 +29,7 @@ IMOS] has very specific extensions of the CF conventions for different kind of o * [AMBER Trajectory Conventions](http://ambermd.org/netcdf/nctraj.xhtml) for molecular dynamics simulations. * [CF Discrete Sampling Geometries Conventions](http://cfconventions.org/Data/cf-conventions/cf-conventions-1.6/build/cf-conventions.html\#discrete-sampling-geometries) - CF for observational and point data * [COMODO](https://web.archive.org/web/20160417032300/http://pycomodo.forge.imag.fr/norm.html) ??? still used? + +```{note} Potential clashes +Some of these conventions are a spinoff of the CF Conventions and so there's an expectation when applied that the files will also be CF compliant. 
However, as conventions are ever-evolving documents and the groups working on specific conventions are different, it is possible for them to introduce requirements that clash with the CF conventions. An example of this is the different use of cf_role in UGRID and CF: UGRID requires values for this attribute which are not included in the values allowed by CF. As CF evolves it's possible that some of these alternatives will become just a use case of CF. In the mentioned example the clash should be resolved with CF V11 which should include an integration of UGRID into CF, see the relevant [CF github issue](https://github.com/cf-convention/cf-conventions/issues/501). +``` diff --git a/Governance/introduction.md b/Governance/introduction.md index 85a5b3c..ba9f9dc 100644 --- a/Governance/introduction.md +++ b/Governance/introduction.md @@ -20,3 +20,7 @@ As with all scientific outputs, errors and inconsistencies can be found in clima ### **[Retiring climate data](retire/retire-intro.md)** Data doesn't last forever, usually becoming outdated or obsolete within 5-10 years; this of course is simply the nature of scientific research. In this section, recommendations are presented on how to go about retiring a dataset, both published and replicated, without breaking citations, removing identifiers, or causing disruption to users, while retaining the value of your research data. + +### **[Creating climate data products](products/products-intro.md)** +Climate data is often used in other research fields, government initiatives and by private stakeholders for a variety of applications. +The process of adapting and packaging climate data so that it will be of use to a wider audience, with different backgrounds and/or for different purposes is more complex than simply sharing data with other climate researchers. At the moment we provide only an overview of what this section aims to cover. 
We welcome input and collaboration from people who have relevant experience or would like to propose use cases to cover. diff --git a/Governance/publish/publish-csiro-dap.md b/Governance/publish/publish-csiro-dap.md index 0e08452..7850e14 100644 --- a/Governance/publish/publish-csiro-dap.md +++ b/Governance/publish/publish-csiro-dap.md @@ -3,8 +3,3 @@ Data, software and links to external data holdings (such as NCI) can be added to the [CSIRO Data Access Portal](https://data.csiro.au/) (DAP) by **CSIRO staff** only. More information on creating metadata records and uploading files to the CSIRO DAP can be found on the [DAP Help Guide](https://confluence.csiro.au/display/dap/Deposit+and+Manage+Data) (staff only access). CSIRO-affiliated data can be published in the DAP and the lead creator does not have to be CSIRO staff member. - -!!!COMMENT: at the moment this is identical to what we have in the Publishing option pages for CSIRO. I'm leaving this here for the moment in case we want to add something more, like: - - summary of the procedure with pros and cons? - - Something which is not covered in the official documentation but could be useful? - - a reminder to follow discipline specific best practices even if they aren't necessarily required? diff --git a/Governance/publish/publish-options.md b/Governance/publish/publish-options.md index 9af392d..4ff3ee1 100644 --- a/Governance/publish/publish-options.md +++ b/Governance/publish/publish-options.md @@ -1,5 +1,5 @@ # Publishing options -Once the data is created and ready for publication, **where** can it be published? +Before starting to prepare the data files for publication, it is important to identify **where** they can be published. One of the main factors is what services are available, which is largely determined by the researcher's institution and/or organisation. Another important consideration is the kind of data, depending on format, size etc. 
@@ -7,21 +7,21 @@ In some cases, depending on the dataset, using a public repository is suitable a On the other end of the scale, some data could be produced on purpose or simply be suitable to contribute to a large-scale project that has its own publishing procedure established. The most common use cases for our community are covered here. -::::{tab-set} -:::{tab-item} BoM +````{tab-set} +```{tab-item} BoM Insert here pathways for BoM researchers... -::: +``` -:::{tab-item} CSIRO +```{tab-item} CSIRO Data, software and links to external data holdings (such as NCI) can be added to the [CSIRO Data Access Portal](https://data.csiro.au/) (DAP) by **CSIRO staff** only. More information on creating metadata records and uploading files to the CSIRO DAP can be found on the [DAP Help Guide](https://confluence.csiro.au/display/dap/Deposit+and+Manage+Data) (staff only access). CSIRO-affiliated data can be published in the DAP and the lead creator does not have to be CSIRO staff member. -::: +``` -:::{tab-item} CLEX +```{tab-item} CLEX We are looking here at CLEX as an example but as basically CLEX is only a collaboration project among universities, a lot of what applies here it is also usually applicable for anyone who works/studies in another university. Researchers working for a university have potentially more freedom in terms of where they can publish data. Unless they are working for a project which is covered by a data agreement and has specific licensing and/or data distribution requirements, the approach is to follow the FAIR principle and they are usually expected to share data openly. Part of making data FAIR is to make it discoverable, so sharing data in a discipline specific collection is to be preferred when possible. This could be a collection of climate data, or of a related discipline, i.e., paleoclimate data, oceanographic data, etc. 
@@ -37,9 +37,9 @@ An institutional repository might not be able to publish big datasets effectivel **CLEX Data Collection on Zenodo** For CLEX researchers, students and associates, the CMS team offers advice and can review a record, as well as add it to the [CLEX Data Collection community](https://zenodo.org/communities/arc-coe-clex-data/?page=1&size=20) to improve discoverability. For more information on Zenodo see the generic options tab. -::: +``` -:::{tab-item} Generalist repositories +```{tab-item} Generalist repositories Repositories like [Zenodo](https://zenodo.org), [Figshare](https://www.google.com/search?client=safari&rls=en&q=figshare&ie=UTF-8&oe=UTF-8), [Mendeley](https://www.data.mendeley.com) are public, generic data repositories. It is usually easy to create an account, add a data record and mint a DOI for it. These repositories also publish different kind of materials. This can be useful if publishing code together with data, for example code and data to produce a specific figure required to publish a paper. Another advantage is that these services are widely used and so you are more likely to reach an international audience. @@ -48,9 +48,9 @@ However, as they are generalist repositories, there are no standards required or Finally, as for institutional repositories, the data size is limited to 50-100 GB and files can only be downloaded. We are covering [Zenodo](publish-zenodo.md) more in detail as it is available to anyone and the most used in our community. Figshare is not free but it might be available via an institutional account. Mendeley is free but lees used for climate data. -::: +``` -:::{tab-item} Discipline repositories +```{tab-item} Discipline repositories In some cases, the data might be fit to be published to a specific data portal or as part of a larger initiative. Of these we are covering only the ESGF case more closely, as it provides an example of a comprehensive publishing process. 
For the others refer to their websites for more information or there might be some relevant examples at the end of this guidelines. @@ -70,6 +70,6 @@ Keep in mind that some of these options can be an extra distribution option for - [Copernicus Climate Data Store - CDS](https://cds.climate.copernicus.eu) - [Copernicus Climate Change Services - C3S](https://www.copernicus.eu/en/copernicus-services/climate-change)
Both Copernicus services work with a tender system, so they do not accept requests to publish datasets unless they fit into products they have an open tender for. -::: +``` -:::: +```` diff --git a/Governance/publish/publish-procedure.md b/Governance/publish/publish-procedure.md new file mode 100644 index 0000000..b15dcf1 --- /dev/null +++ b/Governance/publish/publish-procedure.md @@ -0,0 +1,91 @@ +# Publishing procedure + +While the exact procedure to publish a dataset will depend on where the data is published, there are common steps to prepare. + +````{grid} 1 1 1 3 +:class-container: text-center +:gutter: 3 + +```{grid-item-card} +:class-header: bg-light + +Step 1: Choices +^^^ +The first step is to make some important choices, which will depend on the characteristic of the data, the project that produced it and its most likely use. +``` + +```{grid-item-card} +:class-header: bg-light + +Step 2: Metadata +^^^ +Metadata includes any information on the data itself. In a published record the metadata is usually available as a record abstract and inside the files (if using a self-describing format as netCDF) or by auxiliary metadata files. +Ideally all this combined information should allow someone to reproduce the data from scratch (see [Provenance](../concepts/provenance)). +``` + +```{grid-item-card} +:class-header: bg-light + +Step 3: Formatting files +^^^ +Files should be well formatted to enhance their accessibility and usability. +``` +```` + +````{tab-set} +```{tab-item} Choices + :class-container: px-100 +**What** +It's not usual or a good thing to publish all the data produced in a research project. It's important to identify what data is useful or required, considered other limitations like storage availability. Here are some considerations that can help with the choice. +1. Provide the information needed to interpret, reuse and reproduce the results. 
This is what journal publishers usually require, most of them provides [guidelines and examples](journals) of which data should be shared. + +2. If the output is big publish only a subset. If methods are well described, the software used is easily available, then publishing only the subset of data that underlines a publication is sufficient. For example, the post-processed output is sufficient for a model simulation. However, the model version and configuration used, the input data and model source code should be documented. + +3. It's essential to consider an end user point of view. What would a user look for when considering using a dataset? What kind of information is essential for the data to be usable? Which additional information would make its use easier? + +4. The data required for publication might be a small part of the overall output. It often happens with model output that a researcher only uses a small subset of variables, but others could be useful to other projects. If it's possible these can be added to the publication and when time or resources to publish are scarce, providing information on their existence and on how to access them, it's often enough. This should include details on license, possible use restrictions and details on how to create accounts with other institutions if necessary. + +**Where** +There are different options to publish climate data, the most suitable for a research project will depend largely on the institution the researcher works for. This is explained in depth in the [next page](publish-options). It is worth to remember that while there should always be **only one DOI** per dataset, it is possible to create a metadata-only record pointing to the official DOI in other data portals to give more visibility to the data. 
+ +**Licence** +This is often left last and in most cases this is not an issue, however, it is important to know from the start of the research what are the licensing terms of the data used as input and, if the rules around licensing potentially imposed by the employer, the project itself and funding bodies. Big collaborative projects, like CMIP6, usually have their set of rules around licensing. +[Licenses](../concepts/license) are covered extensively in the concepts part of this book. +It is also worth to consider adding **term of use** and a **disclaimer** to avoid the data being misused accidentally. + +**Version** +Often researchers think they don't need a version for a dataset as they are not planning to publish a second version. However, it is common to do mistakes when publishing data, no matter how small the issue the result is often having to publish an updated version as a DOI is a persistent identifier and changes to the underlying files are not permitted. This is why it is always recommended to have a version. [Versioning](../tech/versioning) is covered in the technical tips part of this book. + +**Authors and collaborators** +Another important choice is the authors and collaborators. As for a paper only people who contributed to the data (the data not the research project itself) should be listed as authors and where appropriate people who helped for example with the publishing process itself, formatting the data etc as collaborators. See the [authorship page](../concepts/authorship) in concepts. +``` + +```{tab-item} Metadata + +**Abstract** + +This ... + + +**Self-describing files** +Most of the file formats commonly used in climate are self-describing. An example are the attributes in a netCDF file. It is important to use these as much as possible to give a precise and correct definition of the data itself following available conventions. 
Using the self-describing properties of a file has two important advantages: it keeps important information with the data itself, and these files attributes are used by discipline software to simplify data analysis. +Publishers that deal regularly with climate data usually require at least for the [CF](../concepts/cf-conventions) and/or [ACDD](../concepts/acdd-conventions) conventions to be followed, however it is always worth to apply them when possible even if they are not required for publication. + +```{note} +Please note we cover CF conventions application and potential issues when they are used incorrectly in the [technical pages](../tech/conventions). +``` + +**Auxiliary files** +These could be any kind of text, tabular files or other formats like markup (xml, html, json etc.) that add information on the data. These could also be actual data files, for example the ancillary files used to run a model simulation. +``` + +```{tab-item} Formatting files + +**Files organisation** +The way files are organised in folders, their names and sizes should consider both how the files will be distributed and how they will be used. For example, the protocol used to download the files might have a maximum size allowed, likewise having to list a lot of small files it's inefficient when loading an html page. Names also should be descriptive enough that a file can be recognised easily after has been downloaded as being part of a specific dataset. We covered [files organisation and naming](../tech/drs-names) in the technical pages of this book, however, it is important to check the publisher instructions in this regard, or, if none are available online, contacting them about it as early as possible in the publishing process. 
+ +**Conventions** +It is important to use [conventions](../concepts/conventions) and [controlled vocabularies](../concepts/controlled-vocab) whenever possible, both official ones, like CF conventions for file attributes, and others which are not a requirement but have become common practice in the climate community (e.g. CMIP variable names). As some of these conventions also apply to folder and file names, it is important to be consistent and use the same terms in the files, names and descriptions. + +``` +```` diff --git a/Governance/tech/tech-intro.md b/Governance/tech/tech-intro.md index 6065073..05fc2b6 100644 --- a/Governance/tech/tech-intro.md +++ b/Governance/tech/tech-intro.md @@ -6,7 +6,8 @@ We are collecting here information on technical aspects of data management, incl * [](backup-checklist.md) * [](cf-checker.md) -* [](drs.md) +* [](contributors.md) +* [](drs-names.md) * [](keywords.md) * [](coding.md) * [](data_formats.md) diff --git a/requirements.txt b/requirements.txt index 38b87bc..e003153 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,4 +1,4 @@ -jupyter-book == 0.11.* +jupyter-book matplotlib numpy ghp-import From 592745df4dc81ea15c068ebba63dfeeb8bed1add Mon Sep 17 00:00:00 2001 From: Paola Petrelli Date: Wed, 5 Jun 2024 17:02:17 +1000 Subject: [PATCH 2/7] added tar cheatsheet fixed crossreferences in publish section issues #92, #90, progress on publish-procedure --- Governance/concepts/concept-intro.md | 3 +- Governance/manage/manage-file.md | 2 +- Governance/publish/publish-nci-geonetwork.md | 2 +- Governance/publish/publish-procedure.md | 22 +++-- Governance/publish/publish-zenodo.md | 2 +- Governance/tech/massdata.md | 85 +++++++++++++++++++- 6 files changed, 101 insertions(+), 15 deletions(-) diff --git a/Governance/concepts/concept-intro.md b/Governance/concepts/concept-intro.md index 3f9ca57..00a108e 100644 --- a/Governance/concepts/concept-intro.md +++ b/Governance/concepts/concept-intro.md @@ -4,6 +4,7 @@ In this section of 
the book we are covering the key concepts associated with dat **Index** +* [](authorship.md) * [](backup.md) * [Controlled vocabulary](controlled-vocab.md) * [Conventions and standards](conventions.md) @@ -11,7 +12,7 @@ In this section of the book we are covering the key concepts associated with dat * [](collaboration-agreement.md) * [](dmp.md) * [](policies.md) -* [FAIRER data](fairer-principles.md) +* [FAIRER principles](fairer-principles.md) * [Journal requirements](journal.md) * [Open Access Licenses](license.md) * [Persistent identifiers](pids.md) diff --git a/Governance/manage/manage-file.md b/Governance/manage/manage-file.md index 45a7c83..06dc332 100644 --- a/Governance/manage/manage-file.md +++ b/Governance/manage/manage-file.md @@ -31,7 +31,7 @@ Always have the code under version control! ## File and directory organisation -While all the generic advice on [how to organise and name files](../tech/drs.md) is still applicable, when replicating a dataset it is important to also consider the original data organisation. +While all the generic advice on [how to organise and name files](../tech/drs-names.md) is still applicable, when replicating a dataset it is important to also consider the original data organisation. ### Naming files diff --git a/Governance/publish/publish-nci-geonetwork.md b/Governance/publish/publish-nci-geonetwork.md index e68f90a..812388b 100644 --- a/Governance/publish/publish-nci-geonetwork.md +++ b/Governance/publish/publish-nci-geonetwork.md @@ -22,7 +22,7 @@ These pages are only visible to interested parties, therefore we provide an [exa Once the DMP is ready NCI will use the content to create a geonetwork record and mint a DOI for the new dataset. The GeoNetwork record will provide the landing page for the DOI and will be visible only once the files are available on THREDDS. 
### Preparing the files -The actual files have to be organised in a , this will contain the license and a readme file (usually pointing to the geonetwork record) and a sub-folder for each version containing the data files. How the data is organised will depend on the actual dataset, see the [DRS page](../tech/drs.md) for examples. +The actual files have to be organised in a , this will contain the license and a readme file (usually pointing to the geonetwork record) and a sub-folder for each version containing the data files. How the data is organised will depend on the actual dataset, see the [DRS page](../tech/drs-names.md) for examples. The files should follow both [CF](../concepts/cf-conventions.md) and [ACDD](../concepts/acdd-conventions.md) conventions. Once the files are ready, NCI will run a QC check, CF/ACDD compliance check, and that the files are accessible by widely used software like ncview, nco etc. If the files passed the tests, then they will add the dataset to THREDDS and activate the DOI. If not, they will send a detailed report of the QC results so the files can be fixed where possible. diff --git a/Governance/publish/publish-procedure.md b/Governance/publish/publish-procedure.md index b15dcf1..73e7e1c 100644 --- a/Governance/publish/publish-procedure.md +++ b/Governance/publish/publish-procedure.md @@ -32,10 +32,11 @@ Files should be well formatted to enhance their accessibility and usability. ``` ```` -````{tab-set} -```{tab-item} Choices +`````{tab-set} +````{tab-item} Choices :class-container: px-100 **What** + It's not usual or a good thing to publish all the data produced in a research project. It's important to identify what data is useful or required, considered other limitations like storage availability. Here are some considerations that can help with the choice. 1. Provide the information needed to interpret, reuse and reproduce the results. 
This is what journal publishers usually require, most of them provides [guidelines and examples](journals) of which data should be shared. @@ -46,21 +47,25 @@ It's not usual or a good thing to publish all the data produced in a research pr 4. The data required for publication might be a small part of the overall output. It often happens with model output that a researcher only uses a small subset of variables, but others could be useful to other projects. If it's possible these can be added to the publication and when time or resources to publish are scarce, providing information on their existence and on how to access them, it's often enough. This should include details on license, possible use restrictions and details on how to create accounts with other institutions if necessary. **Where** + There are different options to publish climate data, the most suitable for a research project will depend largely on the institution the researcher works for. This is explained in depth in the [next page](publish-options). It is worth to remember that while there should always be **only one DOI** per dataset, it is possible to create a metadata-only record pointing to the official DOI in other data portals to give more visibility to the data. **Licence** + This is often left last and in most cases this is not an issue, however, it is important to know from the start of the research what are the licensing terms of the data used as input and, if the rules around licensing potentially imposed by the employer, the project itself and funding bodies. Big collaborative projects, like CMIP6, usually have their set of rules around licensing. [Licenses](../concepts/license) are covered extensively in the concepts part of this book. It is also worth to consider adding **term of use** and a **disclaimer** to avoid the data being misused accidentally. **Version** + Often researchers think they don't need a version for a dataset as they are not planning to publish a second version. 
However, it is common to do mistakes when publishing data, no matter how small the issue the result is often having to publish an updated version as a DOI is a persistent identifier and changes to the underlying files are not permitted. This is why it is always recommended to have a version. [Versioning](../tech/versioning) is covered in the technical tips part of this book. **Authors and collaborators** + Another important choice is the authors and collaborators. As for a paper only people who contributed to the data (the data not the research project itself) should be listed as authors and where appropriate people who helped for example with the publishing process itself, formatting the data etc as collaborators. See the [authorship page](../concepts/authorship) in concepts. -``` +```` -```{tab-item} Metadata +````{tab-item} Metadata **Abstract** @@ -76,10 +81,11 @@ Please note we cover CF conventions application and potential issues when they a ``` **Auxiliary files** + These could be any kind of text, tabular files or other formats like markup (xml, html, json etc.) that add information on the data. These could also be actual data files, for example the ancillary files used to run a model simulation. -``` +```` -```{tab-item} Formatting files +````{tab-item} Formatting files **Files organisation** The way files are organised in folders, their names and sizes should consider both how the files will be distributed and how they will be used. For example, the protocol used to download the files might have a maximum size allowed, likewise having to list a lot of small files it's inefficient when loading an html page. Names also should be descriptive enough that a file can be recognised easily after has been downloaded as being part of a specific dataset. 
We covered [files organisation and naming](../tech/drs-names) in the technical pages of this book, however, it is important to check the publisher instructions in this regard, or, if none are available online, contacting them about it as early as possible in the publishing process. @@ -87,5 +93,5 @@ The way files are organised in folders, their names and sizes should consider bo **Conventions** It is important to use [conventions](../concepts/conventions) and [controlled vocabularies](../concepts/controlled-vocab) whenever possible, both official ones, like CF conventions for file attributes, and others which are not a requirement but have become common practice in the climate community (e.g. CMIP variable names). As some of these conventions also apply to folder and file names, it is important to be consistent and use the same terms in the files, names and descriptions. -``` -```` +```` +````` diff --git a/Governance/publish/publish-zenodo.md b/Governance/publish/publish-zenodo.md index 1795c86..d290655 100644 --- a/Governance/publish/publish-zenodo.md +++ b/Governance/publish/publish-zenodo.md @@ -14,7 +14,7 @@ Publishing a dataset is easy and quick as long as the data is reasonably organis * A dataset can have several authors, they all should agree to the dataset publication and to list the record on Zenodo. All authors should have made a significant contribution to the data. * Make sure the files: * are following any relevant standards, if they are netcdf files they should follow both [CF](../concepts/cf-conventions) and [ACDD](../concepts/acdd-conventions.md) conventions. - * have [descriptive names](../tech/filenames.md) and are organised in [directories](../tech/drs.md) in a way that facilitate their access and use; + * have [descriptive names and are organised in directories](../tech/drs-names.md) in a way that facilitate their access and use; * have a [version](versioning-data) and a [license](../concepts/license-data.md) assigned. 
* Use [keywords and controlled vocabularies](../concepts/controlled-vocab.md) in the metadata to increase discoverability. * If the data has already been published elsewhere, a record on Zenodo can still be added to improve visibility. In such cases, use the existing DOI, **do not create a new DOI for the same data**. Instead of uploading the actual data files, a Readme file can be uploaded and links to the original records added for data download. diff --git a/Governance/tech/massdata.md b/Governance/tech/massdata.md index 3a90195..bb21856 100644 --- a/Governance/tech/massdata.md +++ b/Governance/tech/massdata.md @@ -20,11 +20,90 @@ While preparing the data to be moved, it is a good idea to also document what da Useful tools: -* TAR- to create archives cheatsheet -````{dropdown} +````{dropdown} **TAR - Tape ARchive cheat sheet** +**Options** +* c – create archive file. +* u - update archive file. +* x – extract an archive file. +* v – show the progress of archive file. +* f – filename of archive file. +* t – list the content of archive file. +* j – filter archive through bzip2. +* z – filter archive through gzip. +* r – append or update files or directories to existing archive file. +* p - preserve-permissions +* --acls - preserve acls + +**Create tar archive** + + tar -cvf archive.tar testdir + +**Create compressed tar archive** + + tar -cvzf archive.tar.gz testdir + +Or for more compression but slower writing/uncompressing + + tar -cvjf archive.tar.bz2 testdir + +**Exclude only files or directories with pattern** + + tar -cvf archive.tar --exclude="*.txt" testdir + +There are many exclude options, check them with: `man tar` + +**Update tar archive** + + tar -uvf archive.tar testdir + +This will add previously excluded files and update any file which has changed. 
+ +**Untar archive** + + tar -xvf archive.tar + tar -xvf archive.tar.gz + tar -xvf archive.tar.bz2 + tar -xvf archive.tar -C /home/uncompress/here + +**List archive content** + + tar -tvf archive.tar + +**Extract one file from tar archive** + +Need to use the full path for the file. As an example, if archive was created with:
+ tar -cvf archive.tar testdir + +You have to specify:
+ tar -xf archive.tar testdir/readme.txt + +**Extract multiple files from tar archive** + + tar -xvf archive.tar readme.txt another.txt + +Or using wildcards: + + tar -xvf archive.tar --wildcards '*.txt' + +**Add files or directories to tar archive** + + tar -rvf archive.tar readme.txt + tar -rvf archive.tar anotherdir + +**Delete files or directories from tar archive** + + tar --delete -f archive.tar anotherdir + +**Find differences between archive and local directory** + + tar -df archive.tar testdir + +NB. this can take a long time + +Modified from: https://neverendingsecurity.wordpress.com/2015/04/13/linux-tar-commands-cheatsheet/ ```` -* Compressing tools +For advice on compressing files see the [relevant page](compression.md) ## Accessing MDSS From daf713b33c675906dd7fe01109d5315a25b3ae3f Mon Sep 17 00:00:00 2001 From: Paola Petrelli Date: Thu, 6 Jun 2024 08:50:14 +1000 Subject: [PATCH 3/7] Update Governance/publish/publish-procedure.md Co-authored-by: Claire Trenham --- Governance/publish/publish-procedure.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Governance/publish/publish-procedure.md b/Governance/publish/publish-procedure.md index 73e7e1c..64861fe 100644 --- a/Governance/publish/publish-procedure.md +++ b/Governance/publish/publish-procedure.md @@ -62,7 +62,7 @@ Often researchers think they don't need a version for a dataset as they are not **Authors and collaborators** -Another important choice is the authors and collaborators. As for a paper only people who contributed to the data (the data not the research project itself) should be listed as authors and where appropriate people who helped for example with the publishing process itself, formatting the data etc as collaborators. See the [authorship page](../concepts/authorship) in concepts. +Like journal manuscripts, datasets (and software) have an authorship list and collaborators. 
As for a paper, only people who contributed to the dataset (the data not the research project itself) should be listed as authors and where appropriate people who helped for example with the publishing process itself, formatting the data etc as collaborators. See the [authorship page](../concepts/authorship) in concepts. ```` ````{tab-item} Metadata From 95ee0e90860e7a46c29b1a7f9137c18c22b48195 Mon Sep 17 00:00:00 2001 From: Paola Petrelli Date: Thu, 6 Jun 2024 08:54:27 +1000 Subject: [PATCH 4/7] Update Governance/publish/publish-procedure.md Co-authored-by: Claire Trenham --- Governance/publish/publish-procedure.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Governance/publish/publish-procedure.md b/Governance/publish/publish-procedure.md index 64861fe..85b44c0 100644 --- a/Governance/publish/publish-procedure.md +++ b/Governance/publish/publish-procedure.md @@ -44,7 +44,7 @@ It's not usual or a good thing to publish all the data produced in a research pr 3. It's essential to consider an end user point of view. What would a user look for when considering using a dataset? What kind of information is essential for the data to be usable? Which additional information would make its use easier? -4. The data required for publication might be a small part of the overall output. It often happens with model output that a researcher only uses a small subset of variables, but others could be useful to other projects. If it's possible these can be added to the publication and when time or resources to publish are scarce, providing information on their existence and on how to access them, it's often enough. This should include details on license, possible use restrictions and details on how to create accounts with other institutions if necessary. +4. The data required for publication might be a small part of the overall output. 
It often happens with model output that a researcher only uses a small subset of variables, but others could be useful to other projects. If it is viable, these can be added to the publication, but when resources to publish are scarce, providing information on their existence and on how to access them may be enough. This should include details on license, possible use restrictions and details on how to create accounts with other institutions if necessary. **Where** From 03a0d01477f59b5218742d5e9a360e995762dfa7 Mon Sep 17 00:00:00 2001 From: Paola Petrelli Date: Thu, 6 Jun 2024 09:12:03 +1000 Subject: [PATCH 5/7] Update Governance/publish/publish-procedure.md Co-authored-by: Claire Trenham --- Governance/publish/publish-procedure.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Governance/publish/publish-procedure.md b/Governance/publish/publish-procedure.md index 85b44c0..7f3d2cb 100644 --- a/Governance/publish/publish-procedure.md +++ b/Governance/publish/publish-procedure.md @@ -40,7 +40,7 @@ Files should be well formatted to enhance their accessibility and usability. It's not usual or a good thing to publish all the data produced in a research project. It's important to identify what data is useful or required, considered other limitations like storage availability. Here are some considerations that can help with the choice. 1. Provide the information needed to interpret, reuse and reproduce the results. This is what journal publishers usually require, most of them provides [guidelines and examples](journals) of which data should be shared. -2. If the output is big publish only a subset. If methods are well described, the software used is easily available, then publishing only the subset of data that underlines a publication is sufficient. For example, the post-processed output is sufficient for a model simulation. However, the model version and configuration used, the input data and model source code should be documented. +2.
If the output is big publish only a subset. If methods are well described, the software used is easily available, then publishing only the subset of data that underlines a publication is sufficient. For example, the post-processed output is sufficient for a model simulation. However, the model version and configuration used, the input data and model source code should be clearly documented. 3. It's essential to consider an end user point of view. What would a user look for when considering using a dataset? What kind of information is essential for the data to be usable? Which additional information would make its use easier? From f1115043c7fb66c2efb7f4e2dcdd70fcc3067183 Mon Sep 17 00:00:00 2001 From: Paola Petrelli Date: Thu, 6 Jun 2024 09:19:17 +1000 Subject: [PATCH 6/7] Apply suggestions from code review Co-authored-by: Claire Trenham --- Governance/publish/publish-procedure.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/Governance/publish/publish-procedure.md b/Governance/publish/publish-procedure.md index 7f3d2cb..fa6d287 100644 --- a/Governance/publish/publish-procedure.md +++ b/Governance/publish/publish-procedure.md @@ -19,7 +19,7 @@ The first step is to make some important choices, which will depend on the chara Step 2: Metadata ^^^ -Metadata includes any information on the data itself. In a published record the metadata is usually available as a record abstract and inside the files (if using a self-describing format as netCDF) or by auxiliary metadata files. +Metadata includes any information on the data itself. In a published record the metadata is usually available as a record abstract and inside the files (if using a self-describing format such as netCDF) or by auxiliary metadata files. Ideally all this combined information should allow someone to reproduce the data from scratch (see [Provenance](../concepts/provenance)). 
``` @@ -48,11 +48,11 @@ It's not usual or a good thing to publish all the data produced in a research pr **Where** -There are different options to publish climate data, the most suitable for a research project will depend largely on the institution the researcher works for. This is explained in depth in the [next page](publish-options). It is worth to remember that while there should always be **only one DOI** per dataset, it is possible to create a metadata-only record pointing to the official DOI in other data portals to give more visibility to the data. +There are different options to publish climate data, the most suitable for a research project will depend largely on the institution the researcher works for. This is explained in depth in the [next page](publish-options). It is worth remembering that while there should always be **only one DOI** per dataset, it is possible to create a metadata-only record pointing to the official DOI in other data portals to give more visibility to the data. **Licence** -This is often left last and in most cases this is not an issue, however, it is important to know from the start of the research what are the licensing terms of the data used as input and, if the rules around licensing potentially imposed by the employer, the project itself and funding bodies. Big collaborative projects, like CMIP6, usually have their set of rules around licensing. +This is often left last and in most cases this is not an issue, however, it is important to know **from the start** of the research what are the *licensing terms of the data used as input* and, if the rules around licensing potentially imposed by the employer, the project itself and funding bodies. Big collaborative projects, like CMIP6, usually have their set of rules around licensing. [Licenses](../concepts/license) are covered extensively in the concepts part of this book. 
It is also worth to consider adding **term of use** and a **disclaimer** to avoid the data being misused accidentally. @@ -74,7 +74,7 @@ This ... **Self-describing files** Most of the file formats commonly used in climate are self-describing. An example are the attributes in a netCDF file. It is important to use these as much as possible to give a precise and correct definition of the data itself following available conventions. Using the self-describing properties of a file has two important advantages: it keeps important information with the data itself, and these files attributes are used by discipline software to simplify data analysis. -Publishers that deal regularly with climate data usually require at least for the [CF](../concepts/cf-conventions) and/or [ACDD](../concepts/acdd-conventions) conventions to be followed, however it is always worth to apply them when possible even if they are not required for publication. +Publishers that deal regularly with climate data usually require at least for the [CF](../concepts/cf-conventions) and/or [ACDD](../concepts/acdd-conventions) conventions to be followed, however it is always worth applying them when possible even if they are not required for publication. ```{note} Please note we cover CF conventions application and potential issues when they are used incorrectly in the [technical pages](../tech/conventions). @@ -82,7 +82,7 @@ Please note we cover CF conventions application and potential issues when they a **Auxiliary files** -These could be any kind of text, tabular files or other formats like markup (xml, html, json etc.) that add information on the data. These could also be actual data files, for example the ancillary files used to run a model simulation. +These could be any kind of text, tabular files or other formats like markup (xml, html, json, md etc.) that add information on the data. These could also be actual data files, for example the ancillary files used to run a model simulation. 
```` ````{tab-item} Formatting files From f2addbbe019c991a52da16616ad2575096a075f5 Mon Sep 17 00:00:00 2001 From: Paola Petrelli Date: Thu, 6 Jun 2024 14:59:24 +1000 Subject: [PATCH 7/7] further updates in response to review, fix formatting for tech-readme --- Governance/concepts/other-conventions.md | 2 +- Governance/publish/publish-csiro-dap.md | 2 +- Governance/publish/publish-options.md | 2 +- Governance/publish/publish-procedure.md | 14 +- Governance/tech/tech-readme.md | 176 +++++++++++++++-------- 5 files changed, 132 insertions(+), 64 deletions(-) diff --git a/Governance/concepts/other-conventions.md b/Governance/concepts/other-conventions.md index 0470b8f..ec09f40 100644 --- a/Governance/concepts/other-conventions.md +++ b/Governance/concepts/other-conventions.md @@ -31,5 +31,5 @@ IMOS] has very specific extensions of the CF conventions for different kind of o * [COMODO](https://web.archive.org/web/20160417032300/http://pycomodo.forge.imag.fr/norm.html) ??? still used? ```{note} Potential clashes -Some of these conventions are a spinoff of the CF Conventions and so there's an expectation when applied that the files will also be CF compliant. However, as conventions are ever-evolving documents and the groups working on specific conventions are different, it is possible for them to introduce requirements that clash with the CF conventions. An example of this is the different uses for cf_role in UGRID and CF, UGRID requires values for this attributes which are not included in the values allowed by CF. As CF evolves it's possible that some of these alternatives will become just a use case of CF. In the mentioned example the clash should be resolved with CF V11 which should include an integration of UGRID into CF, see the relevant [CF github issue](https://github.com/cf-convention/cf-conventions/issues/501). +Some of these conventions are a spinoff of the CF Conventions and so there's an expectation when applied that the files will also be CF compliant. 
However, as conventions are ever-evolving documents and the groups working on specific conventions are different, it is possible for them to introduce requirements that clash with the CF conventions. An example of this is the different uses for cf_role in UGRID and CF, UGRID requires values for this attributes which are not included in the values allowed by CF. As CF evolves it's possible that some of these alternatives will become just a use case of CF. In the mentioned example the clash should be resolved with CF v1.11 which should include an integration of UGRID into CF, see the relevant [CF github issue](https://github.com/cf-convention/cf-conventions/issues/501). ``` diff --git a/Governance/publish/publish-csiro-dap.md b/Governance/publish/publish-csiro-dap.md index 7850e14..acd6db4 100644 --- a/Governance/publish/publish-csiro-dap.md +++ b/Governance/publish/publish-csiro-dap.md @@ -1,5 +1,5 @@ # CSIRO Data Access Portal (DAP) -Data, software and links to external data holdings (such as NCI) can be added to the [CSIRO Data Access Portal](https://data.csiro.au/) (DAP) by **CSIRO staff** only. More information on creating metadata records and uploading files to the CSIRO DAP can be found on the [DAP Help Guide](https://confluence.csiro.au/display/dap/Deposit+and+Manage+Data) (staff only access). +Data, software and links to external data holdings (such as NCI) can be added to the [CSIRO Data Access Portal](https://research.csiro.au/dap/) (DAP) by **CSIRO staff** only. More information on creating metadata records and uploading files to the CSIRO DAP can be found on the [DAP Help Guide](https://confluence.csiro.au/display/dap/Publish%2C+Archive%2C+and+Manage+Research+Data+and+Software) (staff only access). CSIRO-affiliated data can be published in the DAP and the lead creator does not have to be CSIRO staff member. 
diff --git a/Governance/publish/publish-options.md b/Governance/publish/publish-options.md index 4ff3ee1..8f0afe8 100644 --- a/Governance/publish/publish-options.md +++ b/Governance/publish/publish-options.md @@ -16,7 +16,7 @@ Insert here pathways for BoM researchers... ``` ```{tab-item} CSIRO -Data, software and links to external data holdings (such as NCI) can be added to the [CSIRO Data Access Portal](https://data.csiro.au/) (DAP) by **CSIRO staff** only. More information on creating metadata records and uploading files to the CSIRO DAP can be found on the [DAP Help Guide](https://confluence.csiro.au/display/dap/Deposit+and+Manage+Data) (staff only access). +Data, software and links to external data holdings (such as NCI) can be added to the [CSIRO Data Access Portal](https://research.csiro.au/dap/) (DAP) by **CSIRO staff** only. More information on creating metadata records and uploading files to the CSIRO DAP can be found on the [DAP Help Guide](https://confluence.csiro.au/display/dap/Publish%2C+Archive%2C+and+Manage+Research+Data+and+Software) (staff only access). CSIRO-affiliated data can be published in the DAP and the lead creator does not have to be CSIRO staff member. ``` diff --git a/Governance/publish/publish-procedure.md b/Governance/publish/publish-procedure.md index 73e7e1c..b7797f3 100644 --- a/Governance/publish/publish-procedure.md +++ b/Governance/publish/publish-procedure.md @@ -42,7 +42,7 @@ It's not usual or a good thing to publish all the data produced in a research pr 2. If the output is big publish only a subset. If methods are well described, the software used is easily available, then publishing only the subset of data that underlines a publication is sufficient. For example, the post-processed output is sufficient for a model simulation. However, the model version and configuration used, the input data and model source code should be documented. -3. It's essential to consider an end user point of view. 
What would a user look for when considering using a dataset? What kind of information is essential for the data to be usable? Which additional information would make its use easier? +3. Consider the strengths and limitations of the data, the way it was produced can limit what it should be used for. Model output is a good example of this, the way physical processes are parametrised, the resolution, the input data and other model components determine what the output data should or shouldn't be used for. 4. The data required for publication might be a small part of the overall output. It often happens with model output that a researcher only uses a small subset of variables, but others could be useful to other projects. If it's possible these can be added to the publication and when time or resources to publish are scarce, providing information on their existence and on how to access them, it's often enough. This should include details on license, possible use restrictions and details on how to create accounts with other institutions if necessary. @@ -62,15 +62,18 @@ Often researchers think they don't need a version for a dataset as they are not **Authors and collaborators** -Another important choice is the authors and collaborators. As for a paper only people who contributed to the data (the data not the research project itself) should be listed as authors and where appropriate people who helped for example with the publishing process itself, formatting the data etc as collaborators. See the [authorship page](../concepts/authorship) in concepts. +Another important choice is the authors and collaborators. As for a paper only people who contributed to the data (the data not the research project itself) should be listed as authors and, where appropriate, people who helped with the publishing process itself, for example formatting the data, as [collaborators](../tech/contributors). See also the [authorship page](../concepts/authorship) in concepts. 
```` ````{tab-item} Metadata -**Abstract** +**Dataset webpage** -This ... +The dataset page on the data portal provides the first information about the data for a prospective user. While there are differences in how this information is displayed, most of the elements are common to all portals. +A page will show a **title** and an **abstract**; it is important for both to be written from an end user's point of view. What would a user look for when considering using a dataset? What kind of information is essential for the data to be usable? Which additional information would make its use easier? +It's important to consider how the search engine of the portal works and how it displays results. Choosing relevant [keywords](../tech/keywords) and using terms from [controlled vocabularies](../concepts/controlled-vocab) helps. Words in the title are also prioritised by search engines and shown as part of the results list, so the [title should be as descriptive](../tech/title) as possible. +The abstract should use plain language and not assume expertise in the field on the user's side. For the same reason it is important to give the context in which the data was produced and highlight its appropriate uses. If there is a high risk for the data to be misused, then a disclaimer explaining the data limitations should be added too. While it's important to share technical details to use the data correctly, it's worth considering adding this information in a separate file if it is extensive. We provide an [annotated example of a readme file](../tech/tech-readme) which shows what kind of information should be considered. Data portals often have options to add related links; this is an easy way to provide as much relevant information as possible. **Self-describing files** Most of the file formats commonly used in climate are self-describing. An example are the attributes in a netCDF file.
It is important to use these as much as possible to give a precise and correct definition of the data itself following available conventions. Using the self-describing properties of a file has two important advantages: it keeps important information with the data itself, and these files attributes are used by discipline software to simplify data analysis. @@ -80,6 +83,7 @@ Publishers that deal regularly with climate data usually require at least for th Please note we cover CF conventions application and potential issues when they are used incorrectly in the [technical pages](../tech/conventions). ``` + **Auxiliary files** These could be any kind of text, tabular files or other formats like markup (xml, html, json etc.) that add information on the data. These could also be actual data files, for example the ancillary files used to run a model simulation. @@ -88,9 +92,11 @@ These could be any kind of text, tabular files or other formats like markup (xml ````{tab-item} Formatting files **Files organisation** + The way files are organised in folders, their names and sizes should consider both how the files will be distributed and how they will be used. For example, the protocol used to download the files might have a maximum size allowed, likewise having to list a lot of small files it's inefficient when loading an html page. Names also should be descriptive enough that a file can be recognised easily after has been downloaded as being part of a specific dataset. We covered [files organisation and naming](../tech/drs-names) in the technical pages of this book, however, it is important to check the publisher instructions in this regard, or, if none are available online, contacting them about it as early as possible in the publishing process. 
**Conventions** + It is important to use [conventions](../concepts/conventions) and [controlled vocabularies](../concepts/controlled-vocab) whenever possible, both official ones, like CF conventions for file attributes, and others which are not a requirement but have become common practice in the climate community (e.g. CMIP variable names). As some of these conventions also apply to folder and file names, it is important to be consistent and use the same terms in the files, names and descriptions. ```` diff --git a/Governance/tech/tech-readme.md b/Governance/tech/tech-readme.md index 96a6337..9832248 100644 --- a/Governance/tech/tech-readme.md +++ b/Governance/tech/tech-readme.md @@ -1,47 +1,72 @@ -# Dataset readme file template +# Annotated README example -A good readme file attached to your published data is a useful tool for a potential user. If they have found the data online by downloading the readme file they can keep the data description and publications details together with the data. If they are using the data directly on the server where the data is stored, then they can get the information on the dataset without having to go online. +A good README file attached to published data is a useful tool for a potential user. If they have found the data online by downloading the README file they can keep the data description and publications details together with the data. If they are using the data directly on the server where the data is stored, then they can get the information on the dataset without having to go online. -More importantly the data description shown on an online repository is limited in size and a readme file is useful to add more technical details, directory structure and other information that you might have to leave out of the main abstract. 
+More importantly the data description shown on an online repository is limited in size and a README file is useful to add more technical details, directory structure and other information that you might have to leave out of the main abstract. -**Readme template and example** +This is an example from a record published with NCI and it's used here to illustrate how to structure a README file. This particular README file is quite long as there are a lot of technical details, so it provides a good example to show the kind of information that can be included. However, if a dataset is less complex, it is likely the README file will be a lot shorter than this. -When you are publishing on NCI we provide a Readme file template in the working directory we create for each dataset. here I am using a real readme file from a record we recently published to illustrate the template structure. This particular readme file is quite long as there are a lot of technical details, so it provides a good example to show the kind of information you can include. However, if your dataset is less complex, it is likely your readme file will be a lot shorter than this. +* Dataset title and version - +```{code} +High-Resolution Modelling of Extreme Storms over the East Coast of Australia v1.0 +``` -**High-Resolution Modelling of Extreme Storms over the East Coast of Australia v1.0** - +* Abstract usually the same as published record -This dataset includes data of 11 extreme East Coast Lows simulated using a multi-physics (5), multi-resolution (3) approach for 4 different boundary conditions, leading to a total of 660 simulations. All simulations were performed using the Weather Research and Forecasting (WRF v3.6) regional model using a triple nesting approach with different domain sizes. -The outer domain corresponds to the CORDEX (Coordinated Regional Climate Downscaling Experiment) Australasia domain and is discretized using a 24 km horizontal grid spacing. 
-ECMWF ERA Interim (ERAI) reanalysis are used to initialize and drive the model for present climate simulations. For future climate simulations, the Pseudo Global Warming (PGW) approach is used. In this case, initial and boundary conditions are built by adding the climate change signal obtained from a global climate multi-model ensemble from the CMIP5. -We selected a total of eleven events featuring an extreme east coast low over the eastern coast of Australia. Events were selected based on their impacts around the Sydney area and the selection includes some of the most iconic events in recent times such as the "Pasha Bulker" (June 2007) and the June 2016 storms. All events were simulated for a total of 8 days, starting about 4 days before the storm peaked near the Sydney area. A full list of events dates is given below. - +```{code} +This dataset includes data of 11 extreme East Coast Lows simulated using a multi-physics (5), +multi-resolution (3) approach for 4 different boundary conditions, leading to a total of 660 +simulations. All simulations were performed using the Weather Research and Forecasting (WRF +v3.6) regional model using a triple nesting approach with different domain sizes. +The outer domain corresponds to the CORDEX (Coordinated Regional Climate Downscaling Experiment) +Australasia domain and is discretized using a 24 km horizontal grid spacing. +ECMWF ERA Interim (ERAI) reanalysis are used to initialize and drive the model for present +climate simulations. For future climate simulations, the Pseudo Global Warming (PGW) approach is +used. In this case, initial and boundary conditions are built by adding the climate change signal +obtained from a global climate multi-model ensemble from the CMIP5. +We selected a total of eleven events featuring an extreme east coast low over the eastern coast of +Australia. 
Events were selected based on their impacts around the Sydney area and the selection +includes some of the most iconic events in recent times such as the "Pasha Bulker" (June 2007) and +the June 2016 storms. All events were simulated for a total of 8 days, starting about 4 days before +the storm peaked near the Sydney area. A full list of events dates is given below. +``` +* List of relevant elements of the data as simulations, data sources etc. + +```{code} Events: -2007-06-04 to 2007-06-12 -2007-06-13 to 2007-06-21 -2007-06-22 to 2007-06-30 -2001-07-23 to 2001-07-31 -2005-03-18 to 2005-03-26 -2008-09-02 to 2008-9-10 -2015-04-18 to 2015-04-26 -2008-08-18 to 2008-08-26 -2013-02-17 to 2013-02-25 -2016-06-01 to 2016-06-09 -2006-09-03 to 2006-09-11 + 2007-06-04 to 2007-06-12 + 2007-06-13 to 2007-06-21 + 2007-06-22 to 2007-06-30 + 2001-07-23 to 2001-07-31 + 2005-03-18 to 2005-03-26 + 2008-09-02 to 2008-9-10 + 2015-04-18 to 2015-04-26 + 2008-08-18 to 2008-08-26 + 2013-02-17 to 2013-02-25 + 2016-06-01 to 2016-06-09 + 2006-09-03 to 2006-09-11 Boundary conditions: - HIST - (ERAI) reanalysis is used as lateral and surface (SST) boundary conditions for present climate simulations - HIST-BRAN - (ERAI) reanalysis is used as lateral boundary conditions and BRAN for surface (SST), for present climate simulations - FUT - initial and boundary conditions are built by adding the climate change signal from the CMIP5 RCP8.5 scenario climate multi-model ensemble to ERAI. 
- FUT-THERMO - initial and boundary conditions are built by adding the climate change signal from the CMIP5 climate multi-model ensemble to ERAI, but only to thermodynamical variables (temperature and humidity) + HIST - (ERAI) reanalysis is used as lateral and surface (SST) boundary conditions for present climate simulations + HIST-BRAN - (ERAI) reanalysis is used as lateral boundary conditions and BRAN for surface (SST), + for present climate simulations + FUT - initial and boundary conditions are built by adding the climate change signal from the + CMIP5 RCP8.5 scenario climate multi-model ensemble to ERAI. + FUT-THERMO - initial and boundary conditions are built by adding the climate change signal from + the CMIP5 climate multi-model ensemble to ERAI, but only to thermodynamical variables (temperature and humidity) Physics: -There is one control run (CTL) and four perturbed physics members that include modifications in the cumulus (CU), the surface and planetary boundary layer (PBL), the radiation (RAD) and the microphysics (MPS) schemes. Members are denoted according to the physical scheme that is being changed compared to the CTL run. - +There is one control run (CTL) and four perturbed physics members that include modifications in the +cumulus (CU), the surface and planetary boundary layer (PBL), the radiation (RAD) and the +microphysics (MPS) schemes. Members are denoted according to the physical scheme that is being changed + compared to the CTL run. +``` + +* Extra information if necessary as more details on technical aspects, sources etc. +```{code} Table summary for physics Name | CTL | CU | PBL | RAD | MPS -------------------------------------------------------------------------- @@ -53,10 +78,15 @@ microphysics | WSM6 (6) | WSM6 (6) | WSM6 (6) | WSM6 (6) | Thomp. (8) Numbers after each of the schemes denote the code used in the WRF namelist. MYJ [9]; YSU [8]; KF [12, 11]; BMJ [1, 9, 10]; WSM6 [14]; Dudhia [6], RRTM [15]; CAM [4]. 
- +``` + +* Variables list + +```{code} +The final data product is the post-processing files of all simulations made following CORDEX standards. +Names and units of the variables that are available after post-processing. +The frequency and the level is also included. Pressure levels are 200, 500, 700, 850 and 925 hPa. - The final data product is the post-processing files of all simulations made following CORDEX standards. -Names and units of the variables that are available after post-processing. The frequency and the level is also included. Pressure levels are 200, 500, 700, 850 and 925 hPa. Surface level variables at 1 hr frequency: prm - precipitation rate (mm) 2m surface variables at 1 hr frequency: @@ -79,45 +109,77 @@ Pressure level variables at 6 hrs frequency: wa - vertical wind (m s-1) zg - gepotential height (m) hus - specic humidity (kg kg-1) - +``` + +* Dataset creator, grants if relevant, research program or project this dataset is an output of -This dataset was created by Dr Alejandro Di Luca as part of the Australian Research Council (ARC) Discovery Early Career Researcher Award (DECRA) funded project (DE170101191) and is part of the Centre of Excellence for Climate Extremes (CLEX) Extreme Rainfall research program. - +```{code} +This dataset was created by Dr Alejandro Di Luca as part of the Australian Research Council (ARC) +Discovery Early Career Researcher Award (DECRA) funded project (DE170101191) and is part of the +Centre of Excellence for Climate Extremes (CLEX) Extreme Rainfall research program. +``` +* Where to find the data online and how the data is organised on the server + +```{code} The data is available online on the NCI TDS server: + https://dapds00.nci.org.au/thredds/catalogs/ks32/CLEX/HiRes-MESECA/HiRes-MESECA.html -and for NCI users in the ks32 project. + + and for NCI users in the ks32 project. 
+
+
 File organisation:
- /g/data/ks32/CLEX_Data/HiRes-MESECA/v1-0///////
+
+ /g/data/ks32/CLEX_Data/HiRes-MESECA/v1-0///////
+
 where
- is WRF24, WRF8 and WRF2 for horizontal grid resolution of 24 km, 8 km and 2 km, respectively
- is the starting date of each event
- are FUT, FUT-THERMO, HIST and HIST-BRAN
- are CONTROL, CU, MPS, PBL and RAD
- are 6hr, 3hr, 1hr
+
+ is WRF24, WRF8 and WRF2 for horizontal grid resolution of 24 km, 8 km and 2 km, respectively
+ is the starting date of each event
+ are FUT, FUT-THERMO, HIST and HIST-BRAN
+ are CONTROL, CU, MPS, PBL and RAD
+ are 6hr, 3hr, 1hr
-filenames:
+Filenames:
 _HiRes-MESECA_UNSW-WRF360-i1______v1-0.nc
+
 Example:
 va200_HiRes-MESECA_UNSW-WRF360-i1_WRF2_2001-07-23_FUT_CTL_6hr_v1-0.nc
- < if there is more than one author and/or collaborators list the names and their roles in regard to the data>
+```
+
+* If there is more than one author and/or collaborators list the names and their roles in regard to the data
+```{code}
 Author:
- · Alejandro Di Luca: designed and performed calculations to generate historical and pseudo-global warming boundary conditions; conceived, designed and performed all simulations; coded and run postprocessing scripts.
+ Alejandro Di Luca: designed and performed calculations to generate historical and pseudo-global
+ warming boundary conditions; conceived, designed and performed all simulations; coded and run
+ postprocessing scripts.
 Contributors:
-· Daniel Argueso (d.argueso@uib.es): designed and performed calculations to generate historical and pseudo-global warming boundary conditions; coded postprocessing scripts.
-· Nicolas Jourdain (nicolas.jourdain@univ-grenoble-alpes.fr): designed and performed calculations to generate pseudo-global warming boundary conditions.
-· Jason Evans (jason.evans@unsw.edu.au): installed and setup the model in the NCI supercomputer, helped on the design and performance of simulations.
-
+ Daniel Argueso - d.arguesouib.es: designed and performed calculations to generate
+ historical and pseudo-global warming boundary conditions; coded postprocessing scripts.
+ Nicolas Jourdain - nicolas.jourdainuniv-grenoble-alpes.fr: designed and performed
+ calculations to generate pseudo-global warming boundary conditions.
+ Jason Evans - jason.evansunsw.edu.au: installed and setup the model in the NCI supercomputer,
+ helped on the design and performance of simulations.
+```
-Contact: di_luca.alejandro@uqam.ca for any question on the dataset content and provenance
- paola.petrelli@utas.edu.au for questions or issues with file accessibility
-
+* Main contact, if relevant specify a contact for scientific questions and one for file access
+```{code}
+Contact: di_luca.alejandrouqam.ca for any question on the dataset content and provenance
+ paola.petrelliutas.edu.au for questions or issues with file accessibility
+```
+
+* Dataset citation, this should always include the dataset doi
+
+```{code}
 Citation:
- Di Luca, Alejandro, Argueso, D., Jourdain, N., Evans, J., 2021. High-Resolution Modelling of Extreme Storms over the East Coast of Australia v1.0. NCI National Research Data Collection, doi:10.25914/604eb7628b4e7
-Other information you can include which was not present in this particular example are
+ Di Luca, Alejandro, Argueso, D., Jourdain, N., Evans, J., 2021. High-Resolution Modelling of
+ Extreme Storms over the East Coast of Australia v1.0. NCI National Research Data Collection,
+ doi:10.25914/604eb7628b4e7
+```
-References list, if you are citing other sources in the description
+Other information you can include which was not present in this particular example are
-Associated papers for publications that are based on the dataset
+* References list, if you are citing other sources in the description
+* Associated papers for publications that are based on the dataset