[REVIEW]: epwshiftr: Create future EnergyPlus Weather files using CMIP6 data #4030

Closed
27 of 40 tasks
whedon opened this issue Dec 31, 2021 · 40 comments

Comments

@whedon

whedon commented Dec 31, 2021

Submitting author: @hongyuanjia (Hongyuan Jia)
Repository: https://github.com/ideas-lab-nus/epwshiftr
Branch with paper.md (empty if default branch):
Version: v0.1.3
Editor: @KristinaRiemer
Reviewers: @mitmat, @bczernecki
Archive: Pending

⚠️ JOSS reduced service mode ⚠️

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

Status


Status badge code:

HTML: <a href="https://joss.theoj.org/papers/794e8707a0bd5b38777d514650a0fd65"><img src="https://joss.theoj.org/papers/794e8707a0bd5b38777d514650a0fd65/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/794e8707a0bd5b38777d514650a0fd65/status.svg)](https://joss.theoj.org/papers/794e8707a0bd5b38777d514650a0fd65)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@mitmat & @bczernecki, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @KristinaRiemer know.

Please start on your review when you are able, and be sure to complete your review in the next six weeks at the very latest.

Review checklist for @mitmat

✨ Important: Please do not use the Convert to issue functionality when working through this checklist, instead, please open any new issues associated with your review in the software repository associated with the submission. ✨

Conflict of interest

  • I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

General checks

  • Repository: Is the source code for this software available at the repository url?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • Contribution and authorship: Has the submitting author (@hongyuanjia) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
  • Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
  • A statement of need: Does the paper have a section titled 'Statement of Need' that clearly states what problems the software is designed to solve and who the target audience is?
  • State of the field: Do the authors describe how this software compares to other commonly-used packages?
  • Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
  • References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

Review checklist for @bczernecki

✨ Important: Please do not use the Convert to issue functionality when working through this checklist, instead, please open any new issues associated with your review in the software repository associated with the submission. ✨

Conflict of interest

  • I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

General checks

  • Repository: Is the source code for this software available at the repository url?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • Contribution and authorship: Has the submitting author (@hongyuanjia) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
  • Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
  • A statement of need: Does the paper have a section titled 'Statement of Need' that clearly states what problems the software is designed to solve and who the target audience is?
  • State of the field: Do the authors describe how this software compares to other commonly-used packages?
  • Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
  • References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?
@whedon

whedon commented Dec 31, 2021

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @mitmat, @bczernecki it looks like you're currently assigned to review this paper 🎉.

⭐ Important ⭐

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository, which means that, by GitHub's default behaviour, you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

  2. You may also like to change your default settings for watching repositories in your GitHub profile here: https://github.com/settings/notifications

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf

@whedon

whedon commented Dec 31, 2021

Failed to discover a Statement of need section in paper

@whedon

whedon commented Dec 31, 2021

Wordcount for paper.md is 709

@whedon

whedon commented Dec 31, 2021

Software report (experimental):

github.com/AlDanial/cloc v 1.88  T=0.06 s (581.2 files/s, 114505.9 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
R                               13            474           1065           2455
SVG                              1              1              1            743
Markdown                        10            149              0            668
YAML                             6             44              2            285
TeX                              1             12              0            168
Rmd                              1             69            107             61
-------------------------------------------------------------------------------
SUM:                            32            749           1175           4380
-------------------------------------------------------------------------------


Statistical information for the repository 'fc46dcc1b5de7283eebb7f8b' was
gathered on 2021/12/31.
No commited files with the specified extensions were found.

@whedon

whedon commented Dec 31, 2021

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.3390/buildings9070166 is OK
- 10/b4n9qc is OK
- 10.5194/gmd-9-1937-2016 is OK
- 10.5194/gmd-9-3461-2016 is OK
- 10.1016/s0378-7788(00)00114-6 is OK
- 10/b4z2gp is OK
- 10.18637/jss.v059.i10 is OK
- 10.1016/j.enbuild.2021.110757 is OK

MISSING DOIs

- 10.1016/0927-0248(95)80019-0 may be a valid DOI for title: 38th European Photovoltaic Solar Energy Conference and Exhibition

INVALID DOIs

- None

@whedon

whedon commented Dec 31, 2021

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@KristinaRiemer

Hi @hongyuanjia, you might want to review the "What should my paper contain?" section of the JOSS author guide to ensure your paper has the required sections in it.

@hongyuanjia

Hi @KristinaRiemer, thanks for the reminder. I will update the paper structure accordingly, based on the guide.

@mitmat

mitmat commented Jan 12, 2022

Hi @hongyuanjia,

I started to review the package. I got to test the functionality until
https://github.com/ideas-lab-nus/epwshiftr#extract-cmip6-output-data
but since I do not have EnergyPlus, I cannot continue. Could you put an example EPW file somewhere so I can continue? Maybe even the one from the example, so I can check it out...

@hongyuanjia

Hi @mitmat, thanks for the review. EnergyPlus is an open-source building energy simulation program. It provides freely available worldwide weather data at https://energyplus.net/weather. The file I used in the example can be downloaded from the EnergyPlus GitHub repo: https://raw.githubusercontent.com/NREL/EnergyPlus/develop/weather/USA_CA_San.Francisco.Intl.AP.724940_TMY3.epw.

Please let me know if you encounter any issue about accessing the data.

@mitmat

mitmat commented Jan 14, 2022

I got through the examples in the repo readme. Package works like a charm. Already the first part of managing the many files is solved nicely. I have no real issues with the functionality, but the documentation needs to be improved! (see also the issues in the repo).

Comments to the paper will come later…

License:

You have two license files in the repo: one is MIT (as specified on CRAN, too), and the other is a personal copyright notice (non-permissive, I guess?). For JOSS, an open license is required, so please check this.

Contribution and authorship:

From the GitHub commit history, it’s clear that the main author has made major contributions to the package. The co-author’s contribution is not clear: his very few commits are not of a substantial nature, so please clarify the author list and contributions.

Functionality

While using the package, one will likely have to work with many files. I like the approach you’ve taken to manage a file index. However, I suggest explaining better how to set a custom directory, and perhaps giving some hints on why one would want to do this.

In init_cmip6_index(), if you select year 2050, why do you also get the previous and next year?

What is future_epw() creating? The files seem to have the same years, but the values are changed.

How do you deal with non-standard calendars in GCMs? Noleap or 360?

Documentation

The esgf_query() and init_cmip6_index() functions in theory have a lot of options for querying the ESGF servers. However, your package is designed for CMIP6, so maybe it is better not to allow all values to be changed?

BTW, out of curiosity, are you aware of any R packages that query ESGF servers? And if so, do they support logging in? I know of something in Python, but have not seen anything for R yet.

@whedon

whedon commented Jan 14, 2022

👋 @mitmat, please update us on how your review is going (this is an automated reminder).

@whedon

whedon commented Jan 14, 2022

👋 @bczernecki, please update us on how your review is going (this is an automated reminder).

@mitmat

mitmat commented Jan 18, 2022

@hongyuanjia @KristinaRiemer

I finished the review. Please find in a post above the comments to the package, and some new issues in the pkg repo.

As addon to above:

The function summary_database() reports start and end dates in its summary, but gives no information on whether all the years in between are fully covered. Is this intended?
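A coverage check of this kind could compare the years spanned by each file against the overall range. The sketch below assumes per-file start and end years are already available (for example parsed from the date range at the end of each CMIP6 file name); missing_years() is a hypothetical helper, not part of epwshiftr.

```r
# Hypothetical helper: given per-file start and end years for one
# model/scenario/variable, return the years in the overall span that
# no file covers.
missing_years <- function(start_year, end_year) {
  covered <- unlist(Map(seq, start_year, end_year))
  full_span <- seq(min(start_year), max(end_year))
  setdiff(full_span, covered)
}

# Files for 2045-2049 and 2055-2059 leave 2050-2054 uncovered
missing_years(c(2045, 2055), c(2049, 2059))
```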

For the morphing methods, or generally when working with climate data, one needs to use climatological averages (i.e. 30 year averages). Maybe you could give the users a warning if they fail to do so, when calling the morphing function?
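Such a warning could be as simple as counting the distinct years passed to the averaging step. A minimal sketch; check_climatology_period() and its 30-year default are illustrative assumptions, not an existing epwshiftr argument.

```r
# Hypothetical guard: warn when the years used for a climatological mean
# span fewer than `min_years` distinct years (30 by default).
check_climatology_period <- function(years, min_years = 30) {
  n_years <- length(unique(years))
  if (n_years < min_years) {
    warning(sprintf(
      "Averaging over %d year(s); climatological means usually use %d years.",
      n_years, min_years
    ), call. = FALSE)
  }
  invisible(n_years)
}

check_climatology_period(2041:2070)  # 30 distinct years: no warning
# check_climatology_period(2050)     # a single year would trigger the warning
```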

Finally, regarding the paper:

The package is nice in a way that it provides the full process from managing climate model data, performing bias adjustment, and giving output in the desired format.

Section on “Statement of need” is missing

I wonder how it compares to existing packages, which of course do not have all the functionality, but maybe parts of it? Is there anything in R for managing CMIP6 (or CMIP5, for that matter) or other ESGF data? Bias adjustment and downscaling are active topics; I’m sure some packages exist. (-> missing “state of the field”)

Some disclaimer on using climate data would be nice (maybe also in the github repo). It’s great that the climate data is freely available, but still users need to be made aware of what climate model data can or cannot do. See e.g. the well-made guide from EURO-CORDEX https://www.euro-cordex.net/imperia/md/content/csc/cordex/guidance_for_euro-cordex_climate_projections_data_use__2021-02_1_.pdf

The term morphing is typical for building simulations, but rarely used outside of the field. You might want to mention the term “bias adjustment” or “downscaling”, which are commonly used. https://hypeweb.smhi.se/what-is-bias-adjustment/
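For readers outside building simulation: morphing, as commonly described by Belcher et al. (2005), superimposes GCM-derived monthly changes onto the observed hourly record in the EPW file, which makes it a delta-change method rather than a distribution-based bias adjustment such as quantile mapping. A minimal sketch of its three basic operations, assuming x0 holds one month of present-day hourly values; the function names are illustrative and this is not necessarily how epwshiftr implements them.

```r
# Shift: add the projected change in the monthly mean (e.g. pressure)
morph_shift <- function(x0, delta) x0 + delta

# Stretch: scale by a projected factor (e.g. radiation);
# alpha = 1 + relative change in the monthly mean
morph_stretch <- function(x0, alpha) alpha * x0

# Shift + stretch (e.g. dry-bulb temperature): shift the monthly mean by
# delta and scale deviations from it by (1 + alpha), where alpha is the
# relative change in the diurnal range
morph_shift_stretch <- function(x0, delta, alpha) {
  x0 + delta + alpha * (x0 - mean(x0))
}

x0 <- c(10, 12, 14)              # toy hourly values for one month
morph_shift(x0, 2)               # 12 14 16
morph_shift_stretch(x0, 2, 0.5)  # 11 14 17
```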

Minor comments:

L13: GCM usually refers to General Circulation Model, not Global Climate model
L18: “will be used” should be in the past tense, since CMIP6 has already been used in the IPCC AR6.
L20: which existing tools?
L21-23: I do not agree. There are a lot of tools in the wild. Maybe not specifically targeted to BES, but for “general” climate research, there are many existing tools, one is e.g. https://github.com/SantanderMetGroup/climate4R and others.

And the last thing, out of personal curiosity, not necessarily relevant for the review:
Is there any way to extract information directly from nc files without downloading everything? Since you only extract point information, this could save a lot of space and bandwidth. Do you know if that is possible?

@hongyuanjia

Thank you @mitmat for taking the time to give such thorough reviews of epwshiftr! I will start to address all your comments and issues opened in the epwshiftr repo in the following days.

I got through the examples in the repo readme. Package works like a charm. Already the first part of managing the many files is solved nicely. I have no real issues with the functionality, but the documentation needs to be improved! (see also the issues in the repo).
I agree that the documentation needs improvement, especially the description of the algorithm used to statistically downscale the climate data.

License:

You have two license files in the repo, the one is MIT (as specified on CRAN, too), and the other is a personal copyright (non-permissive, I guess?). For JOSS, an open license is required, so please check this.

epwshiftr is released under the MIT license. Unlike the GPL, the MIT license is a template that requires additional details to be completed in the LICENSE file. CRAN requires packages to specify the license as MIT + file LICENSE, where the LICENSE file contains only two fields, the year and the copyright holder, which are filled into the MIT template; see https://cran.r-project.org/web/licenses/MIT. CRAN does not allow packages to include the full text of standard licenses like MIT.

As described in https://r-pkgs.org/license.html#key-files, it is common practice to include a copy of the license file in open-source software. Since CRAN does not permit including a copy of standard licenses in packages, the full copy of the license is listed in .Rbuildignore so that this file is not sent to CRAN. That's why there are two "license" files in the repo:

  • LICENSE: The one required by CRAN which includes the year and copyright holder data for the MIT license.
  • LICENSE.md: The full MIT license.
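In practice usethis::use_mit_license() writes both files and the .Rbuildignore entry. The sketch below creates them by hand in a scratch folder only to show what each one contains; the year and copyright holder are placeholders.

```r
pkg <- file.path(tempdir(), "mit-license-demo")
dir.create(pkg, showWarnings = FALSE)

# LICENSE: the two fields that complete CRAN's MIT template; DESCRIPTION
# refers to it with the line `License: MIT + file LICENSE`
writeLines(c("YEAR: <year>", "COPYRIGHT HOLDER: <copyright holder>"),
           file.path(pkg, "LICENSE"))

# .Rbuildignore: keep LICENSE.md (the full MIT text) out of the CRAN build
writeLines("^LICENSE\\.md$", file.path(pkg, ".Rbuildignore"))

# Read the LICENSE fields back as a named character vector
lines <- readLines(file.path(pkg, "LICENSE"))
fields <- setNames(trimws(sub("^[^:]*:", "", lines)), sub(":.*$", "", lines))
fields
```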

Contribution and authorship:

From the github commit history, it’s clear that the main author has made major contributions to the package. The co-author’s contribution is not clear: his extremely few commits are not of substantial nature, so please clarify the author list and contributions.

epwshiftr was developed during my postdoc and was supported by Adrian's research project; Adrian is the co-author. Although he did not commit to the repo much, he contributed substantially to the underlying development through conceptualization and supervision, including many regular discussions on project direction and design.

Functionality

While using the package, one will likely have to work with many files. I like the approach you’ve taken to manage a file index. However, I suggest explaining better how to set a custom directory, and perhaps giving some hints on why one would want to do this.

Thanks for the suggestion. The user can use the option epwshiftr.dir to control where the file index is saved. To comply with CRAN policies, the default location is the OS-specific user data directory returned by rappdirs::user_data_dir(). There is also a function, get_data_dir(), to retrieve the current data directory. I agree that some hints are needed on when to change the data directory, especially when one wants to keep the file index and the climate data in one place. I will add documentation for this.
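A minimal sketch of pointing the data directory at a project folder, assuming the option is read when the index is created or loaded; the folder path is a placeholder, and the epwshiftr call is left as a comment so the snippet runs without the package.

```r
# Placeholder project folder for the file index (and, optionally, the data)
project_data <- file.path(tempdir(), "cmip6-project")
dir.create(project_data, recursive = TRUE, showWarnings = FALSE)

# Set the option for this session, keeping the old value to restore later
old <- options(epwshiftr.dir = project_data)
getOption("epwshiftr.dir")

# With epwshiftr installed, this should now report the same folder:
# epwshiftr::get_data_dir()

options(old)  # restore the previous setting
```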

In init_cmip6_index(), if you select year 2050, why do you also get the previous and next year?

This relates to the way that epwshiftr handles datetime data in NetCDF files, as you also mentioned in the following question:

How do you deal with non-standard calendars in GCMs? Noleap or 360?

Most output NetCDF files from CMIP6 GCMs use the proleptic Gregorian calendar. epwshiftr uses RNetCDF::utcal.nc() to handle the conversion correctly. However, it does not handle noleap or 360-day calendars. Thanks for catching this. I will make a patch to epwshiftr to handle non-standard calendars.
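For reference, decoding under these two calendars only needs fixed month lengths. The sketch below is a hypothetical helper, not the planned epwshiftr patch; it assumes units of "days since <origin_year>-01-01 00:00:00" and drops the time of day. It also decodes the time value 76336.5 from the EC-Earth3 example below (2059-01-01 12:00 under the proleptic Gregorian calendar) to show how far the dates drift under the other calendars.

```r
# Hypothetical decoder for CF "days since <origin_year>-01-01" values
# under the noleap (365_day) and 360_day calendars; time of day is dropped.
decode_cf_days <- function(days, origin_year,
                           calendar = c("noleap", "360_day")) {
  calendar <- match.arg(calendar)
  d <- floor(days)
  if (calendar == "360_day") {
    # 12 months of exactly 30 days each
    year  <- origin_year + d %/% 360
    doy   <- d %% 360                 # 0-based day of year
    month <- doy %/% 30 + 1
    day   <- doy %% 30 + 1
  } else {
    # 365 days every year, February always has 28 days
    year  <- origin_year + d %/% 365
    doy   <- d %% 365                 # 0-based day of year
    month_start <- cumsum(c(0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30))
    month <- findInterval(doy, month_start)
    day   <- doy - month_start[month] + 1
  }
  sprintf("%04d-%02d-%02d", as.integer(year), as.integer(month),
          as.integer(day))
}

decode_cf_days(76336.5, 1850, "noleap")   # "2059-02-21"
decode_cf_days(76336.5, 1850, "360_day")  # "2062-01-17"
```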

The previous and next years are needed because of how the time values are stored. Take tas_day_EC-Earth3_ssp585_r1i1p1f1_gr_20590101-20591231.nc as an example: it uses the proleptic Gregorian calendar with units of days since 1850-01-01 00:00:00. Its time values range over [76336.5, 76700.5], which represent 2059-01-01 12:00:00 UTC and 2059-12-31 12:00:00 UTC. To cover the full year 2059, we also need the preceding and following half days, which come from the files for the previous and next years.
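The conversion can be checked with base R alone, since POSIXct also follows the proleptic Gregorian calendar. A minimal sketch using the two time values quoted above:

```r
# CF time units: "days since 1850-01-01 00:00:00", proleptic Gregorian
origin <- as.POSIXct("1850-01-01 00:00:00", tz = "UTC")
time_values <- c(76336.5, 76700.5)      # first and last time steps of the file
stamps <- origin + time_values * 86400  # days -> seconds
format(stamps, "%Y-%m-%d %H:%M:%S", tz = "UTC")
# "2059-01-01 12:00:00" "2059-12-31 12:00:00"

# Daily values are stamped at midday, so 2059-01-01 00:00-12:00 and
# 2059-12-31 12:00-24:00 fall outside this file's time range; filling them
# needs the adjacent time steps from the 2058 and 2060 files.
```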

What is future_epw() creating? The files seem to have the same years, but the values are changed.

future_epw() formats the morphed climate data into an EnergyPlus Weather (EPW) file, which can be used directly by a large number of building performance simulation programs. The most widely used type of EPW data is probably TMY3 (Typical Meteorological Year). To calculate a TMY, a multiyear data set is analyzed and the 12 months that best represent the median conditions are chosen from that time frame. For example, a TMY developed from data for the years 1998–2005 might use data from 2000 for January, 2003 for February, 1999 for March, and so on. So when inserting the morphed climate data back into the EPW file, I kept the year values unchanged to preserve the typical-year information. A comment is inserted in the COMMENTS 1 field giving some details about the data, e.g. This climate change adapted weather file, which bases on 'ssp126-2080s-INM-CM5-0' ensemble data....
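A toy version of the month-selection step, for illustration only: for each calendar month it picks the year whose monthly mean is closest to the multi-year median of that month. The actual TMY3 procedure ranks candidate months with weighted Finkelstein-Schafer statistics over several variables, and the data below are made up.

```r
# Toy typical-month selection: one "typical" year per calendar month
pick_typical_years <- function(df) {
  # df: columns year, month, value (e.g. monthly mean dry-bulb temperature)
  sapply(split(df, df$month), function(m) {
    # year whose monthly value is closest to the median across years
    m$year[which.min(abs(m$value - median(m$value)))]
  })
}

monthly <- data.frame(
  year  = rep(1998:2002, times = 2),
  month = rep(1:2, each = 5),
  value = c(10.2, 11.5, 9.8, 12.1, 10.9,   # January means (made up)
            12.0, 13.4, 11.1, 12.6, 14.0)  # February means (made up)
)
pick_typical_years(monthly)  # January -> 2002, February -> 2001
```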

@KristinaRiemer

It looks like @hongyuanjia has started to respond to @mitmat's comments, thank you for your thoroughness with those! And I agree with the conclusion about authorship; intellectual contributions are more than sufficient for authorship, see the JOSS authorship guidelines.

@bczernecki it looks like you've started your review, how is it going?

@bczernecki

@hongyuanjia @KristinaRiemer

I have investigated the content of the epwshiftr R package quite thoroughly and it looks quite promising. However, IMHO there is still a lot of room for improvement, which needs to be addressed before accepting the software for publication in JOSS and recommending it to a wider audience.

Most of my concerns and comments are in line with what @mitmat previously suggested, i.e.

  • LICENSE
  • contribution of the co-author. I have checked that only around 10 lines of code in the Git repo were touched by the co-author. To be honest, it looks like ghost-writing, and I would strongly recommend clearly stating the contribution of both authors and removing co-authors whose contribution is less than ~5%
  • Documentation (more details later on)
  • Unit / integration tests: testthat currently reports around 60% code coverage. This is definitely not enough for good-quality software used for scientific, commercial or engineering purposes. In most companies the lower limit is set to 80-85%.

Other comments relate to problems I found while working through the example workflow in the README file:

  • In the "create a CMIP6 output file index" step, the source argument of idx <- init_cmip6_index() is not clear from the built-in documentation. There should be an extra step giving a hint on which GCMs are available, as those mentioned in the code may soon become outdated. It is also not clear whether only the mentioned GCMs are supported or users can provide their own (playing with the code shows that the latter is possible). Therefore I would highly recommend adding an extra step that tells the user which GCMs can be used.

  • According to the README file, section Manage CMIP6 output files: "You have to download CMIP6 output file by yourself using your preferable methods or tools. The download url can be found in the file_url column in the index." Frankly speaking, I don't understand why this capability is not supported by the package. I tried it with the built-in download.file() function and it worked for me:

> for (i in seq_along(idx$file_url)){
+   download.file(idx$file_url[i], basename(idx$file_url[i]))
+ }
attempting URL address 'http://esgf3.dkrz.de/thredds/fileServer/cmip6/ScenarioMIP/DKRZ/MPI-ESM1-2-HR/ssp585/r1i1p1f1/day/hurs/gn/v20190710/hurs_day_MPI-ESM1-2-HR_ssp585_r1i1p1f1_gn_20450101-20491231.nc'
Content type 'application/x-netcdf' length 323106447 bytes (308.1 MB)
==================================================
downloaded 308.1 MB

I think the package should check whether downloading from R is possible and, if not, print a message or even recommend external packages for this purpose that can be used together with epwshiftr. Moreover, it is not clear to me where these NetCDF files should be downloaded (into the working directory, or into the temporary folder created earlier?). I found the answer in the next step, but it would be good to know it in advance.

  • Line: epw <- file.path(eplusr::eplus_config(8.8)$dir, "WeatherData/USA_CA_San.Francisco.Intl.AP.724940_TMY3.epw") gives the warning: Failed to find configuration data of EnergyPlus v8.8
    What I had to do on macOS was run eplusr::install_eplus("latest"), which resolved the problem. The README should clearly state this mandatory step. Some extra code that detects whether it is needed would also be beneficial. In my case I ended up with:
    epw <- file.path(eplusr::eplus_config(9.4)$dir, "WeatherData/USA_CA_San.Francisco.Intl.AP.724940_TMY3.epw"), so it looks like the default version might differ between operating systems.

  • Line: coord <- match_coord(epw, threshold = list(lon = 1, lat = 1), max_num = 1). As far as I understand the code, it assumes that a degree of longitude and a degree of latitude cover the same distance, which is not true except at the equator. This should be corrected, e.g. by using great-circle or spheroid distance formulas (e.g. https://github.com/bczernecki/climate/blob/dev/R/spheroid_dist.R)

  • Something seems wrong with the parsing of coordinates. Please take a look at knitr::kable(head(data$data)): it gives a longitude of 301.875, which in reality should be about -122.4. I am wondering whether the data used in the next steps of the calculation are accurate.

  • It is not clear to me whether any bias correction method is applied and, if so, which one the package uses. This is very important, as some of the data converted in the package do not follow a Gaussian distribution, which requires extra domain knowledge. I would add an explicit note about the bias correction method or the lack of one. If there is no bias correction, I would highly recommend using one of the available packages.
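The two coordinate issues above can be illustrated together. The sketch below reads latitude and longitude from an EPW LOCATION header (the header string is modelled on the San Francisco file, not copied from it), converts grid longitudes from the [0, 360) convention used by many CMIP6 grids, and picks the nearest grid cell by great-circle (haversine) distance on a spherical Earth. The grid is made up and nearest_grid_point() is hypothetical, not epwshiftr's match_coord(). Note that -122.4 appears as 237.6 in the [0, 360) convention, while 301.875 corresponds to -58.125, so a convention difference alone would not explain the value reported above.

```r
# EPW LOCATION header fields:
# LOCATION, City, State, Country, Source, WMO, Latitude, Longitude, TimeZone, Elevation
parse_epw_location <- function(header_line) {
  f <- trimws(strsplit(header_line, ",", fixed = TRUE)[[1]])
  c(lat = as.numeric(f[7]), lon = as.numeric(f[8]))
}

# Map longitudes from [0, 360) to [-180, 180)
to_lon180 <- function(lon) ((lon + 180) %% 360) - 180

# Great-circle (haversine) distance in km on a sphere of radius 6371 km
haversine_km <- function(lon1, lat1, lon2, lat2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(sqrt(pmin(1, a)))
}

# Hypothetical lookup: nearest grid cell by true distance, instead of
# separate longitude/latitude thresholds
nearest_grid_point <- function(site, grid) {
  d <- haversine_km(site[["lon"]], site[["lat"]], to_lon180(grid$lon), grid$lat)
  grid[which.min(d), , drop = FALSE]
}

site <- parse_epw_location(
  "LOCATION,San Francisco Intl Ap,CA,USA,TMY3,724940,37.62,-122.40,-8.0,2.0"
)
grid <- data.frame(lon = c(236.25, 237.5, 301.875), lat = c(37.5, 37.5, 37.5))
nearest_grid_point(site, grid)  # the 237.5 cell, i.e. -122.5
```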

@KristinaRiemer

@hongyuanjia have you had a chance to look over @bczernecki's comments yet?

@bczernecki, with regards to coauthorship, our JOSS authorship guidelines err on the side of letting the authors decide who has made meaningful enough non-code contributions to warrant authorship. It sounds like those contributions were sufficient in this case.

About the test coverage, maybe it would be more useful to determine if the current set of tests cover the most important or error-prone functionality of the software, as opposed to just a threshold for test coverage percent? Are there parts of this code that currently lack tests but could really benefit from them?

@hongyuanjia

@KristinaRiemer Sorry for the late reply. I just came back to work from the Chinese New Year holiday. I will start to address @bczernecki's comments in the following week.

Co-authorship

Thank you @KristinaRiemer for further elaboration on the coauthorship requirement of JOSS.
Please see #4030 (comment). Adrian contributed substantially to the underlying development through conceptualization and supervision, including many regular discussions on project direction and design. I believe his contribution is sufficient for him to be on the author list.

Unit tests

Actually, the test coverage of epwshiftr is 100%. The drop in coverage to ~63% was caused by an undesired test execution order; please see ideas-lab-nus/epwshiftr#35. This has been fixed in ideas-lab-nus/epwshiftr#36.

Available GCMs (source)

Thanks for the suggestions. init_cmip6_index() calls esgf_query() underneath, which sends queries via the ESGF search RESTful API. The source argument gives the name of the GCM (General Circulation Model) as specified by the ESGF API. Currently, ~110 different GCMs are included in the various CMIP6 activities. The full list of GCMs can be found online. I will update the documentation to link to the web page with the full GCM list.
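One way to list available GCMs would be to ask an ESGF index node for facet counts only. A minimal sketch that builds such a query URL; the node address is an assumption (index nodes have changed over time), and the facets, limit and format parameters are assumed to follow the ESGF esg-search API, so treat them as unverified. Fetching the result needs network access, so that part is left as comments.

```r
esgf_facet_url <- function(node = "https://esgf-node.llnl.gov/esg-search/search",
                           facet = "source_id", ...) {
  # limit = 0 asks for facet counts only, without individual records
  params <- c(project = "CMIP6", facets = facet, limit = "0",
              format = "application/solr+json", ...)
  # percent-encode each value, including reserved characters such as "/" and "+"
  values <- vapply(params, utils::URLencode, character(1), reserved = TRUE)
  paste0(node, "?", paste(names(params), values, sep = "=", collapse = "&"))
}

url <- esgf_facet_url(experiment_id = "ssp585", frequency = "day")
url
# With network access and the jsonlite package (not run here):
# res <- jsonlite::fromJSON(url)
# res$facet_counts$facet_fields$source_id  # GCM names alternating with counts
```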

Since all listed GCM outputs follow the CMIP standards, the parsing functionality in epwshiftr should work on all of them. The criteria for selecting the 11 default GCMs were the temporal resolution (daily), the projection variables (outputs most related to climate data for building energy simulation), and the availability of all 4 experiments of the ScenarioMIP activity. This is the same approach as that of the commercial service WeatherShift. In fact, the initial motivation for developing epwshiftr was to create a free, open-source package that can produce outputs similar to WeatherShift and CCWorldWeatherGen for building energy modelers, but updated for the latest CMIP6 data.

Downloading GCM outputs

Why not provide functionality to download using the package itself?

I truly agree it would be useful if epwshiftr could provide functionality to download netCDF files for GCMs. However, I find that it is hard to maintain a fast, easy-to-use and reliable approach to download them because of the following reasons:

File size
All CMIP6 output has been written to netCDF files with one variable stored per file. The size of a single GCM output file depends heavily on the simulated period, grid resolution, and time resolution. The file you mentioned contains humidity data for a single emission scenario covering only 5 years (2045–2049). Some GCMs, e.g. MRI-ESM2-0, compile 50 years of data into one netCDF file, and the file size grows to ~2.6 GB. The total size of the humidity data from MRI-ESM2-0 for all 4 emission scenarios for 2020–2100 is >32 GB. On top of that, it is normal to consider various climate variables from multiple GCMs, so the total size can easily reach hundreds of GBs.
Unstable data node
Moreover, CMIP6 model outputs are hosted on a collection of nodes across the world and some nodes could down which makes certain data inaccessible at some point. epwshiftr provides get_data_node() to help the user retrieve the data node status, including ping responses.
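The file sizes above follow from simple arithmetic: an uncompressed gridded variable needs roughly n_lon × n_lat × n_time × bytes-per-value. A minimal Python sketch, assuming float32 values, a 365-day calendar, and a hypothetical 320 × 160 grid (actual CMIP6 grids vary by model, and netCDF files are often compressed, so on-disk sizes will differ):

```python
def netcdf_payload_gb(n_lon, n_lat, n_years, steps_per_year=365, bytes_per_value=4):
    """Approximate uncompressed size (GB, 1e9 bytes) of one gridded variable."""
    n_values = n_lon * n_lat * n_years * steps_per_year  # one value per cell per time step
    return n_values * bytes_per_value / 1e9

# Hypothetical grid: 50 years of daily float32 data for a single variable and scenario.
per_file = netcdf_payload_gb(320, 160, 50)
print(round(per_file, 2))  # about 3.74 GB before compression
```

Multiplying by the number of scenarios, variables, and GCMs shows how quickly the total reaches hundreds of GBs.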

I will update the package documentation to explain why the decision was made to let users handle the downloading themselves.
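Because data node availability changes over time, it can help to check whether a node is reachable before starting a large download. A minimal Python sketch using a plain TCP connection test; this is not how get_data_node() is implemented, and the helper name node_reachable and the example URL are hypothetical:

```python
import socket
from urllib.parse import urlparse

def node_reachable(url, timeout=3.0):
    """Return True if a TCP connection to the URL's host and port succeeds within `timeout`."""
    parts = urlparse(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        # create_connection raises OSError on refusal, DNS failure, or timeout.
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical usage: skip files hosted on nodes that do not respond.
# node_reachable("https://example-data-node.org/thredds/fileServer/")
```

A TCP check only shows that the host accepts connections; a node can still fail to serve a particular file.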

Where to put the netCDF files

Actually, the location of the netCDF files does not matter, as long as all files are placed under a single folder (they can be in different subdirectories). epwshiftr lets the user use the dir argument in summary_database() to specify where to scan for the downloaded netCDF files, which are then matched against the index file created by init_cmip6_index(). I will update the example in the README to provide a more informative guide on this.
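The scan-and-match step can be sketched in Python as follows. This is a hypothetical illustration, not summary_database()'s implementation: it recursively collects *.nc files under one folder and compares their file names with a list of expected names that stands in for the index file.

```python
from pathlib import Path

def match_nc_files(root, indexed_names):
    """Split expected netCDF file names into those found under `root` and those missing."""
    # Map file name -> path for every .nc file in root and all its subdirectories.
    # If the same name occurs twice, the last one found wins (kept simple on purpose).
    found = {p.name: p for p in Path(root).rglob("*.nc")}
    expected = set(indexed_names)
    matched = {name: found[name] for name in sorted(expected & set(found))}
    missing = sorted(expected - set(found))
    return matched, missing
```

Matching on file names alone is what makes the folder layout irrelevant: files can be spread over any number of subdirectories.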

EPW file in the example in README

The example in the README focuses on the basic workflow of using epwshiftr and hides a lot of other information, including what EnergyPlus is and where to get EnergyPlus weather (EPW) files. When I wrote the example, I assumed that the target users of the epwshiftr package were building energy modelers who have EnergyPlus installed. Sorry for the trouble accessing the EPW file used in the example.

Every version of EnergyPlus comes along with several example EPW files. They do not change across different versions.
eplusr::eplus_config(8.8)$dir was used to get the installation folder of EnergyPlus v8.8. The code epw <- file.path(eplusr::eplus_config(8.8)$dir, "WeatherData/USA_CA_San.Francisco.Intl.AP.724940_TMY3.epw") did not work because you did not have that version of EnergyPlus installed. eplusr::install_eplus("latest") installs the latest EnergyPlus version supported by the {eplusr} package, which at that time was v9.4. That's why eplusr::eplus_config(9.4)$dir worked. As I mentioned in #4030 (comment), EnergyPlus provides freely available weather data for locations worldwide at energyplus.net/weather. The file I used in the example can be downloaded from the EnergyPlus GitHub repo. I will update the example to use that link directly.

The EPW file is a mandatory input for match_coord(). Besides an EPW file path, the epw argument can also be a regular expression, e.g. `"los angeles.*tmy3"`, which is used to find and download matching EPW files from the EnergyPlus weather database and OneBuilding.org. This approach does not require EnergyPlus to be installed at all. I documented this but did not include it in the example to keep the example short. I may update the example to demonstrate it.
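The regular-expression lookup can be illustrated with a minimal Python sketch, assuming case-insensitive matching against location titles. The station titles below are hypothetical, and this is not the code match_coord() uses:

```python
import re

def find_epw(pattern, names):
    """Return the names that match `pattern` anywhere, ignoring case."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [n for n in names if rx.search(n)]

# Hypothetical location titles from a weather-file listing.
names = [
    "USA CA Los Angeles Intl AP TMY3",
    "USA CA Los Angeles Downtown TMY3",
    "USA CA San Francisco Intl AP TMY3",
    "USA CA Los Angeles Intl AP TMYx",
]
print(find_epw("los angeles.*tmy3", names))
```

Because ".*" allows anything between the two parts, one pattern can select several stations in the same city while excluding other data sets (here the TMYx entry).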

Grid distance calculation

Line: coord <- match_coord(epw, threshold = list(lon = 1, lat = 1), max_num = 1) - as far as I understand the code, it assumes that a degree of longitude and a degree of latitude cover the same distance, which is not true (except at the equator). This has to be corrected, e.g. by using euclidean or spheroid formulas (e.g. https://github.com/bczernecki/climate/blob/dev/R/spheroid_dist.R)

Thanks a lot for catching this and thanks for providing the approach from the {climate} package!
I found the Unidata blog post "Accessing netCDF Data by Coordinates" to be a very helpful resource describing common approaches. I will update the package to correct this.
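To illustrate the reviewer's point, here is a minimal Python sketch that ranks candidate grid points by great-circle distance instead of raw degree differences. It swaps in the spherical haversine formula, which is simpler but slightly less accurate than the spheroid formula in {climate} linked above; the helper names and the (lat, lon) tuple grid are hypothetical, not epwshiftr's API.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; treats the Earth as a sphere

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearest_grid_points(lat, lon, grid, k=1):
    """Return the k (lat, lon) grid points closest to (lat, lon) by great-circle distance."""
    return sorted(grid, key=lambda p: haversine_km(lat, lon, p[0], p[1]))[:k]

print(nearest_grid_points(60.0, 0.0, [(60.7, 0.0), (60.0, 1.0)]))
```

At 60°N, one degree of longitude is only about 55.6 km, while 0.7 degrees of latitude is about 77.8 km. A degree-based threshold would therefore prefer (60.7, 0.0), whereas the great-circle distance correctly picks (60.0, 1.0).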

Coordinate parsing

Thanks for mentioning this. I believe epwshiftr already handles this case correctly during coordinate matching in match_coord(); please see https://github.com/ideas-lab-nus/epwshiftr/blob/master/R/coord.R#L205-L208. Currently, coordinate data is only used when finding the "closest" grid points and extracting their data.
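Assuming the concern here is the longitude convention (many CMIP6 grids store longitude as 0 to 360, while EPW files use -180 to 180 with west negative), the conversion can be sketched in Python as below. The helper names are hypothetical, and this does not reproduce the code in coord.R:

```python
def to_pm180(lon):
    """Map a longitude in degrees (e.g. from a 0-360 grid) to the range [-180, 180)."""
    return (lon + 180.0) % 360.0 - 180.0

def to_0_360(lon):
    """Map a longitude in degrees (e.g. from an EPW header) to the range [0, 360)."""
    return lon % 360.0

print(to_pm180(350.0))  # a 0-360 grid value of 350 corresponds to 10 degrees west
```

Converting one side before computing distances keeps, for example, 359.5 and -0.5 from being treated as 360 degrees apart.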

@hongyuanjia

hongyuanjia commented Feb 17, 2022

@KristinaRiemer @mitmat @bczernecki I have opened issues in the epwshiftr repo to address your comments. I will address them one by one. Once all are solved, I will request to proceed with the review.

@KristinaRiemer

Hi @hongyuanjia! How is your progress on the list of issues going? Do you need anything to help facilitate right now?

@KristinaRiemer

Hello @hongyuanjia, how are the improvements on your submission going?

@hongyuanjia

@KristinaRiemer Sorry for the late reply. Unfortunately, progress has been slow. Right now I am mainly focusing on addressing ideas-lab-nus/epwshiftr#40. I will address reviewer comments related to features (ideas-lab-nus/epwshiftr#32, ideas-lab-nus/epwshiftr#19, ideas-lab-nus/epwshiftr#43) and bugs (ideas-lab-nus/epwshiftr#39) this weekend, and I will try my best to solve the documentation-related issues next week.

@KristinaRiemer

@hongyuanjia it's all good! I just wanted to make sure things are moving along and nothing is blocking your work.

@KristinaRiemer

Hi @hongyuanjia, how are your updates to this software progressing? It looks like you have closed some of the issues.

@hongyuanjia

Hi @KristinaRiemer. Most of the feature-related issues have been solved, except for ideas-lab-nus/epwshiftr#19, which may need more refactoring. Also, to solve ideas-lab-nus/epwshiftr#38 more comprehensively, I have decided to add a new feature to download and parse all available CMIP6 controlled vocabularies and GCM outputs (ideas-lab-nus/epwshiftr#53). This part is almost complete, and I should be able to finish it by the end of this month. I will then address the issues related to documentation and the paper structure next month. Apologies for the slow progress. I hope to solve all issues by the middle of next month.

@KristinaRiemer

Thanks for the update @hongyuanjia! It's all good on the pacing, I just want to make sure you're not blocked by anything and have a plan for moving forward, which it sounds like you do!

@KristinaRiemer

Hi @hongyuanjia, just checking in again to see how the updates are going?

@KristinaRiemer

Hi @hongyuanjia, just want to let you know that I'm going to be on vacation for the next three weeks. Hopefully you'll have a chance to wrap up the rest of your improvements by then! Thanks.

@KristinaRiemer

/ooo July 15 until August 5

@hongyuanjia

Hi @KristinaRiemer, I think most of the feature implementations have been completed. I should be able to complete the documentation during this period. Thanks!

@KristinaRiemer

Hi @hongyuanjia, it looks like you completed a big chunk of work in your most recent commit in the software repo. Is the checklist of issues in a previous comment up to date? Have you been able to work on the documentation tasks?

@editorialbot editorialbot added the Track: 3 (PE) Physics and Engineering label Sep 10, 2022
@arfon
Member

arfon commented Oct 15, 2022

👋 folks. Just checking in here – are we still waiting for @hongyuanjia to make their updates?

@KristinaRiemer

Thanks for checking in @arfon. @hongyuanjia, it looks like there are still some open issues based on the reviewers' feedback, how are those going? Is there anything blocking your progress?

@KristinaRiemer

Hi @hongyuanjia, checking in to see if you have a timeline for completing these updates?

@KristinaRiemer

Hi @hongyuanjia, I'm checking in again to see if you know when you will be able to address the updates? It appears that the last activity in the software repo was during October 2022.

@arfon
Member

arfon commented Jul 28, 2023

@hongyuanjia – it's more than a year since we heard back from you on this review. We will proceed to reject this submission as abandoned in 2 weeks from today if we don't hear back from you.

@KristinaRiemer

I have sent an email to @hongyuanjia about this submission.

@arfon
Member

arfon commented Aug 12, 2023

@editorialbot reject

This submission is being rejected due to inactivity on the part of the author. @KristinaRiemer @mitmat @bczernecki – thanks for all of your efforts here, sorry this one didn't work out!

@editorialbot
Collaborator

Paper rejected.
