request_era5() issue when combining .nc files #42

Open
kris-wild opened this issue Nov 18, 2024 · 2 comments

@kris-wild

Hi,

My name is Kristoffer Wild. I'm trying to use the 'request_era5' function and I keep running into an issue when combining the .nc files with the 'combine_netcdf' argument (see below). I've dug into the 'request_era5' source here: https://rdrr.io/github/dklinges9/mcera5/src/R/request_era5.R and still can't figure out why I'm unable to combine the .nc files for a given year.

Since the last package update, my .nc files arrive in groups of 12 (1950_1950_1.nc, 1950_1950_2.nc, ...; presumably one per month?) when I request data for a given area. Previously I got only one file per year. Below is the code I'm working with; all you need to adjust is the working directory, uid, and cds_api_key. I'm sorry if this is an oversight on my end, and thank you again for your help. The code is followed by the error:

#####-----Download climate data from ERA5-----#####
library(mcera5)
library(dplyr)
library(ecmwfr)
library(lubridate)
library(tidync)
library(microclima)

build_era5land_request <- function (xmin, xmax,
ymin, ymax,
start_time, end_time,
outfile_name = "era5_out")
{
if (missing(xmin)) {
stop("xmin is missing")
}
if (missing(xmax)) {
stop("xmax is missing")
}
if (missing(ymin)) {
stop("ymin is missing")
}
if (missing(ymax)) {
stop("ymax is missing")
}
if (missing(start_time)) {
stop("start_time is missing")
}
if (missing(end_time)) {
stop("end_time is missing")
}
xmin_r <- plyr::round_any(xmin, 0.25, f = floor)
xmax_r <- plyr::round_any(xmax, 0.25, f = ceiling)
ymin_r <- plyr::round_any(ymin, 0.25, f = floor)
ymax_r <- plyr::round_any(ymax, 0.25, f = ceiling)
ar <- paste0(ymax_r, "/", xmin_r, "/", ymin_r,
"/", xmax_r)
ut <- uni_dates(start_time, end_time)
request <- list()
for (i in 1:length(unique(ut$yea))) {
yr <- unique(ut$yea)[i]
sub_mon <- ut %>% dplyr::filter(., yea == yr) %>% dplyr::select(.,
mon)
sub_request <- list(dataset_short_name = "reanalysis-era5-land",
product_type = "reanalysis", variable = c("2m_temperature",
"2m_dewpoint_temperature", "surface_pressure",
"10m_u_component_of_wind", "10m_v_component_of_wind",
"total_precipitation", "total_cloud_cover",
"mean_surface_net_long_wave_radiation_flux",
"mean_surface_downward_long_wave_radiation_flux",
"total_sky_direct_solar_radiation_at_surface",
"surface_solar_radiation_downwards", "land_sea_mask"),
year = as.character(yr), month = as.character(sub_mon$mon),
day = c("01", "02", "03", "04",
"05", "06", "07", "08",
"09", "10", "11", "12",
"13", "14", "15", "16",
"17", "18", "19", "20",
"21", "22", "23", "24",
"25", "26", "27", "28",
"29", "30", "31"), time = c("00:00",
"01:00", "02:00", "03:00",
"04:00", "05:00", "06:00",
"07:00", "08:00", "09:00",
"10:00", "11:00", "12:00",
"13:00", "14:00", "15:00",
"16:00", "17:00", "18:00",
"19:00", "20:00", "21:00",
"22:00", "23:00"), area = ar, format = "netcdf",
target = paste0(outfile_name, "_", yr, ".nc"))
request[[i]] <- sub_request
}
return(request)
}

request_era52 <- function (request, uid,
out_path,
overwrite = FALSE,
combine = TRUE,
timeout = 18000)
{
if (length(request) == 1 & combine) {
cat("Your request will all be queried at once and does not need to be combined.\n")
}
for (req in 1:length(request)) {
if (file.exists(paste0(out_path, "/", request[[req]]$target)) &
!overwrite) {
if (length(request) > 1) {
stop("Filename already exists within requested out_path in request ",
req, " of request series. Use overwrite = TRUE if you wish to overwrite this file.")
}
else {
stop("Filename already exists within requested out_path. Use overwrite = TRUE if you wish to overwrite this file.")
}
}
ecmwfr::wf_request(user = as.character(uid), request = request[[req]],
transfer = TRUE, path = out_path, verbose = TRUE,
time_out = timeout)
if (file.exists(paste0(out_path, "/", request[[req]]$target))) {
if (length(request) > 1) {
cat("ERA5 netCDF file", req, "successfully downloaded.\n")
}
else {
cat("ERA5 netCDF file successfully downloaded.\n")
}
}
}
if (length(request) > 1 & combine) {
cat("Now combining netCDF files...\n")
fnames <- lapply(request, function(x) {
x$target
})
combine_netcdf(filenames = fnames, combined_name = "combined.nc")
cat("Finished.\n")
}
}

####Download ERA5 data####
##setwd and assign your credentials
setwd("/Volumes/The Brain/ERA5_Australia/")
uid<-"XXXXXXX"
cds_api_key<-"XXXXXXX"
ecmwfr::wf_set_key(user = uid,
key = cds_api_key)

######## Loop for data by year
for(year in 1966:1967){
##Building a request
#bounding coordinates
xmn<- 112
xmx<- 154
ymn<- -45
ymx<- -10

#temporal extent
st_time<-lubridate::ymd(paste0(year, ":01:01"))
en_time<-lubridate::ymd(paste0(year, ":12:31"))

# filename and location for downloaded .nc files

#file_prefix<-paste(i,"_",i+5,sep="")

op <- "/Volumes/The Brain/ERA5_Australia/"
# dir.exists() checks the path on disk; exists() would look for an R object of that name
if (!dir.exists(op)) {
dir.create(op)
}

# build a request (covering multiple years)

req<-build_era5_land_request(xmin = xmn, xmax = xmx,
ymin = ymn, ymax = ymx,
start_time = st_time,
end_time = en_time,
outfile_name = year)

# Obtaining data with a request

request_era52(request = req, uid = uid, out_path = op, timeout = 18000 * 2)
#}
}

ERA5 netCDF file 12 successfully downloaded.
Now combining netCDF files...
Error in combine_netcdf(filenames = fnames, combined_name = "combined.nc") :
could not find function "combine_netcdf"

@dklinges9
Owner

Hi Kris,

(we can continue our email conversation, but replying here so my response can be seen by others publicly)

Thanks for bringing this up. First, the error you're receiving here (could not find function "combine_netcdf") occurs because combine_netcdf() is an internal function of mcera5, so it isn't exported when the package is built and therefore isn't available when you load the package (e.g. via library(mcera5)). For debugging purposes, you'd need to clone the mcera5 repository and explicitly source() the script R/internal.R to make combine_netcdf() available. That said, such debugging is my job, not yours! And I might as well just make combine_netcdf() available externally.
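
For anyone hitting the same error before updating, here is a minimal sketch of looking up an unexported function in a package namespace from an interactive session. The helper name get_internal is hypothetical and not part of mcera5; it relies only on base R's asNamespace(), and the mcera5 lookup assumes an installed version that defines combine_netcdf() in its namespace:

```r
# Hypothetical helper (not part of mcera5): fetch a function from a package
# namespace whether or not it is exported. Returns NULL if it is not found.
get_internal <- function(name, pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(NULL)
  }
  ns <- asNamespace(pkg)
  if (!exists(name, envir = ns, inherits = FALSE)) {
    return(NULL)
  }
  get(name, envir = ns, inherits = FALSE)
}

# Assumption: the installed mcera5 defines combine_netcdf() in its namespace.
combine_fn <- get_internal("combine_netcdf", "mcera5")
if (is.null(combine_fn)) {
  message("combine_netcdf() not found; is mcera5 installed?")
}
```

The shorthand for the same lookup is mcera5:::combine_netcdf, which is fine for debugging but fragile in scripts, because internal functions can change without notice.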

Beyond this, I did indeed find a bug in combine_netcdf(), which I just fixed with this recent commit. There might be other issues at play here; it's hard for me to test with your files (or with any files) right now because the CDS server is down. So I'll keep this issue open and continue exploring.

As a heads-up: currently, build_era5_request() queries files by month to help keep each query below the new, lower CDS limits. I have functionality ready locally that lets the user choose between monthly and annual queries (which might involve trade-offs in download speed). I won't push this until I've tested it a bit more (once the CDS server is working again).
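
To illustrate why a one-year request now yields 12 files, here is a rough sketch of splitting a date range into one chunk per calendar month. This is not the mcera5 implementation; split_by_month is a hypothetical helper written in base R only:

```r
# Hypothetical helper: list every (year, month) pair touched by a date range,
# i.e. one row per monthly query.
split_by_month <- function(start_time, end_time) {
  days <- seq(as.Date(start_time), as.Date(end_time), by = "day")
  ym <- unique(format(days, "%Y-%m"))
  data.frame(
    year = substr(ym, 1, 4),
    month = substr(ym, 6, 7),
    stringsAsFactors = FALSE
  )
}

chunks <- split_by_month("1966-01-01", "1966-12-31")
print(nrow(chunks))  # 12: one query, and so one .nc file, per month
```

Under this scheme each loop iteration over a full year produces 12 separate downloads that then have to be combined.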

@dklinges9
Owner

combine_netcdf() is now also exported with this commit, so it can serve as a stand-alone function (which should make this kind of debugging easier in the future).
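
With combine_netcdf() exported, the monthly files could be passed to it directly. Below is a small sketch of putting them in month order first, since a plain alphabetical sort places ..._10.nc before ..._2.nc. The helper order_monthly_files is hypothetical, the file names follow the pattern reported above, and the filenames/combined_name arguments are assumed from the call shown in the original post rather than checked against the updated package. Whether combine_netcdf() is sensitive to input order is also not confirmed here:

```r
# Hypothetical helper: sort files named like <prefix>_<n>.nc by the trailing
# integer <n> rather than alphabetically.
order_monthly_files <- function(files) {
  idx <- as.integer(sub("^.*_([0-9]+)\\.nc$", "\\1", basename(files)))
  files[order(idx)]
}

files <- c("1966_1966_10.nc", "1966_1966_2.nc", "1966_1966_1.nc")
ordered <- order_monthly_files(files)
print(ordered)

# Assumed call, mirroring the arguments used inside request_era5():
# mcera5::combine_netcdf(filenames = as.list(ordered), combined_name = "combined.nc")
```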
