Introductory Work - up to parallel processing of latitudes #5
Conversation
…ng speed n_lats_at_once
… contains start and endyear
… to different heights - needs fixing, automatisation maybe
…ded in data with variable height calculation, will be downloaded
…uding the changes in data-array handling, plotting still not adaptive
… processing via program arguments, latitude files saved individually in /lats directory
…wever full functionality of this code now also included in parallel_process_data.py
I don't understand the flow for parallel processing. I provided comments on the pieces of code that I have doubts about. However, let's discuss in a Skype call so that you can explain your rationale to me better.
config.py
Outdated
era5_data_dir = '/cephfs/user/s6lathim/ERA5Data-112/'
model_level_file_name_format = "{:d}_europe_{:d}_130_131_132_133_135.nc"  # 'ml_{:d}_{:02d}.netcdf'
surface_file_name_format = "{:d}_europe_{:d}_152.nc"  # 'sfc_{:d}_{:02d}.netcdf'
put back the old data dir and file names
taken care of - commit e3b9ba8
config.py
Outdated
output_file_name = "/cephfs/user/s6lathim/ERA5Data/results/lats/processed_data_europe_{:d}_{:d}.nc".format(start_year, final_year)
put back the old data dir and file names
taken care of (same as above) - commit e3b9ba8
plot_maps.py
Outdated
@@ -8,6 +8,7 @@

"""
from netCDF4 import Dataset
netcdf4 library is not needed anymore, right?
Yes, I forgot to remove that; it is included in xarray.
Removed in commit 12e3200
parallel_process_data.py
Outdated
$ python process_data.py

"""
from netCDF4 import Dataset
Still uses netCDF4
Yes it does, to write the output. Do you think it would be better to change all of that to xarray, too?
That would mean changing the handling of the res dictionary and, I think, also the whole definition of the output_variables dimensions and so on to use an xarray dataset.
That dataset would then still be written to a netCDF4 file.
So I think the only merit would be possibly prettier code, as the res dictionary could maybe be modified to be an xarray dataset, which would then just be written out...? I'm still getting into the use of xarray in this case, but I think that would be the workflow.
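As a rough illustration of that workflow (all variable names, shapes, and coordinate values here are invented for the sketch, not taken from the actual code), the res dictionary could be wrapped into an xarray Dataset and then written out in one step:

```python
import numpy as np
import xarray as xr

# Hypothetical stand-ins for the real processing results: one array per
# output variable, dimensioned (height, latitude, longitude).
heights = np.array([10.0, 100.0])
lats = np.array([50.0, 50.25])
lons = np.array([5.0, 5.25, 5.5])
res = {"v_mean": np.zeros((len(heights), len(lats), len(lons)))}

# Wrap the plain dict into a Dataset with named dimensions and coordinates.
ds = xr.Dataset(
    {name: (("height", "latitude", "longitude"), arr) for name, arr in res.items()},
    coords={"height": heights, "latitude": lats, "longitude": lons},
)

# Writing is then a one-liner (requires a netCDF backend such as netCDF4):
# ds.to_netcdf(output_file_name)
```

With this shape the per-variable dimension bookkeeping lives in the Dataset itself rather than in parallel index arithmetic.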
process_data.py
Outdated
@@ -19,10 +19,10 @@
from config import start_year, final_year, era5_data_dir, model_level_file_name_format, surface_file_name_format,\
    output_file_name, read_n_lats_at_once

# Set the relevant heights for the different analysis types.
# Set the relevant heights for the different analysis types in meter.#, 1500.],
remove end of sentence
removed the second comment there, see commit 76e0cf4
plot_maps.py
Outdated
# user input
if len(sys.argv) > 1:
    try:
        opts, args = getopt.getopt(sys.argv[1:], "hl", ["help", "latitudes"])
    except getopt.GetoptError:
        print("plot_maps.py -l >> process individual latitude files \n -h >> display help")
        sys.exit()
    for opt, arg in opts:
        if opt in ("-h", "--help"):
            print("plot_maps.py -l >> process individual latitude files \n -h >> display help")
            sys.exit()
        elif opt in ("-l", "--latitudes"):
            # find all latitude files matching the output_file_name specified in config.py
            output_file_name_short = output_file_name.split('/')[-1]
            output_dir = output_file_name[:-len(output_file_name_short)]
            output_file_name_prefix = output_file_name_short.split('.')[0] + '_'

            # sorting of provided latitudes included
            lats = [f.split('_')[-1][:-(len(f.split('.')[-1]) + 1)] for f in os.listdir(output_dir)
                    if f.split("/")[-1].split("lats")[0] == output_file_name_prefix]
            lats.sort(key=float)
            print(str(len(lats)) + ' latitudes found in directory ' + output_dir + ' for the years '
                  + str(start_year) + ' to ' + str(final_year) + ':')
            print(lats)

            resources = [output_dir + output_file_name_prefix + 'lats_' + lat + '.nc' for lat in lats]
            print('All latitudes are read from files with the respective latitude in a similar fashion to: ' + resources[0])
            nc = xr.open_mfdataset(resources, concat_dim='latitude')
Please add more comments, because it's hard to follow what exactly you are doing here.
Changed to subsets and greatly simplified by using a single input parameter; see commit 141ec71
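To make the file-discovery step easier to follow, here is a stand-alone sketch of the same logic (the helper name and the '<prefix>lats_<lat>.nc' pattern match are assumptions for illustration, slightly simpler than the actual string splitting in plot_maps.py):

```python
def find_latitudes(file_names, prefix):
    """Extract the latitude part from per-latitude output files named like
    '<prefix>lats_<lat>.nc' and return the latitudes sorted numerically."""
    marker = prefix + "lats_"
    lats = [f[len(marker):-len(".nc")]
            for f in file_names
            if f.startswith(marker) and f.endswith(".nc")]
    # Sort numerically: a plain string sort would put e.g. '9.5' after '10.0'.
    lats.sort(key=float)
    return lats
```

The resulting latitude strings can then be turned back into full paths and opened as one combined dataset.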
parallel_process_data.py
Outdated
try:
    opts, args = getopt.getopt(sys.argv[1:], "hs:", ["help", "subset="])
except getopt.GetoptError:
    print("parallel_process_data.py -s SubsetNumber >> process individual latitude subset SubsetNumber \n -h >> display help")
    sys.exit()
for opt, arg in opts:
    if opt in ("-h", "--help"):
        print("parallel_process_data.py -s SubsetNumber >> process individual latitude subset SubsetNumber \n -h >> display help")
        sys.exit()
    elif opt in ("-s", "--subset"):
        # Select only a specific latitude subset
        subset_number_input = int(arg)
Please add more comments explaining how the code works.
This part is concerned with reading the user input. I included a bit more info on what exactly the reading does here; I hope it is clearer now (in addition, in the first lines I included more detailed instructions as well) via commit fbb2624
Your questions also seem to show that the name of the program is quite misleading, which I changed as well (4963f4e, and in the file 084657a)
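For reference, the argument reading boils down to something like this self-contained sketch (the function name is illustrative, and -1 is assumed here as the "no subset given" default):

```python
import getopt

def parse_subset_arg(argv):
    """Parse '-s N' / '--subset=N' from an argument list and return N,
    or -1 when no subset argument is given (meaning: process everything)."""
    subset = -1
    opts, _args = getopt.getopt(argv, "hs:", ["help", "subset="])
    for opt, arg in opts:
        if opt in ("-s", "--subset"):
            subset = int(arg)
    return subset
```

In the script itself this would be called as `parse_subset_arg(sys.argv[1:])`.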
parallel_process_data.py
Outdated
    return(res, counter)


def process_complete_grid(output_file, subset_number_input):
the function is called complete grid, but you have a new argument 'subset_number_input'. This suggests that the function is not for the full grid.
For a better description, the subset interpretation was replaced in favor of latitude & latitude index naming
in commit 68d38e1
The function naming has also been replaced by more accurate job descriptions
parallel_process_data.py
Outdated
if subset_number_input == (-1):
    i_subset = 0
elif subset_number_input >= n_subsets:
    raise ValueError("User input subset number ({:.0f}) larger than total maximal subset index ({:.0f}) starting at 0.".format(subset_number_input, (n_subsets-1)))
else:
    # User input given - processing only one latitude subset:
    i_subset = subset_number_input
total_iters = len(lons)
take this outside this function
Not sure why you ask that; I guess it might also be resolved by the better function naming introduced in 68d38e1.
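For clarity, the input check above can be read as a small standalone function (the names and the -1 convention are taken from the snippet; pulling it into a function is an illustrative refactoring, not the code as committed):

```python
def resolve_subset_index(subset_number_input, n_subsets):
    """Map the user-supplied subset number to the starting subset index.
    -1 means 'no user input: start at subset 0 and process all subsets'."""
    if subset_number_input == -1:
        return 0
    if subset_number_input >= n_subsets:
        raise ValueError(
            "User input subset number ({:d}) larger than total maximal "
            "subset index ({:d}) starting at 0.".format(
                subset_number_input, n_subsets - 1))
    # Valid user input: process only this one subset.
    return subset_number_input
```

Extracting it like this would also make the reviewer's request to move the check out of process_complete_grid straightforward.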
parallel_process_data.py
Outdated


def process_lats_subset(lats_subset, lenLons, levels, lenHours, v_levels_east, v_levels_north, v_levels, t_levels, q_levels, surface_pressure, heights_of_interest, analyzed_heights_ids, res, start_time, counter, total_iters):
please rewrite such that each parallel process outputs a separate netcdf, which we concatenate in a post processing step
This is already happening if read_n_lats_at_once is 1. I had thought to keep that variable, but at the moment, if it is larger, files span multiple latitudes. Now I removed the dependency on read_n_lats_at_once and just process all latitudes individually:
54637c5 & 5dd45a1
Not sure if that's what you wanted, though, or if it's the best solution, so we could talk about it in Skype.
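A minimal in-memory sketch of the agreed workflow - each parallel worker produces one per-latitude dataset, and a post-processing step concatenates them (variable names and values here are invented for illustration):

```python
import numpy as np
import xarray as xr

# One small dataset per latitude, as each parallel worker would produce.
# In the real workflow each of these would be written with ds.to_netcdf(path).
parts = [
    xr.Dataset(
        {"v_mean": (("latitude", "longitude"), np.full((1, 3), lat))},
        coords={"latitude": [lat], "longitude": [5.0, 5.25, 5.5]},
    )
    for lat in (50.0, 50.25)
]

# Post-processing: stitch the pieces back together along latitude.
# With files on disk this corresponds to
# xr.open_mfdataset(paths, concat_dim="latitude", combine="nested").
merged = xr.concat(parts, dim="latitude")
```

Keeping the concatenation out of the workers means no shared output file and no ordering constraints between the parallel processes.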
… getopt part (reading the command line arguments)
…el but is meant to be executed for every single latitude
…ining program use
…ten for each subset
First attempt at a pull request - this should include my working solutions to #3 and #4.
#3 is resolved mainly in plot_maps (xarray requires slightly different array handling).
#4: the parallel_processing.. uses command line arguments to choose the latitude subsets; output files are saved in the /lats subfolder.
From the beginning of my introduction, a slight change to the readme is included, as one sentence gave somewhat outdated info on the downloading process, I think.
Also included are the configuration changes for the now-finishing 'Christmas Download', which I think should be good, as it is supposed to be the working configuration from now on (e.g. grid size, number of downloaded levels, ...).
I think I'll try to separate the work into more branches in the future. Any feedback is welcome!
Cheers,
Lavinia