
How To: Creating dvmdostem inputs and runmask based on shape file.


Not sure where else to put this so dumping it here for now....

The goal: given a shapefile, create a set of dvmdostem inputs for running the model over the area the shapefile covers.

As our input creation scripts currently work (Feb 2021), we only support rectangular input regions, so the idea of running only the area in the shapefile is achieved by using the run mask to turn off all pixels outside the shapefile's polygon boundary. Some space could be saved by filling the input files outside the bounds of the polygon(s) with 0 (or really NaN), but in this example I am not worrying about that.
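A run mask is just a small netCDF file with an integer variable named "run" on the same grid as the other inputs (the touchup snippet later on this page manipulates it directly). A minimal sketch of poking at one, assuming the standard run-mask.nc layout:

import netCDF4 as nc

mask = nc.Dataset("run-mask.nc", "r+")
run = mask.variables["run"][:]        # 1 = run this pixel, 0 = skip it
print(run.shape, int(run.sum()), "pixels enabled")
mask.variables["run"][:] = 0          # e.g. turn everything off...
mask.variables["run"][0, 0] = 1       # ...then enable a single pixel
mask.close()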

This process was developed in the summer of 2020 for H. Genet, who needed inputs for 4 different hydrologic basins in Alaska. I started with a shapefile from H. Genet that had a polygon for each basin. I recorded the process in an email chain in July 2020, and am moving the notes here so I have a chance of finding them next time I need them!

Let's begin...

Starting on my personal computer:

  • Open the shapefile and the IEM veg map in QGIS and make sure everything looks reasonable (i.e. the georeferencing info is being read correctly).
  • Menu: Processing -> GDAL -> Raster Extraction -> Clip Raster by mask layer.
  • I experimented with various options, but basically you can leave them at the defaults: choose the shapefile as the mask and the IEM veg map as the input layer, then run the process and view the resulting layer. Looks good! Note that depending on how the mask layer is selected, the tool might use the whole layer as one mask, or use each feature as its own mask (1 output layer per feature). It took me a while to figure this out; it is not obvious from looking at the commands, and came down to a subtle difference in how I initially selected the mask layer through the GUI. (A rough command line equivalent is sketched after this list.)
  • Now double click the new, smaller, cropped layer and look at its dimensions; write them down.
  • Then use the QGIS lat/lon tool to get the lat/lon coordinates of the lower left corner of the cropped region of interest, and write those down too.
  • Repeat the process of collecting the size and lower left corner coordinates for each individual polygon.
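Under the hood the QGIS clip tool wraps gdalwarp; here is a rough sketch of the equivalent operation with the GDAL Python bindings, where the file names are placeholders rather than the real paths:

from osgeo import gdal
gdal.UseExceptions()

clipped = gdal.Warp(
    "iem_veg_clipped.tif",           # output raster (hypothetical name)
    "iem_veg_map.tif",               # input layer: the IEM veg map
    cutlineDSName="fourbasins.shp",  # mask layer: the shapefile
    cropToCutline=True,              # shrink the extent to the polygon bounds
    dstNodata=0,                     # fill outside the polygon with nodata
)
clipped = None                       # flush and close the dataset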

Now switch to Atlas:

  • Log on and source my Python environment setup script.
  • Make a CSV text file with the coords, pasted here in case atlas goes down again:
(newenv) tcarman2@atlas ~/dvm-dos-tem [master] $ cat hgenet_fourbasins.txt 
site,lon,lat,xsize,ysize
all_four_basins,-149.42144819,64.22833937,277,150
west_basin,-149.35454036,64.6400155,150,104
midwest_basin,-148.20852869,64.70012365,160,74
mideast_basin,-147.01343243,64.33512659,137,105
east_basin,-145.65271472,64.068174804,93,74
  • Run the dvmdostem "create_region_input.py" script to create slurm wrappers. I made a quick addition to the script so that it can set unique sizes for each dataset based on columns in the CSV file; I hadn't ever needed this before. (A sketch of the CSV parsing idea follows the command below.)
(newenv) tcarman2@atlas ~/dvm-dos-tem [master] $ ./scripts/create_region_input.py --slurm-wrappers-from-csv hgenet_fourbasins.txt --tifs /atlas_scratch/ALFRESCO/ALFRESCO_Master_Dataset_v2_1/ALFRESCO_Model_Input_Datasets/IEM_for_TEM_inputs/ --withlatlon --lonlat --projected-climate-config mri-cgcm3
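The per-site size handling amounts to reading the extra columns out of each CSV row; an illustrative sketch (not the actual create_region_input.py code):

import csv

with open("hgenet_fourbasins.txt") as f:
    for row in csv.DictReader(f):
        # each site carries its own extent: lower left corner + pixel counts
        print(row["site"], float(row["lon"]), float(row["lat"]),
              int(row["xsize"]), int(row["ysize"]))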

Then submit the wrapper scripts using sbatch.
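For example, something along these lines works, though the wrapper file name pattern here is a guess; match whatever the script actually names them:

import glob
import subprocess

for wrapper in sorted(glob.glob("slurm_runner_*.sh")):  # hypothetical pattern
    print("submitting", wrapper)
    subprocess.run(["sbatch", wrapper], check=True)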

See steps below for creating run masks from the polygons.

Update Feb 2021: creating inputs again for different projected data

I started with the polygon coordinates from above and am now running this process again to create the GFDL-CM3 inputs (also stored on atlas, but in a different directory layout than our other inputs, ncar-ccsm4 and mri-cgcm3).

This took some more modifications to the create_region_input.py script to get it to read all the input tiffs from the right paths. As of Feb 19 2021, these changes have not been merged yet, but they should be fairly soon.

So the process is:

  1. Run the create_region_input.py script to dump an empty config file.
  2. Fill in the config file with all the paths.
  3. Run create_region_input.py again to generate the slurm wrappers.
  4. Check the wrappers and submit them using sbatch.

Then here is the part for making the run masks:

PYTHON_TOUCHUP=$(cat <<'END'
# Combine the original run mask with the rasterized polygon mask: a pixel
# stays enabled in the final mask only if it was enabled in the original
# mask AND it falls inside the basin polygon.
import sys
import netCDF4 as nc
import numpy as np
orig = nc.Dataset(sys.argv[1], "r")    # original run-mask.nc
poly = nc.Dataset(sys.argv[2], "r")    # polygon mask made by gdalwarp below
merged = nc.Dataset(sys.argv[3], "a")  # copy of the original, edited in place
merged.variables["run"][:] = np.where(
    np.logical_and(orig.variables["run"][:] > 0,
                   poly.variables["Band1"][:] > 0), 1, 0)
orig.close()
poly.close()
merged.close()
END
)

BASE_DIR="/home/UA/tcarman2/dvm-dos-tem/input-staging-area/"
SHAPEFILE="/home/UA/tcarman2/downloads/fourbasins/fourbasins.shp"

for BASIN in Chat_basin_104x150 Chena_basin_74x160 Salcha_basin_105x137 GoodP_basin_74x93;
  do
    echo $BASIN
    PROJCLM=gfdl-cm3

    ORIG="$BASE_DIR"/cru-ts40_ar5_rcp85_"$PROJCLM"_"$BASIN"/run-mask.nc
    POLY="$BASE_DIR"/cru-ts40_ar5_rcp85_"$PROJCLM"_"$BASIN"/run-mask-polygon.nc
    FINAL="$BASE_DIR"/cru-ts40_ar5_rcp85_"$PROJCLM"_"$BASIN"/run-mask-orig-and-polygon.nc

    # The shapefile's "basin" attribute holds the short name (e.g. "Chat"),
    # so strip the size suffix off the directory name to get it.
    SHORT_BASIN_NAME=$(python -c "import sys; print(sys.argv[1].split('_')[0])" $BASIN);
    echo $SHORT_BASIN_NAME

    # Cut just this basin's polygon out of the run mask grid; pixels
    # outside the polygon become nodata (0).
    gdalwarp -overwrite -of NETCDF -dstnodata 0 -cutline $SHAPEFILE -cl fourbasins -cwhere "basin = '$SHORT_BASIN_NAME'" "NETCDF:$ORIG:run" "$POLY";

    # Start the final mask as a copy of the original, then AND it with
    # the polygon mask using the python snippet above.
    cp "$ORIG" "$FINAL"
    echo "Calling touchup with $ORIG $POLY $FINAL"
    python3 -c "$PYTHON_TOUCHUP" "$ORIG" "$POLY" "$FINAL"

done

It looks complicated, but only because it is generalized so it will run over all 4 input cases w/o me having to type too much.

The general idea is to use gdalwarp and the -cutline option to make a new run mask with only the pixels within the polygon enabled. The little Python heredoc snippet is used to make sure that the final run mask honors both the polygon and any "nodata" (lakes, rivers, mountains) pixels from the original mask.
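For example, here is that merge logic applied to a tiny hand-made pair of masks (values invented for illustration):

import numpy as np

orig_run = np.array([[1, 1, 0],    # 0 = nodata pixels (lakes, rivers,
                     [1, 1, 1]])   #     mountains) in the original mask
poly     = np.array([[0, 1, 1],    # 1 = pixel falls inside the basin
                     [0, 1, 1]])   #     polygon cut out by gdalwarp

final = np.where(np.logical_and(orig_run > 0, poly > 0), 1, 0)
print(final)
# [[0 1 0]   <- only pixels that are both runnable AND inside the
#  [0 1 1]]     polygon stay enabled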

Looks like gdal_rasterize might be another good way to go.
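For the record, a minimal sketch of that alternative with the GDAL Python bindings, assuming the polygon gets burned into a raster that already exists on the right grid (file names here are hypothetical):

from osgeo import gdal
gdal.UseExceptions()

ds = gdal.Open("polygon-mask.tif", gdal.GA_Update)  # pre-made grid, all zeros
gdal.Rasterize(
    ds,                       # burn into the existing raster in place
    "fourbasins.shp",         # source vector, one feature per basin
    burnValues=[1],           # pixels inside the polygon get 1
    where="basin = 'Chat'",   # select a single basin, as in the loop above
)
ds = None                     # flush and close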