How To: Run dvmdostem and plot output via Docker

tobey edited this page Jan 14, 2022 · 5 revisions

Overview

This tutorial is not meant to be exhaustive and assumes that you have already set up Docker and gotten dvmdostem running with Docker. See the notes in the Dockerfile and the docker-compose.yml file to build your images and launch the containers with volumes attached.

In general the steps to making a dvmdostem run are as follows:

  1. Decide where on your computer you want to store your model run(s).
  2. Decide what spatial (geographic) area you want to run.
  3. Decide what variables you want to have output.
  4. Decide on all other run settings/parameters:
    • Which stages to run and for how many years.
    • Is the community type (CMT) fixed or driven by input vegetation.nc map?
    • For which stages should the output files be generated and saved?
    • Calibration settings if necessary (--cal-mode).
    • Any other command line options or environment settings.
  5. Launch the run
  6. Make plots or other analysis.

This tutorial will walk through doing a very basic dvmdostem run and plotting the output using the Docker stack. Depending on how your computer is set up, you might find it easier to run the Python helper scripts natively rather than through a Docker container.

Get set up

Get an interactive session on the dvmdostem-run Docker container:

$ docker-compose exec dvmdostem-run bash
develop@ac7293bc134b:/work$ 

If necessary, compile the code by running make in the /work directory.

Note that the dvmdostem-run Docker image sets the PATH environment variable to include the dvmdostem binary and the scripts/ directory.

Now change into the workflows directory:

develop@ac7293bc134b:/work$ cd /data/workflows/

Also note that this directory (mounted in the container at /data/workflows) is a named volume in the docker-compose.yml file and should be mapped to some location on your host machine. Most of the following steps can probably be done from the host machine if you prefer.

Now run the setup script, picking some input data to use. For this case we just arbitrarily select something from the input catalog; I'd have to look at a map to know where Chevak is:

develop@ac7293bc134b:/data/workflows$ setup_working_directory.py --input-data-path /data/input-catalog/cru-ts40_ar5_rcp85_ncar-ccsm4_CALM_Chevak_10x10/ test-run

Now change into the directory that you just created for your run:

develop@ac7293bc134b:/data/workflows$ cd test-run

# Check the files that should have been created with the setup script
develop@ac7293bc134b:/data/workflows/test-run$ ls
calibration  config  output  parameters  run-mask.nc

The idea is that each run exists in its own self-contained directory with all the config files and the output data. This way the run can be easily adjusted, re-run, and archived for later use without losing any provenance data. The one exception is the driving input data: those files are not copied into the run directory but are referenced by paths in the config/config.js file. If you need to, you can copy the inputs into the run directory and adjust the paths in config/config.js accordingly. The run directory contains the parameters/ directory with all the parameters for the run, run-mask.nc for controlling which pixels to run, a folder for calibration info, a folder for the outputs, and a folder for the config file and the output specification file.

For this totally arbitrary run, let's turn on outputs for all run-stages (except pre-run). Open the config/config.js file and make sure that the following are all set to 1 (this is perhaps easier to do with a text editor on your host because the Docker image doesn't have any editors installed):

  "IO": {
    ...
    "output_nc_eq": 1,
    "output_nc_sp": 1,
    "output_nc_tr": 1,
    "output_nc_sc": 1
  ...
  }
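Since the image has no editor, the same change can also be scripted. The following is a minimal sketch, not part of the dvmdostem scripts: the function names are made up for illustration, and it assumes config/config.js parses as plain JSON. If the file contains comments, json.loads will raise an error and nothing is written. Note that the file is rewritten with json.dumps, so the original formatting is not preserved.

```python
import json

# The four stage flags named in the tutorial; pre-run output is left alone.
STAGE_FLAGS = ("output_nc_eq", "output_nc_sp", "output_nc_tr", "output_nc_sc")


def enable_stage_outputs(config_text):
    """Return config_text with the four output_nc_* flags in "IO" set to 1."""
    # Assumption: the text is plain JSON (no comments).
    config = json.loads(config_text)
    for flag in STAGE_FLAGS:
        config["IO"][flag] = 1
    return json.dumps(config, indent=2) + "\n"


def update_config_file(path="config/config.js"):
    """Read the config file, switch the flags on, and write it back."""
    with open(path) as f:
        new_text = enable_stage_outputs(f.read())
    with open(path, "w") as f:
        f.write(new_text)
```

Calling update_config_file() from inside the run directory (test-run in this tutorial) applies the change in place.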

Now let's adjust the run-mask so that we only run 1 or 2 pixels. Note that you can use the --show option to see what the mask looks like before and after adjusting it. We'll turn on 2 pixels here, just for fun:

develop@882b37164b39:/data/workflows/test-run$ runmask-util.py --reset --yx 0 0  run-mask.nc 
Setting all pixels in runmask to '0' (OFF).
Turning pixel(y,x) (0,0) to '1', (ON).
develop@882b37164b39:/data/workflows/test-run$ runmask-util.py --yx 1 1 run-mask.nc 
Turning pixel(y,x) (1,1) to '1', (ON).

Note that you don't want to pass --reset to the second call, or it will disable the first pixel you enabled.
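To make the effect of --reset concrete, here is a small pure-Python illustration on a nested list. It only mimics the behaviour described above; it is not how runmask-util.py is implemented, and the function names are hypothetical.

```python
def set_pixels(mask, pixels, reset=False):
    """Return a copy of a 2D mask with the given (y, x) pixels set to 1.

    With reset=True every pixel is first set to 0, as --reset is described
    to do. Without it, pixels that are already on stay on.
    """
    if reset:
        new_mask = [[0] * len(row) for row in mask]
    else:
        new_mask = [list(row) for row in mask]
    for y, x in pixels:
        new_mask[y][x] = 1
    return new_mask


def enabled_pixels(mask):
    """List the (y, x) coordinates of all pixels that are on."""
    return [(y, x) for y, row in enumerate(mask) for x, v in enumerate(row) if v]


# A 10x10 grid with everything on, standing in for the original run mask.
mask = [[1] * 10 for _ in range(10)]
mask = set_pixels(mask, [(0, 0)], reset=True)  # like: --reset --yx 0 0
mask = set_pixels(mask, [(1, 1)])              # like: --yx 1 1
print(enabled_pixels(mask))                    # [(0, 0), (1, 1)]
```

Passing reset=True to the second call would leave only (1, 1) enabled, which is the mistake the note above warns about.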

And now we'll enable a few more output variables, also just for fun:

develop@1538e66f79f1:/data/workflows/test-run$ outspec_utils.py config/output_spec.csv --on RH y layer
                Name                Units       Yearly      Monthly        Daily          PFT Compartments       Layers    Data Type     Description
                  RH            g/m2/time            y                   invalid      invalid      invalid            l       double     Heterotrophic respiration
develop@1538e66f79f1:/data/workflows/test-run$ outspec_utils.py config/output_spec.csv --on VEGC m pft  
                Name                Units       Yearly      Monthly        Daily          PFT Compartments       Layers    Data Type     Description
                VEGC                 g/m2            y            m      invalid            p                   invalid       double     Total veg. biomass C

Note that the output specification is simply a CSV file, so you can edit it by hand; outspec_utils.py is just a convenience tool with some error checking.
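As a rough sketch of what editing the file by hand amounts to, the snippet below switches on one resolution for one variable using Python's csv module. It assumes the CSV header uses the same column names as the table printed above (Name, Yearly, Monthly, and so on) and that unsupported resolutions are marked "invalid"; the sample rows are made up for illustration, and turn_on is a hypothetical helper, not part of outspec_utils.py.

```python
import csv
import io


def turn_on(csv_text, name, column, flag):
    """Return csv_text with `flag` written into `column` for variable `name`.

    Cells marked 'invalid' are refused rather than overwritten.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    matches = [row for row in rows if row["Name"].strip() == name]
    if not matches:
        raise KeyError(f"variable {name!r} not found")
    for row in matches:
        if row[column].strip() == "invalid":
            raise ValueError(f"{name} does not support {column}")
        row[column] = flag
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Made-up sample; the real config/output_spec.csv may be laid out differently.
sample = (
    "Name,Units,Yearly,Monthly,Daily,PFT,Compartments,Layers,Data Type,Description\n"
    "RH,g/m2/time,,,invalid,invalid,invalid,,double,Heterotrophic respiration\n"
    "VEGC,g/m2,,,invalid,,,invalid,double,Total veg. biomass C\n"
)
updated = turn_on(sample, "RH", "Yearly", "y")
print(updated)
```

Asking for an unsupported resolution, for example turn_on(sample, "RH", "Daily", "d"), raises a ValueError instead of silently writing a bad cell.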

Also note that the order of arguments to outspec_utils.py is very counterintuitive. Basically the file you want to modify needs to be the first argument so that it doesn't get confused with the resolution specification.

Launch your run

Finally we are set to run the model. On the command line we set the number of years to run each stage. In a real run --eq-yrs might be something like 1500 and --sp-yrs something like 250, but for testing I am too impatient to wait for that. Plus, we enabled fairly high-resolution outputs, so a long run would produce a lot of data.

develop@1538e66f79f1:/data/workflows/test-run$ dvmdostem --pr-yrs 50 --eq-yrs 100 --sp-yrs 75 --tr-yrs 115 --sc-yrs 85 --force-cmt 4

Note that we use --force-cmt 4, simply because I know that CMT 4 has been calibrated and the parameter files will work. Since I am not sure what the vegetation is at Chevak, it is possible that the veg map (input file: vegetation.nc) has CMTs that have not been calibrated and have invalid values in their parameter files.
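If you launch runs from a script, it can help to build the command from a small table of stage lengths. The sketch below uses only the flags shown in the command above; build_command is a hypothetical helper, not something shipped with dvmdostem.

```python
import shlex

# Stage abbreviations in the order used on the command line above.
STAGES = ("pr", "eq", "sp", "tr", "sc")


def build_command(years, force_cmt=None):
    """Build the dvmdostem argument list from a {stage: years} mapping."""
    cmd = ["dvmdostem"]
    for stage in STAGES:
        if stage in years:
            cmd += [f"--{stage}-yrs", str(years[stage])]
    if force_cmt is not None:
        cmd += ["--force-cmt", str(force_cmt)]
    return cmd


test_years = {"pr": 50, "eq": 100, "sp": 75, "tr": 115, "sc": 85}
cmd = build_command(test_years, force_cmt=4)
print(shlex.join(cmd))
print("total simulated years:", sum(test_years.values()))  # 425
```

The resulting list can be passed to subprocess.run(cmd, check=True) from inside the run directory.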

After a few minutes, the simulation should be done. Check the run_status.nc file in the output/ folder to see that your two pixels completed OK, and note the other files that are present in the output/ folder:

develop@1538e66f79f1:/data/workflows/test-run$ ls output/
GPP_yearly_eq.nc   GPP_yearly_tr.nc   RH_yearly_sp.nc    VEGC_monthly_sc.nc restart-eq.nc      restart-sp.nc
GPP_yearly_sc.nc   RH_yearly_eq.nc    RH_yearly_tr.nc    VEGC_monthly_sp.nc restart-pr.nc      restart-tr.nc
GPP_yearly_sp.nc   RH_yearly_sc.nc    VEGC_monthly_eq.nc VEGC_monthly_tr.nc restart-sc.nc      run_status.nc

develop@1538e66f79f1:/data/workflows/test-run$ ncdump output/run_status.nc 
netcdf run_status {
dimensions:
	Y = 10 ;
	X = 10 ;
variables:
	int run_status(Y, X) ;
data:

 run_status =
  100, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 100, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ;
}
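For larger masks, reading the grid by eye gets tedious. Here is a minimal sketch that lists the finished pixels. It assumes, as the dump above suggests, that 100 marks a pixel that completed and 0 a pixel that was not run; any other value is reported without interpretation. The helper names are hypothetical, and read_status needs the third-party netCDF4 package.

```python
def summarize_status(grid, ok_value=100):
    """Split a 2D status grid into finished pixels and other non-zero pixels."""
    finished, other = [], {}
    for y, row in enumerate(grid):
        for x, value in enumerate(row):
            if value is None:  # masked or fill value
                continue
            value = int(value)
            if value == ok_value:
                finished.append((y, x))
            elif value != 0:
                other[(y, x)] = value
    return finished, other


def read_status(path="output/run_status.nc"):
    """Load the run_status variable as a nested list."""
    # Imported here so summarize_status works without netCDF4 installed.
    from netCDF4 import Dataset

    with Dataset(path) as ds:
        return ds.variables["run_status"][:].tolist()


# Same values as the ncdump output above, built by hand.
grid = [[0] * 10 for _ in range(10)]
grid[0][0] = 100
grid[1][1] = 100
finished, other = summarize_status(grid)
print("finished:", finished)  # finished: [(0, 0), (1, 1)]
print("other:", other)        # other: {}
```

With netCDF4 available, summarize_status(read_status()) does the same check on the real file from inside the run directory.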

Plotting

There are actually a lot of plotting tools buried in our scripts/ directory, but none of them are particularly polished. Many, but not all, of the scripts print decent information with the --help flag. There is no consistent pattern for whether plots are saved or shown in an interactive window, and in cases where the plots are saved, the file names are not particularly helpful or standardized.

The plotting also relies on a variety of specific libraries, and not everything has been tested with the most recent versions specified in the requirements.txt file, so you might encounter small issues with the scripts that have to be resolved before they will run. Frequently this is just a matter of updating deprecated function calls for libraries like matplotlib or pandas that have been changed since we first wrote the plotting tools. Please update and submit PRs if you encounter any of these issues!

Also note that there is a script, output_utils.py, that is designed to be imported into other Python scripts and has a bunch of functions for summarizing variables over various dimensions (layers, PFTs, etc.).

Finally, let's run some plotting scripts:

develop@1538e66f79f1:/data/workflows/test-run$ plot_output_var.py --file output/VEGC_monthly_sc.nc

develop@1538e66f79f1:/data/workflows/test-run$ plot_output_var.py --file output/RH_yearly_tr.nc

develop@1538e66f79f1:/data/workflows/test-run$ output_utils.py --timeres monthly --yx 0 0 basic-ts --stitch tr  --vars VEGC --savename junk output

Note that the command line interface for output_utils.py is misleading: there is a bunch of stuff that we never finished implementing, such as the --savename option.