
Building my comparison part 1: datasets_setup.py and atlas subdirectories

jservonnat edited this page Feb 9, 2018 · 14 revisions

The C-ESM-EP is a way to apply collections of diagnostics to a set of simulations.

It is intimately linked with the concept of comparison.

In the C-ESM-EP vocabulary, a comparison is a directory containing:

  • subdirectories for the atlases (collections of diagnostics); each one contains a parameter file that controls the execution of the diagnostics in main_C-ESM-EP.py
  • a Python file, datasets_setup.py, where the user specifies the datasets used as inputs by the C-ESM-EP

Foreword: keep only the atlases you need for your comparison

We advise keeping only the atlas subdirectories you need in your comparison directory. The C-ESM-EP runs only the atlases present in the comparison directory, and the C-ESM-EP front page only links to those atlases. Do not hesitate to remove the subdirectories you don't need: this avoids unnecessary computation and storage of results. If you need them again later, the atlas subdirectories (with their parameter files) are available in standard_comparison (or a 'git pull' away) and in share/optional_atlas.
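If you prefer to prune a copy of standard_comparison from Python rather than by hand, here is a minimal sketch. The function prune_atlases is hypothetical (it is not part of the C-ESM-EP), and it assumes that every subdirectory of the comparison is an atlas; plain files such as datasets_setup.py are left alone. It only reports what it would remove unless dry_run=False.

```python
import shutil
from pathlib import Path


def prune_atlases(comparison_dir, keep, dry_run=True):
    """Hypothetical helper: remove the atlas subdirectories not listed in keep.

    Assumption: every subdirectory of comparison_dir is an atlas.
    Plain files (e.g. datasets_setup.py) are never touched.
    Returns the sorted names of the subdirectories removed, or that would
    be removed when dry_run is True.
    """
    comparison = Path(comparison_dir)
    to_remove = sorted(p.name for p in comparison.iterdir()
                       if p.is_dir() and p.name not in keep)
    if not dry_run:
        for name in to_remove:
            shutil.rmtree(comparison / name)
    return to_remove
```

For example, prune_atlases('my_comparison', keep={'MainTimeSeries', 'TuningMetrics'}) lists the atlases that would be deleted; call it again with dry_run=False to actually delete them.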

Adding my datasets to datasets_setup.py

In datasets_setup.py, there is a Python list 'models' whose elements are Python dictionaries describing how to access the datasets. They are basically the set of keywords/values provided to the CliMAF ds() function to access the data, without the keyword 'variable'. The rest of this section shows how to add your own datasets.
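To make the link with ds() concrete, here is a minimal sketch that turns one element of models into keyword arguments for a given variable. The helper to_ds_kwargs and the list CESMEP_ONLY_KEYS are hypothetical: the sketch assumes that keys such as customname, color, clim_period and ts_period are interpreted by the C-ESM-EP itself (the period instructions being resolved into an explicit period beforehand) rather than passed to CliMAF.

```python
# Assumption: keys handled by the C-ESM-EP itself, not forwarded to ds()
CESMEP_ONLY_KEYS = ('customname', 'color', 'clim_period', 'ts_period')


def to_ds_kwargs(dataset, variable, period=None):
    """Hypothetical helper: build ds() keyword arguments for one variable.

    The input dictionary is not modified. If period is given (for instance
    a period resolved from clim_period), it is added or overrides 'period'.
    """
    kwargs = {k: v for k, v in dataset.items() if k not in CESMEP_ONLY_KEYS}
    kwargs['variable'] = variable
    if period is not None:
        kwargs['period'] = period
    return kwargs
```

In a CliMAF session, the result would then be used as ds(**to_ds_kwargs(dataset, 'tas')).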

In CliMAF, the different data structures are described with CliMAF 'projects'. Each CliMAF project provides access to the datasets through keywords/values:

- simulation
- frequency
- period

and other keywords that are specific to the projects.

The most commonly used projects are 'CMIP5' (the CMIP5 archive on Ciclad) and 'IGCM_OUT' (the data tree produced by libIGCM, which holds most of the model outputs produced at IPSL).

Here is an example defining a CMIP5 dataset and an IPSLCM6 coupled-model simulation in datasets_setup.py:

models = [
    dict(project = 'CMIP5',
         model = 'IPSL-CM5A-LR',
         experiment = 'historical',
         simulation = 'r1i1p1',
         frequency = 'monthly',
         period = '1980-2005'
         ),
    dict(project = 'IGCM_OUT',
         root = '/path_to_thredds',
         login = 'p86caub',
         model = 'IPSLCM6',
         simulation = 'CM605-LR-pdCtrl01',
         frequency = 'monthly',
         clim_period = 'last_20Y'
         )
]

Note: /path_to_thredds is the root path of the THREDDS tree, up to the login directory (well known on Ciclad and Curie).

Using the common_keys

In datasets_setup.py, you will find a mechanism to specify keys common to the elements of models, so that keywords shared by a set of datasets do not have to be repeated in every dictionary. The mechanism in standard_comparison/datasets_setup.py adds to each IGCM_OUT dataset dictionary the keys/values of common_keys that it does not already specify; keys already present in a dictionary are left untouched. In the example below, the loop also sets the experiment from the simulation name ('-pi' gives 'piControl', '-pd' gives 'pdControl').

Example with a set of simulations for an ORCHIDEE meeting:

models = [
      # -- Coupled models
      dict(project='IGCM_OUT', login='p86fair', simulation='CM6014-pd-splith-01', color='green' ),
      dict(project='IGCM_OUT', login='p86maf', simulation='CM6014-pd-split-D-01', color='red'),
      dict(project='IGCM_OUT', login='p86maf', simulation='CM6014-pd-ttop-01', color='blue'),

      # -- LMDZOR
      dict(project='IGCM_OUT', login='p86ghatt', model='LMDZOR', status='PROD',
           experiment='ref4438', simulation='CL5.4438.L6010.ref'),
      dict(project='IGCM_OUT', login='p86ghatt', model='LMDZOR', status='PROD',
           experiment='ref4438', simulation='CL5.4438.L6010.alt1'),

      # -- ORCHIDEE offline
      dict(project='IGCM_OUT', login='p529bast', model='OL2', status='PROD',
           experiment='ref4783', simulation='FG2.4783.v3'),
      dict(project='IGCM_OUT', login='p529bast', model='OL2', status='PROD',
           experiment='ref4783', simulation='FG2.4783.v4'),
      dict(project='IGCM_OUT', login='p529bast', model='OL2', status='PROD',
           experiment='ref4783', simulation='FG3.4783.v3'),
      dict(project='IGCM_OUT', login='p529bast', model='OL2', status='PROD',
           experiment='ref4783', simulation='FG3.4783.v4'),

]

# -- Provide a set of common keys to the elements of models
# ---------------------------------------------------------------------------- >
common_keys = dict(
           root='/path_to_thredds', login='*',
           model='IPSLCM6',
           frequency='monthly',
           clim_period='last_10Y',
           ts_period='full',
           )

for model in models:
    if model['project'] == 'IGCM_OUT':
        # -- Infer the experiment from the simulation name
        if '-pi' in model['simulation']:
            model['experiment'] = 'piControl'
        if '-pd' in model['simulation']:
            model['experiment'] = 'pdControl'
        # -- Add the common keys that this dataset does not already specify
        for key, value in common_keys.items():
            model.setdefault(key, value)

ts_period, clim_period and the period manager

The C-ESM-EP contains diagnostics on climatological averages and others on time series. The user can therefore specify one period for the climatologies, clim_period, and another for the time series, ts_period. Example:

   dict(project = 'IGCM_OUT',
        root = '/path_to_thredds',
        login = 'p86caub',
        model = 'IPSLCM6',
        simulation = 'CM605-LR-pdCtrl01',
        frequency = 'monthly',
        clim_period = 'last_20Y',
        ts_period   = 'full'
       )

clim_period and ts_period accept either explicit dates (e.g. 1980-2000, 2100_2169...) or 'instructions' such as 'last_20Y', 'first_1Y' or 'full' (self-explanatory).

These instructions are a user-friendly way to work on the last or first XX years of a simulation without having to look up the corresponding dates yourself.

This task is handled by the period manager, a C-ESM-EP functionality. It works for IGCM_OUT (and other IGCM_OUT-related projects) and CMIP5, and will work for the upcoming CMIP6 project. If you want to add your project to the C-ESM-EP and use the period manager, contact jerome . servonnat at lsce . ipsl . fr

The period manager works for monthly (CMIP5 and IGCM_OUT) and seasonal (IGCM_OUT only) frequencies. For the latter, use instructions like 'last_SE' or 'first_SE'.
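To illustrate what resolving the yearly instructions above involves, here is a minimal sketch. It is not the C-ESM-EP period manager: resolve_period is a hypothetical function that assumes you already know the first and last available years of the simulation, and it clips a request longer than the simulation to the available years (also an assumption). Seasonal instructions like 'last_SE' are not handled.

```python
import re


def resolve_period(instruction, first_year, last_year):
    """Hypothetical sketch: turn a period instruction into 'YYYY-YYYY'.

    Handles 'full', 'first_NY' and 'last_NY'. Anything else (explicit dates
    such as '1980-2000', or seasonal instructions) is returned unchanged.
    """
    if instruction == 'full':
        return '%d-%d' % (first_year, last_year)
    match = re.fullmatch(r'(first|last)_(\d+)Y', instruction)
    if match is None:
        return instruction
    position, nyears = match.group(1), int(match.group(2))
    # Assumption: never ask for more years than the simulation provides
    nyears = min(nyears, last_year - first_year + 1)
    if position == 'first':
        return '%d-%d' % (first_year, first_year + nyears - 1)
    return '%d-%d' % (last_year - nyears + 1, last_year)
```

For a simulation covering 1850 to 1999, 'last_20Y' gives '1980-1999' and 'first_1Y' gives '1850-1850'.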

The customname: control the name in the plot

By default, the C-ESM-EP labels each dataset with the model name for CMIP5 datasets and with the simulation name for the other projects (and with the 'product' for the ref_ts and ref_climatos projects, which give access to the reference products). To identify your simulation in the plots with a name of your own, add the keyword customname to the dataset dictionary:

  dict(project = 'IGCM_OUT',
       root = '/path_to_thredds',
       login = 'p86caub',
       model = 'IPSLCM6',
       simulation = 'CM605-LR-pdCtrl01',
       frequency = 'monthly',
       clim_period = 'last_20Y',
       ts_period   = 'full',
       customname  = 'My favorite simulation'
      )
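The naming rule described above can be summarized in a few lines. This is a hypothetical sketch (default_label is not a C-ESM-EP function), written only to make the order of precedence explicit: customname first, then the project-dependent default.

```python
def default_label(dataset):
    """Hypothetical sketch of the labeling rule used in the plots."""
    if 'customname' in dataset:
        return dataset['customname']
    if dataset['project'] == 'CMIP5':
        return dataset['model']
    if dataset['project'] in ('ref_ts', 'ref_climatos'):
        return dataset['product']
    return dataset['simulation']
```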

Control the reference used to compute the differences

Many C-ESM-EP diagnostics compare the datasets with a reference. The reference can be either a reference product (observations, reanalysis...) or a simulation.

The variable reference in datasets_setup.py controls this. By default, it is set to 'default', which means that the C-ESM-EP uses a set of pre-defined reference products for the different variables.

If you want to use a simulation as reference, provide a dataset dictionary to reference:

reference = dict(project = 'CMIP5', model='CNRM-CM5', experiment='historical',
                 frequency='monthly', period='1980-2005',
                 customname='CMIP5 CNRM-CM5'
                 )
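The two behaviors of reference can be sketched as follows. pick_reference is hypothetical, and so is the default_references mapping passed to it: the actual pre-defined reference products live inside the C-ESM-EP and are not reproduced here.

```python
def pick_reference(reference, variable, default_references):
    """Hypothetical sketch: choose the reference dataset for one variable.

    default_references maps a variable name to a dataset dictionary
    (placeholder for the C-ESM-EP's pre-defined reference products).
    Neither input dictionary is modified.
    """
    if reference == 'default':
        # One pre-defined reference product per variable
        return dict(default_references[variable], variable=variable)
    # A user-provided dataset dictionary serves as reference for every variable
    return dict(reference, variable=variable)
```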

A word on colors to identify your datasets in the C-ESM-EP

The C-ESM-EP uses colors to identify your datasets in the time series (MainTimeSeries) and the metrics (ParallelCoordinates_Atmosphere, TuningMetrics, HotellingTest). If the user does not specify a color for a dataset, the C-ESM-EP automatically assigns one, taking the colors in order from this list:

cesmep_python_colors = ['royalblue', 'red', 'green', 'mediumturquoise', 'orange',
                        'navy', 'limegreen', 'steelblue', 'fuchsia',
                        'blue', 'goldenrod', 'yellowgreen', 'blueviolet', 'darkgoldenrod', 'darkgreen',
                        'mediumorchid', 'lightslategray', 'gold', 'chartreuse', 'saddlebrown', 'tan',
                        'tomato', 'mediumvioletred', 'mediumspringgreen', 'firebrick',
                        ]

The colors are handled by the colors_manager function. It returns a list of colors as '#'-prefixed hexadecimal codes that both the Python and R scripts of the C-ESM-EP can understand.

User-specified colors take priority. Note that the same color can be given to several datasets.

Note: the colors_manager will return 'royalblue' if the user specifies 'blue'.
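The color rules above can be sketched as follows. This is not the colors_manager itself: assign_colors is hypothetical, it returns color names rather than '#' codes (a conversion such as matplotlib.colors.to_hex could be applied afterwards), and two behaviors are assumptions: automatic colors skip those already chosen by the user, and the palette wraps around when it runs out.

```python
from itertools import cycle


def assign_colors(datasets, palette):
    """Hypothetical sketch of the color attribution rules.

    datasets: list of dataset dictionaries (e.g. models)
    palette: ordered list of default colors (e.g. cesmep_python_colors)
    """
    aliases = {'blue': 'royalblue'}  # documented: 'blue' becomes 'royalblue'
    # User-specified colors take priority; duplicates are allowed
    user = [aliases.get(d['color'], d['color']) if 'color' in d else None
            for d in datasets]
    used = {c for c in user if c is not None}
    # Assumption: skip colors already chosen by the user, wrap around at the end
    free = cycle([c for c in palette if c not in used] or palette)
    return [c if c is not None else next(free) for c in user]
```

For instance, with the palette ['royalblue', 'red', 'green'], the datasets [dict(color='green'), dict(), dict(color='blue'), dict()] get ['green', 'red', 'royalblue', 'red'].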