- Introduction
- Citing the LTS framework
- General overview
- Usage guide
- System requirements
- Acknowledgements
The Lagrangian Transport Scenarios (LTS) repository contains the framework to run, process and visualise oceanic Lagrangian particle-tracking simulations. The LTS framework was developed to standardize and simplify running a wide array of Lagrangian ocean scenarios, where each scenario contains a set of kernels that describe the behavior of particles which are advected with a specified set of oceanographic and meteorological data.
Almost all the code within this repository is written in Python, with bash scripts being used to submit and run jobs. All the Lagrangian simulation code uses the Parcels (Probably A Really Computationally Efficient Lagrangian Simulator) package, which is described in detail in Lange & van Sebille (2017) and Delandmeter & van Sebille (2019). For explanations of Parcels, please refer to the OceanParcels website, which has a number of tutorials and references to various use-cases of Parcels.
The LTS framework is largely based on the work for the Onink et al. (2021) paper, which described stochastic plastic beaching and resuspension parametrizations within large-scale ocean simulations. However, various other scenarios and kernels are described in the publications outlined below:
- The stochastic beaching and resuspension kernels within the `Stochastic` and `ShoreDependentResuspension` scenarios are described in Onink et al. (2021).
- The KPP vertical turbulent mixing kernel is described in Onink et al. (2022), with the model code to simulate 1D vertical transport of buoyant particles being available here.
- The `SizeTransport` and `FragmentationKaandorpPartial` scenarios are described in Onink et al. (in prep), "The influence of particle size and fragmentation on large-scale microplastic transport in the Mediterranean Sea".
For any model components described above, please cite the relevant paper. For citing the LTS framework itself, please cite the repository on Zenodo.
The LTS framework is set up for three basic tasks: running the Lagrangian simulations, analyzing the output, and visualizing the post-processed Lagrangian simulations. For each of these tasks, there is a dedicated submission bash script in the `bin` directory which is used to submit jobs to the server. Currently, all these scripts are written for the ubelix HPC cluster at the University of Bern, which uses a version of the Slurm workload manager. In principle, the LTS framework can be run on servers that use different workload managers, but this would require modification of the submission scripts.
To speed up computation, the particle tracking simulations are parallelised into runs, where each `run` contains a subselection of the total number of particles in the simulation. This subselection is allocated at the beginning of the simulation according to the starting locations of the particles and remains fixed throughout the entire duration of the simulation; the number of runs depends on the input scenario. Each `run` is further subdivided according to the `restart` variable. All multi-year simulations are subdivided into 1-year sections, where `restart = 0` indicates a new simulation starting from the initial input files. The second simulation year is then indicated by `restart = 1`, where each subsequent simulation year only starts once the simulation for the previous year is complete (e.g. `restart = 2` only starts when `restart = 1` ends, which in turn only started once `restart = 0` was complete). Breaking up the simulation into these subsimulations according to the `restart` and `run` variables shortens the time needed to run the simulations, and means that an error in a later simulation year doesn't require rerunning the entire simulation.
The `bin/simulation_submission.sh` script also allows the user to set a number of variables, such as beaching timescales, fragmentation rates, ensemble members, etc. These are saved in the local `.env` file, which is then loaded by the `src/settings.py` file. This governs the scenario parameters, and the submission script is set up to easily submit a large number of jobs for e.g. a range of beaching timescale values.
The `bin/analysis_submission.sh` script is used to submit analysis jobs that take the unprocessed Parcels output and produce post-processed output files. A number of standardized analysis functions are built into the LTS framework, such as calculating annual mean horizontal and vertical concentration distributions, calculating Bayesian statistics, and computing simple timeseries of beached/adrift particle fractions. For almost all of these analysis functions, the analysis consists of two steps. In the first step, a job is submitted that carries out the specified analysis procedure for each individual `restart`/`run` file and produces an intermediate output file. The second step then takes all these intermediate output files and compiles them into one output file which contains the analysis results for the simulation as a whole. For model scenarios with N `restart`/`run` files, this two-step procedure can speed up the analysis by a factor of roughly N. Almost all analysis functions are standardized to work with any basic scenario, but a number are scenario specific (e.g. `src/Analysis/parcels_to_sizespectrum.py`).
Finally, the `bin/visualization_submission.sh` script is used to generate figures and animations of the post-processed data from the analysis stage. A number of standard functions are provided, e.g. for creating figures with an (M, N) subplot array or cartopy maps, but in general each scenario will have its own set of code to generate figures.
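For orientation, below is a minimal, purely illustrative example of the kind of (M, N) cartopy subplot array such figure code produces; this sketch does not use the framework's own helper functions, and the panel layout and file name are arbitrary.

```python
import matplotlib.pyplot as plt
import cartopy.crs as ccrs

# Create a 2 x 3 array of map panels, each with a PlateCarree projection.
fig, axes = plt.subplots(2, 3, figsize=(12, 6),
                         subplot_kw={'projection': ccrs.PlateCarree()})
for ax in axes.flat:
    ax.coastlines()  # draw coastlines on each panel
fig.savefig('example_subplot_array.png')
```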
Any general questions about the LTS framework (for either understanding it or applying it to another setting) can be posted in the Issues section of this repository. A few basic use cases are described below.
Running simulations backwards in time
The LTS framework is set up to run both forward-in-time and backward-in-time simulations for any model scenario. Within the bash scripts, the `BACKWARD` variable specifies whether a simulation runs forward or backward in time, where `BACKWARD = 0` indicates forward in time and `BACKWARD = 1` indicates backward in time.
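For context, backward integration in Parcels itself is selected by passing a negative timestep to `ParticleSet.execute()`. The sketch below only illustrates that idea; how the LTS framework maps `BACKWARD` onto the timestep internally is not shown here, and the 10-minute timestep is purely illustrative.

```python
from datetime import timedelta

BACKWARD = 1  # as set in the submission scripts: 0 = forward, 1 = backward in time

# In Parcels, a negative dt advects the particles backward in time.
dt = timedelta(minutes=-10) if BACKWARD == 1 else timedelta(minutes=10)
# pset.execute(kernels, runtime=timedelta(days=365), dt=dt)
```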
Running LTS on a different server
In order to run LTS on a new server, it is first necessary to modify the job submission bash scripts in the `bin` directory to state `SERVER='NewServerName'`. Depending on the exact setup of this new server, it is likely also necessary to modify the job submission scripts to account for a different workload manager. In case the new server uses Slurm, these modifications might be minor (e.g. updating the partition and qos names to whatever names are used on the server). For other workload managers, this could require significantly more effort; if necessary, ask your local systems administrator for assistance.
Once the bash scripts are done, go to `src/settings.py` and modify the `DATA_DIREC` variable to include the path to the model data on `'NewServerName'`. This then automatically updates other directory paths such as `DATA_INPUT_DIREC`, `DATA_OUTPUT_DIREC` and `FIGURE_OUTPUT_DIREC`. However, `SCRATCH_DIREC` will need to be updated manually, since the scratch directory is likely not a subdirectory of `DATA_DIREC`.
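As a rough sketch of how these variables relate (all directory names below are hypothetical, and the exact way the subdirectories are composed in `src/settings.py` may differ):

```python
# Hypothetical values: only DATA_DIREC and SCRATCH_DIREC are set by hand;
# the other paths are derived from DATA_DIREC.
DATA_DIREC = '/storage/NewServerName/LTS_data/'
DATA_INPUT_DIREC = DATA_DIREC + 'input/'
DATA_OUTPUT_DIREC = DATA_DIREC + 'output/'
FIGURE_OUTPUT_DIREC = DATA_DIREC + 'figures/'
SCRATCH_DIREC = '/scratch/username/LTS/'  # updated manually: not under DATA_DIREC
```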
In principle, this is all that is required to set up LTS to run on a new server. Just make sure that all the necessary oceanographic/meteorological data is available on the server following the paths set in `src/advection_scenarios/advection_files.py`, and you should be good to go.
Introducing new model variables
In order to create a new variable that can be accessed throughout the LTS framework, you simply need to add it to `src/settings.py`, where the convention is that all `src/settings.py` variables are completely capitalised. If the variable is e.g. a physical constant such as the density of air, or some other quantity that doesn't generally vary between scenarios, then it is easiest to just give it a fixed value within the settings file. However, if it is a model parameter (e.g. beaching timescale, fragmentation timescale, etc.), then it is preferable to set the value using the `load_env_variable()` function. This loads the variable value from the `.env` file, where the value can be set within the bash submission scripts.
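As an illustration, such settings entries could look roughly like the sketch below. The signature of `load_env_variable()` is an assumption here (a minimal stand-in is included), and the variable names and values are purely illustrative.

```python
import os


def load_env_variable(name, default):
    # Minimal stand-in: read a value exported by the bash submission scripts
    # (via the .env file), falling back to the default if it is not set.
    return type(default)(os.environ.get(name, default))


# Physical constant that doesn't vary between scenarios: a fixed value suffices.
RHO_AIR = 1.22  # air density [kg m^-3]

# Model parameter set in the submission scripts: load it from the environment.
SHORE_TIME = load_env_variable('SHORE_TIME', default=26)  # beaching timescale [days]
```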
Creating a new model scenario
In order to create a new scenario, a number of steps need to be taken:
- Within the bash scripts, set `SCENARIO='NewScenario'`. The `src/settings.py` file will load the `SCENARIO` variable, and then know that you are running a `NewScenario` simulation. If you need to specify new model variables within the bash scripts, then do so as well.
- Not much needs to happen within `src/settings.py`, except updating it for any new scenario variables you defined within the bash scripts and updating the log section at the end of the file to print the model variables relevant for `NewScenario`.
- Within `src/scenarios`, you need to create a new scenario file for `NewScenario` (a schematic sketch is given after this list). Each scenario has its own Python class, which contains all the necessary functions to run a new simulation. This scenario class is a derived class of `BaseScenario`, which is defined in `src/scenarios/base_scenario.py`. As such, there are a number of mandatory functions that need to be defined in the `NewScenario` class in order for the scenario to work, such as functions defining the particle behavior and the names of the output files. Please refer to `src/scenarios/example_scenario.py` or any of the other scenario files for examples of how to set up a scenario. The `BaseScenario` class contains methods such as `BaseScenario.run()` to carry out the final model run, but if you need a modified form of these functions, you can redefine them in the derived class to override the versions in the base class.
- Once you have created the scenario file, you must import the scenario class in `src/factories/scenario_factory.py` and add the new scenario to the `ScenarioFactory.create_scenario()` method.

Once these steps have been taken, everything should be ready to run any `NewScenario` simulation. In principle, the analysis functions should be ready as well, although it might be necessary to modify some of them if you have any model parameters that you want to loop through aside from just `run` or `restart` (see `src/Analysis/parcels_to_concentration.py` as an example of how changes were made for the `SizeTransport` and `FragmentationKaandorpPartial` scenarios). If you also want to build in visualization capabilities, there are two additional steps:

- Add a `NewScenario` directory to `src/visualization`, within which you create `NewScenario_Figures.py`. This file will then have a function `run()`, which will contain all the code for creating figures.
- Go to `src/factories/visualization_factory.py` and add `visualization.NewScenario.NewScenario_Figures.run()` to the `VisualizationFactory.run()` method. This will finalize visualization capabilities for `NewScenario`.
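As an illustration, a schematic sketch of what a new scenario file could look like is given below. The method names, import paths and kernel choice shown are assumptions made for illustration only; the actual mandatory methods are the ones defined in `src/scenarios/base_scenario.py`.

```python
# Schematic sketch of src/scenarios/new_scenario.py. Method names and import
# paths are illustrative assumptions; check base_scenario.py for the real ones.
from parcels import AdvectionRK4  # standard Parcels advection kernel

from scenarios.base_scenario import BaseScenario  # assumed import path


class NewScenario(BaseScenario):
    """Hypothetical scenario class illustrating the derived-class structure."""

    def file_names(self, run: int, restart: int) -> str:
        # Name of the unprocessed Parcels output file for this run/restart pair.
        return f"NewScenario_run={run}_restart={restart}.nc"

    def get_particle_behavior(self, pset):
        # Combine the built-in advection kernel with any scenario-specific
        # kernels (e.g. beaching, vertical mixing).
        return pset.Kernel(AdvectionRK4)
```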
Creating a new advection scenario
The advection scenario refers to the set of data, including the ocean currents, temperature/salinity, mixed layer depth, wind, etc., that is used to drive the particle transport simulation. Given that the ocean currents are generally the main driver of particle transport, the advection scenario is generally named after the ocean current data (e.g. `CMEMS_MEDITERRANEAN` refers to the advection scenario where the ocean currents are from the CMEMS Mediterranean Sea Physics Reanalysis).
Within the submission scripts and `src/settings.py`, the only step that needs to be taken is that the `ADVECTION_DATA` variable within the bash scripts must be updated to the new advection scenario name `NewAdvection`. This is then automatically set within the `src/settings.py` file.
The `FieldSet` object that contains all the field data within the Parcels simulations is created with the `FieldSetFactory.create_fieldset()` method defined in `src/factories/fieldset_factory.py`. Within the specific scenario class, the `FieldSetFactory.create_fieldset()` method is called with variables that specify what data is supposed to be loaded (e.g. `FieldSetFactory.create_fieldset(wind=True)` will create a `FieldSet` object containing wind data). The `FieldSetFactory.create_fieldset()` method knows the paths to all the necessary data by loading the `file_dict` variable, which is defined within `src/advection_scenarios/advection_files.py`.
`src/advection_scenarios/advection_files.py` contains the `AdvectionFiles` class, where the `AdvectionFiles.file_names()` method returns the `file_dict` dictionary that contains all the paths to the data necessary to create the `FieldSet` object and run the simulation. Each advection scenario has its own set of paths, and for all the variables the `file_dict` contains the file names, the variable names and the data dimensions. Please check the other advection scenarios for naming conventions, as otherwise `FieldSetFactory.create_fieldset()` will not be able to read in the data. It is not necessary for each advection scenario to specify paths for all types of oceanographic data, but you do have to make sure you specify paths for all the data you request with the `FieldSetFactory.create_fieldset()` method (e.g. if you run `FieldSetFactory.create_fieldset(wind=True, MLD=False)`, you need to have paths specified for the wind data, but not for the mixed layer depth data).
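As an illustration, a `file_dict` entry could carry information of roughly the following form. The key names, variable names and paths shown here are assumptions; the actual naming conventions are those used by the existing advection scenarios in `src/advection_scenarios/advection_files.py`.

```python
# Rough illustration of the kind of information file_dict holds for the ocean
# currents; the keys, variable names and paths below are hypothetical.
file_dict = {
    'UV_filenames': ['/data/CMEMS_MEDITERRANEAN/currents_2010_01.nc'],  # file paths
    'UV_variables': {'U': 'uo', 'V': 'vo'},                             # netCDF variable names
    'UV_dimensions': {'lon': 'longitude', 'lat': 'latitude',
                      'time': 'time', 'depth': 'depth'},                # data dimensions
}
```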
Aside from the file paths for oceanographic data specific to an advection scenario, there are also a number of input files that need to be created in order for e.g. the beaching and diffusion kernels to run. In principle, code is provided to automatically calculate the distance-to-shore, land ID and horizontal grid size files. However, all this code was written under the assumption that the ocean current data for the zonal and meridional currents are provided on an A grid (so no staggered grids). In order for the code to run on C grids, the following files would need to be updated:
- `src/advection_scenarios/create_boundary_current.py`
- `src/advection_scenarios/create_distance_to_shore.py`
- `src/advection_scenarios/create_grid_spacing.py`
- `src/advection_scenarios/create_land_ID.py`
- `src/advection_scenarios/create_tidal_Kz_files.py` (only necessary if you are running simulations that include vertical tidal mixing)
Creating a new input scenario
A number of input scenarios are currently specified within the LTS framework:
- `Jambeck`: weighs inputs according to mismanaged waste estimates from Jambeck et al. (2015) and coastal population densities. A fixed fraction of all estimated mismanaged waste within 50 km of the ocean is estimated to enter the ocean, and the input distribution and particle weights are scaled according to the estimated waste input.
- `Lebreton`, `LebretonDivision` and `LebretonKaandorpInit`: based on riverine plastic inputs from Lebreton et al. (2017), where the Lebreton inputs are used for the spatial distribution and/or the weights of the particles.
- `Point_Release`: releases `PARTICLE_NUMBER` particles at a single point, where this point is indicated by the `RELEASE_SITE` variable within the submission scripts and the `SITE_LONLAT` variable within `src/settings.py`.
- `Uniform`: releases particles throughout the model domain with a fixed `RELEASE_GRID` degree spacing.

To create the input files for each of these input scenarios, the `create_input_files.create_files()` method is run within `src/advection_scenarios/advection_files.py`. This method assures that all the inputs are placed on ocean cells instead of on land. However, all this code is again written assuming that the zonal and meridional currents are provided on an A grid. In order to run simulations with C grid data, `create_input_files.create_files()` would need to be modified.
In order to create an entirely new input scenario, the `INPUT` variable within the submission scripts needs to be set to `INPUT=NewInput`. Then, `INPUT_DIREC_DICT` within `src/settings.py` needs to be updated with the path to the directory that will contain the input files.
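For illustration, the update to `INPUT_DIREC_DICT` could look roughly as follows; the existing keys and all paths shown are hypothetical.

```python
# Hypothetical sketch of INPUT_DIREC_DICT in src/settings.py after adding NewInput.
INPUT_DIREC_DICT = {
    'Jambeck': '/data/LTS/input/Jambeck/',    # existing input scenarios (paths illustrative)
    'Lebreton': '/data/LTS/input/Lebreton/',
    'NewInput': '/data/LTS/input/NewInput/',  # directory that will hold the new input files
}
```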
Subsequently, if the new input scenario scales the particle release according to input estimates such as riverine inputs, the `INPUT_MAX` and `INPUT_MIN` variables within `src/settings.py` specify the maximum and minimum weight a single Parcels particle represents. For example, if `INPUT_MAX = 1` and `INPUT_MIN = 0.2` and we have a site with 2.1 input units, then we'd release 2 particles at this site with weight 1, whereas the remaining 0.1 input units is not large enough to warrant creating another particle. This means that some inputs are neglected, but it can dramatically reduce the number of particles (and thus the computational cost) of a simulation. By setting the `INPUT_MAX` and `INPUT_MIN` parameters, the user can determine which sources to include. Then, `INPUT_DIV` sets the number of particles within each `run`, where e.g. a simulation with 25,000 particles and `INPUT_DIV = 10000` would mean the simulation is broken up into three runs with 10,000, 10,000 and 5,000 particles. Again, it is up to the discretion of the user to determine a suitable `INPUT_DIV` value.
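The weighting rule described above can be sketched as follows; this is an illustration of the rule, not the framework's actual input-creation code.

```python
INPUT_MAX, INPUT_MIN = 1.0, 0.2  # maximum/minimum weight one Parcels particle represents


def particle_weights(input_units):
    # Split the input units at a release site into particle weights; any
    # remainder smaller than INPUT_MIN is neglected.
    weights = []
    remaining = input_units
    while remaining >= INPUT_MIN:
        weight = min(remaining, INPUT_MAX)
        weights.append(weight)
        remaining -= weight
    return weights


print(particle_weights(2.1))  # [1.0, 1.0] -> the remaining 0.1 units are neglected
```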
Then, `create_input_files.create_files()` needs to be updated to account for the new input scenario. For this, we refer you to the code for the other input scenarios for examples of how to set this up.
Once the number of runs has been determined, it is important to update `RUNRANGE` in the submission scripts and `src/settings.py`, so that the code knows how many runs to submit jobs for, and how many output files need to be considered in the final analysis.
Creating a new analysis function
In order to create a new general analysis function, follow the subsequent steps:
- Add a variable `NEWANALYSIS` to the `bin/analysis_submission.sh` script, where `NEWANALYSIS = 0` indicates not to run this analysis step and `NEWANALYSIS = 1` indicates to run it.
- Add `NEWANALYSIS` to `src/settings.py`, which loads the `NEWANALYSIS` variable from the `.env` file.
- Add a file for the analysis procedure to the `src/Analysis` directory. The typical naming convention is `parcels_to_NEWANALYSIS.py`, which gives a general indication of what type of analysis is being conducted.
- If possible, we suggest setting up the `parcels_to_NEWANALYSIS.run()` function so that it follows the two-step parallelization procedure outlined earlier, as this will generally speed up the code (a schematic sketch is given after this list). For examples of how to approach this, please refer to the other analysis code. In addition, we suggest making the code as general as possible so that it can be applied to other scenarios as well. However, this might not always be possible; if not, we suggest adding a statement such as `assert self.scenario_name in ['ScenarioName'], "The NEWANALYSIS function is not set up for {}".format(self.scenario_name)` to prevent the code from running for scenarios for which it is not intended.
- Once the analysis code has been written, go to `src/analysis_factory.py` and add the new analysis code to the `AnalysisFactory.run_analysis_procedure()` method.
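As a schematic illustration of the two-step parallelization pattern, the sketch below shows the general idea; the function signature, file names, file format and the analysis itself are placeholders, not the framework's actual API.

```python
import glob

import numpy as np


def run(scenario_name, parallel_step, run_nr=None, restart_nr=None):
    if parallel_step == 1:
        # Step 1: analyse one individual run/restart output file and store an
        # intermediate result, so all files can be processed as separate jobs.
        data = np.load(f"{scenario_name}_run={run_nr}_restart={restart_nr}.npy")
        np.save(f"intermediate_run={run_nr}_restart={restart_nr}.npy", data.mean())
    elif parallel_step == 2:
        # Step 2: compile all intermediate files into one result for the whole simulation.
        parts = [np.load(f) for f in sorted(glob.glob("intermediate_run=*_restart=*.npy"))]
        np.save(f"{scenario_name}_analysis_total.npy", np.mean(parts))
```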
To install the necessary packages for the LTS framework, run `conda env create -f environment_LTS.yml`. This installs all the packages and dependencies needed to run the LTS code.
The following people have been involved in the development of the LTS framework:
- Victor Onink has done the full implementation and had a leading role in the development of all the scenarios within the LTS framework.
- Charlotte Laufkötter has been involved in the conceptual and theoretical development of all the scenarios.
- Inger van Boeijen helped in the initial structuring of the LTS framework.
- Erik van Sebille was involved in the conceptual and theoretical development of all the scenarios.
- Christian Kehl and Philippe Delandmeter contributed to the development of all Parcels components of the LTS framework.
- Cleo Jongedijk assisted in the conceptual development of the beaching and resuspension parametrizations in the `Stochastic` scenario.
- Mikael Kaandorp was involved in the development of the `FragmentationKaandorpPartial` scenario.