diff --git a/docs/img/overview.png b/docs/img/overview.png
new file mode 100644
index 00000000..4b406cf6
Binary files /dev/null and b/docs/img/overview.png differ
diff --git a/docs/source/dataset.rst b/docs/source/dataset.rst
index b2b132e4..e5a5f746 100755
--- a/docs/source/dataset.rst
+++ b/docs/source/dataset.rst
@@ -3,11 +3,10 @@
 Dataset
 +++++++
 
-The dataset configuration can be found :ref:`here <configurationdefaultparam>`, this file contain 
-specific information about input data parameters for the execution of your command. The documentation
-on the parameters use is explain in the :ref:`yaml <yamlparameters>` section.
+The dataset configuration defines the data (images, ground truth) and their parameters. These parameters
+are documented in the :ref:`yaml <yamlparameters>` section.
 
-The sampling and inference steps requires a csv referencing input data. An example of input csv for
+The tiling and inference steps require a csv referencing input data. An example input csv for the
 massachusetts buildings dataset can be found in 
 `tests <https://github.com/NRCan/geo-deep-learning/blob/develop/tests/tiling/tiling_segmentation_binary_ci.csv>`_. 
 Each row of this csv is considered, in geo-deep-learning terms, to be an 
@@ -46,7 +45,22 @@ Dataset splits
 
 Split in csv should be either "trn", "tst" or "inference". The tiling script outputs lists of 
 patches for "trn", "val" and "tst" and these lists are used as is during training. 
-Its proportion is set by the :ref:`tiling config <datatiling>`.
+Its proportion is set by the :ref:`tiling config <datatiling>`.  
+
+AOI
+---
+An AOI (area of interest) is defined as an image (single imagery scene or mosaic), its content and metadata, and the associated ground truth vector (optional).
+
+.. note::
+    
+     An AOI without a ground truth vector can only be used for inference.
+
+
+The AOI's implementation in the code is as follows:
+
+.. autoclass:: dataset.aoi.AOI
+   :members:
+   :special-members:
 
 Raster and vector file compatibility
 ------------------------------------
@@ -66,15 +80,15 @@ Remote sensing is known to deal with raster files from a wide variety of formats
 To provide as much 
 flexibility as possible with variable input formats for raster data, geo-deep-learning supports:
 
-#. Multi-band raster files, to be used as is (all bands needed, all bands is expected order)
-#. Multi-band raster files with more bands than needed (e.g. Actual is "BGRN", needed is "BGR")
-#. Multi-band raster files with bands in different order than needed (e.g. Actual is "BGR", needed is "RGB")
-#. Single-band raster files, identified with a common string pattern (see details below)
-#. Single-band raster files, identified as assets in a stac item (see details below)
+#. :ref:`Multi-band raster files, used as is <datasetmultiband>` (all bands needed, all bands in the expected order)
+#. :ref:`Multi-band raster files with more bands or different order than needed <datasetmultibandmorebands>` (e.g. actual is "BGRN", needed is "BGR"; or actual is "BGR", needed is "RGB")
+#. :ref:`Single-band raster files, identified with a common string pattern <datasetsingleband>` (see details below)
+#. :ref:`Single-band raster files, identified as assets in a stac item <datasetstacitem>` (see details below)
 
 To support these variable inputs, geo-deep-learning expects the first column of an input csv to be in the 
 following formats.
 
+.. _datasetmultiband:
 Multi-band raster files, used as is
 ====================================
 
@@ -87,7 +101,8 @@ This is the default and basic use.
      - ...
    * - my_dir/my_multiband_geofile.tif
      - ...
-   
+
+.. _datasetmultibandmorebands:
 Multi-band raster files with more bands or different order than needed 
 ======================================================================
 
@@ -116,7 +131,7 @@ The ``bands`` parameter is set in the
     indexed from 1 
     (`docs <https://rasterio.readthedocs.io/en/latest/quickstart.html#reading-raster-data>`_).
 
-
+.. _datasetsingleband:
 Single-band raster files, identified with a common string pattern
 =================================================================
 
diff --git a/docs/source/mode.rst b/docs/source/mode.rst
index 96bd2fe9..7fab5980 100755
--- a/docs/source/mode.rst
+++ b/docs/source/mode.rst
@@ -5,6 +5,11 @@ Mode
 
 The **mode** represent the assignment that you give to the code. 
 
+The following schema describes GDL's modes and concepts.  
+
+.. image:: ../img/overview.png
+   :width: 600
+
 .. _datatiling:
 
 Data Tiling
@@ -21,7 +26,7 @@ preparation phase creates `chips <https://torchgeo.readthedocs.io/en/latest/user
 For this tiling step, **GDL** requires a csv as input with a list of rasters and labels to be
 used in the subsequent training phase.
 This csv must have been specified as a ``path`` in the ``raw_data_csv`` from :ref:`configurationgeneralparam`.
-The other parameter will be found in :ref:`configurationdefaultparam` under ``tiling`` and 
+The other parameters will be found in :ref:`configurationdefaultparam` under ``tiling`` and 
 this configuration file looks like:
 
 .. literalinclude:: ../../../config/tiling/default_tiling.yaml
@@ -106,21 +111,56 @@ Training
    # Training the neural network
    (geo_deep_env) $ python GDL.py mode=train
 
-Training, along with validation and testing phase is where the neural network learns to use the data prepared in 
-the previous phase to make all the predictions. The crux of the learning process is the training phase.  
-During the training the data are separated in three for training, validation and test. The samples labeled "*trn*"
+Training, along with the validation and testing phases, is where the neural network learns, from the data prepared in 
+the tiling mode, to make predictions. The crux of the learning process is the training phase.  
+During training, the data are separated into three datasets for training, validation and testing. The samples labeled "*trn*"
 as per above are used to train the neural network. The samples labeled "*val*" are used to estimate the training
 error (i.e. loss) on a set of sub-images not used for training. After every epoch and at the end of all epochs, 
 the model with the lowest error on validation data is loaded and use on the samples labeled "*tst*" if they exist.
 The result of those "*tst*" images is used to estimate the accuracy of the model, since those images were 
 unseen during training nor validation.
 For all those steps, we have the parameters that can be found in :ref:`configurationdefaultparam` under ``training``
-and this configuration file look a like:
+and this configuration file looks like:
 
 .. literalinclude:: ../../../config/training/default_training.yaml
    :language: yaml
 
-This section will follow soon.
+
+- ``num_gpus`` (int)
+    Number of GPUs used for training. The value is ignored if PyTorch is installed cpu-only.
+- ``batch_size`` (int)
+    Number of training tiles in one forward/backward pass.
+- ``eval_batch_size`` (int)
+    Number of validation tiles in one forward pass.
+- ``batch_metrics`` (int)
+    Compute metrics every n batches. If set to 1, metrics are calculated for every batch during validation. Calculating 
+    metrics is time-consuming, so it is not always necessary to calculate them on every batch of every epoch.
+- ``lr`` (float) 
+    Learning rate at first epoch.
+- ``max_epochs`` (int)
+    Maximum number of epochs for one training session.
+- ``min_epochs`` (int)
+    Minimum number of epochs for one training session.
+- ``num_workers`` (int, optional)
+    Number of workers assigned to the dataloader. If not provided, it is deduced from the number of GPUs (num_workers = 4 * num_GPU). 
+    `Reference <https://discuss.pytorch.org/t/guidelines-for-assigning-num-workers-to-dataloader/813/5>`_
+- ``mode`` (str)
+    Either 'min' or 'max': whether to minimize or maximize the chosen loss.
+- ``max_used_ram`` (int, optional)
+    Used to determine whether or not the process can use the GPU. If a GPU is already used by another process, the training can still be 
+    pushed to this GPU if ``max_used_ram`` is not met. 
+- ``max_used_perc`` (int, optional)
+    Value between 0 and 100. Used to determine whether or not the process can use the GPU. If a GPU is already used by another process, 
+    the training can still be pushed to this GPU if ``max_used_perc`` is not met. 
+- ``state_dict_path`` (str, optional)
+    Path to a pretrained model (.pth.tar).
+- ``state_dict_strict_load`` (bool, optional)
+    Defines whether to strictly enforce that the keys in ``state_dict`` match the keys returned by the PyTorch module's ``state_dict()`` function. 
+    Default: True. `Reference <https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.load_state_dict>`_
+- ``compute_sampler_weights`` (bool, optional)
+    If provided, estimates sample weights by class for unbalanced datasets. 
+    Uses `scikit-learn <https://scikit-learn.org/stable/modules/generated/sklearn.utils.class_weight.compute_sample_weight.html>`_.
+
 
 .. _inference:
 
@@ -135,7 +175,7 @@ Inference
 The inference phase is the last one, it allows the use of a trained model to predict on new input data without
 ground truth. For this final step in the process, it need to assign every pixel in the original image a value 
 corresponding to the most probable class with a certain level of confidence. Like the other two mode, the parameter 
-will be found in :ref:`configurationdefaultparam` under ``inference`` and this configuration file look a like 
+will be found in :ref:`configurationdefaultparam` under ``inference`` and this configuration file looks like 
 (for binary inference):
 
 .. literalinclude:: ../../../config/inference/default_binary.yaml
diff --git a/docs/source/quickstart.rst b/docs/source/quickstart.rst
index 5fd89624..649acb0c 100755
--- a/docs/source/quickstart.rst
+++ b/docs/source/quickstart.rst
@@ -21,7 +21,7 @@ Examples used here are for a bash shell in an Ubuntu GNU/Linux environment.
 
 Installation
 ------------
-Miniconda is suggested as the package manager for GDL. However, users are advised to `switch to libmamba <https://github.com/NRCan/geo-deep-learning#quickstart-with-conda>` as conda's default solver or to __directly use mamba__ instead of conda if they are facing extended installation time or other issues. Additional problems are grouped in the `troubleshooting section <https://github.com/NRCan/geo-deep-learning#troubleshooting>`. If issues persist, users are encouraged to open a new issue for assistance.
+Miniconda is suggested as the package manager for GDL. However, users who face long installation times or other issues are advised to `switch to libmamba <https://github.com/NRCan/geo-deep-learning#quickstart-with-conda>`_ as conda's default solver, or to use mamba directly instead of conda. Other known problems are covered in the :ref:`troubleshooting` section. If issues persist, users are encouraged to open a new issue for assistance.
 
 Quickstart with conda
 
@@ -37,21 +37,24 @@ python environment with the following commands:
 
    Tested on Ubuntu 20.04, Windows 10 and WSL 2.
 
-Change conda's default solver for faster install (__Optional__)
+Change conda's default solver for faster install (Optional)
 
 .. code-block:: console
 
    $ conda install -n base conda-libmamba-solver
    $ conda config --set solver libmamba
 
-.. _troubleshooting
- Troubleshooting
-----------------
 
- .. code-block:: console
-   $ *ImportError: /lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.29' not found*
+.. _troubleshooting:
+ 
+Troubleshooting
+---------------
+Import error:  
 
 .. code-block:: console
+
+   ImportError: /lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.29' not found
+   $
    $ # Export path to library or set it permenantly in your .bashrc file (example with conda) :
    $ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/