Add FL plan description to documentation (securefederatedai#872)
* Add plan description to documentation

Signed-off-by: Mansi Sharma <[email protected]>

* fix indentation

Signed-off-by: Mansi Sharma <[email protected]>

* Apply suggestions from code review

Co-authored-by: Patrick Foley <[email protected]>

---------

Signed-off-by: Mansi Sharma <[email protected]>
Co-authored-by: Patrick Foley <[email protected]>
mansishr and psfoley committed Oct 18, 2023
1 parent fb0e81d commit 982008f
Showing 1 changed file with 40 additions and 14 deletions.
54 changes: 40 additions & 14 deletions docs/running_the_federation.rst
@@ -720,39 +720,65 @@ Federated Learning Plan (FL Plan) Settings
Use the Federated Learning plan (FL plan) to modify the federation workspace to your requirements in an **aggregator-based workflow**.


For participants to agree to take part in an experiment, everyone should know ahead of time both what code will run on their infrastructure and exactly what information on their system will be accessed. The federated learning (FL) plan aims to capture all the information needed to decide whether to participate in an experiment, along with the runtime details needed to load the code and make remote connections.
The FL plan is described by the **plan.yaml** file located in the **plan** directory of the workspace.


Configurable Settings
^^^^^^^^^^^^^^^^^^^^^

- :class:`Aggregator <openfl.component.Aggregator>`
    `openfl.component.Aggregator <https://github.com/intel/openfl/blob/develop/openfl/component/aggregator/aggregator.py>`_
    Defines the settings for the aggregator, which is the model owner in the experiment. While models can be trained from scratch, in many cases the federation fine-tunes a previously trained model. For this reason, pre-trained weights for the model are stored in protobuf files on the aggregator node and passed to collaborator nodes during initialization. The settings for the aggregator include:

    - :code:`init_state_path`: (str:path) Defines the path of the weight protobuf file from which the experiment's initial weights are loaded. These weights are generated with the ``fx plan initialize`` command.
    - :code:`best_state_path`: (str:path) Defines the path of the weight protobuf file to which the highest-accuracy model found during the experiment is saved.
    - :code:`last_state_path`: (str:path) Defines the path of the weight protobuf file to which the model from the last completed round of each experiment is saved.
    - :code:`rounds_to_train`: (int) Specifies the number of rounds in a federation. A federated learning round is one complete iteration in which the collaborators train the model and send the updated model weights back to the aggregator to form a new global model. Within a round, collaborators can train the model for multiple iterations called epochs.
    - :code:`write_logs`: (boolean) Enables the metric logging callback feature. By default, logging is done through `tensorboard <https://www.tensorflow.org/tensorboard/get_started>`_, but users can also supply a custom metric logging function for each task.
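
The definition of a round under ``rounds_to_train`` can be illustrated with a small, framework-free sketch. It is not OpenFL code: the toy ``local_train`` function, the collaborator names, and the sample counts are hypothetical, and the sketch assumes the aggregator combines updates with a sample-weighted average, which is only one possible aggregation rule.

```python
def local_train(weights, data_mean, epochs, lr=0.5):
    """Toy local training: each epoch moves every weight toward the local data mean."""
    for _ in range(epochs):
        weights = [w + lr * (data_mean - w) for w in weights]
    return weights


def weighted_average(updates):
    """Combine (weights, num_samples) pairs into a single global weight vector."""
    total = sum(num_samples for _, num_samples in updates)
    size = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(size)]


rounds_to_train = 3
global_weights = [0.0, 0.0]  # stands in for the weights loaded from init_state_path
# Hypothetical collaborators: name -> (local data mean, number of samples)
collaborators = {"col1": (1.0, 100), "col2": (3.0, 300)}

for round_num in range(rounds_to_train):
    updates = []
    for data_mean, num_samples in collaborators.values():
        # Each collaborator starts from the current global model and trains for two epochs.
        local_weights = local_train(global_weights, data_mean, epochs=2)
        updates.append((local_weights, num_samples))
    # The aggregator forms the new global model that starts the next round.
    global_weights = weighted_average(updates)

print(global_weights)
```

Each pass through the outer loop is one round; the inner ``epochs`` argument corresponds to the multiple training iterations a collaborator may run within that round.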


- :class:`Collaborator <openfl.component.Collaborator>`
    `openfl.component.Collaborator <https://github.com/intel/openfl/blob/develop/openfl/component/collaborator/collaborator.py>`_
    Defines the settings for the collaborator, which is the data owner in the experiment. The settings for the collaborator include:

    - :code:`delta_updates`: (boolean) Determines whether the difference in model weights between the current and previous round is sent (True) or whole checkpoints are sent (False). Setting :code:`delta_updates` to True leads to higher sparsity in the model weights sent across, which may improve compression ratios.
    - :code:`opt_treatment`: (str) Defines the optimizer state treatment policy. Valid options are: 'RESET' - reinitialize the optimizer for every round (default), 'CONTINUE_LOCAL' - keep the local optimizer state for every round, 'CONTINUE_GLOBAL' - aggregate the optimizer state for every round.
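
The idea behind ``delta_updates`` can be sketched with plain Python lists. This is an illustration only, not OpenFL's implementation; the helper names and the weight values are hypothetical.

```python
def compute_delta(previous, current):
    """Element-wise difference between this round's and last round's weights."""
    return [c - p for p, c in zip(previous, current)]


def apply_delta(previous, delta):
    """Rebuild the current weights on the receiving side from the delta."""
    return [p + d for p, d in zip(previous, delta)]


previous_round = [0.5, -1.25, 2.0, 0.75]
current_round = [0.5, -1.0, 2.0, 0.75]  # only one weight changed during training

delta = compute_delta(previous_round, current_round)
zero_fraction = delta.count(0.0) / len(delta)  # share of entries that are exactly zero
restored = apply_delta(previous_round, delta)

print(delta, zero_fraction, restored == current_round)
```

Weights that did not change produce zeros in the delta, which is why a delta tends to be sparser, and therefore easier to compress, than a whole checkpoint.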


- :class:`Data Loader <openfl.federated.data.loader.DataLoader>`
    `openfl.federated.data.loader.DataLoader <https://github.com/intel/openfl/blob/develop/openfl/federated/data/loader.py>`_
    Defines the data loader class that provides access to the local dataset. It implements a train loader and a validation loader that take in the train dataset and the validation dataset, respectively. The settings for the data loader include:

    - :code:`collaborator_count`: (int) The number of collaborators participating in the federation.
    - :code:`data_group_name`: (str) The name of the dataset.
    - :code:`batch_size`: (int) The size of the training or validation batch.


- :class:`Task Runner <openfl.federated.task.runner.TaskRunner>`
    `openfl.federated.task.runner.TaskRunner <https://github.com/intel/openfl/blob/develop/openfl/federated/task/runner.py>`_
    Defines the model, the training and validation functions, and how to extract and set the tensors from the model weights and optimizer dictionary. Depending on the AI framework in use, such as PyTorch or TensorFlow, users can select a predefined task runner.


- :class:`Assigner <openfl.component.Assigner>`
    `openfl.component.Assigner <https://github.com/intel/openfl/blob/develop/openfl/component/assigner/assigner.py>`_
    Defines the tasks that the aggregator sends to the collaborators. There are three default tasks that can be given to each collaborator:

    - :code:`aggregated_model_validation`: (str) Perform validation on the aggregated global model sent by the aggregator.
    - :code:`train`: (str) Perform training on the global model.
    - :code:`locally_tuned_model_validation`: (str) Perform validation on the model that was locally trained by the collaborator.
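
The three default tasks can be pictured with a toy, single-weight model. This is a sketch, not OpenFL code: the ``evaluate`` and ``train`` functions are hypothetical stand-ins for a task runner, and running the tasks in the order listed above is an assumption made for the illustration.

```python
def evaluate(weight, samples):
    """Toy validation metric: mean squared error of a constant prediction."""
    return sum((s - weight) ** 2 for s in samples) / len(samples)


def train(weight, samples, lr=0.5):
    """Toy training step: move the weight toward the mean of the local samples."""
    mean = sum(samples) / len(samples)
    return weight + lr * (mean - weight)


global_weight = 0.0              # model received from the aggregator
local_samples = [1.0, 2.0, 3.0]  # hypothetical local dataset
results = {}

# aggregated_model_validation: score the global model on local data.
results["aggregated_model_validation"] = evaluate(global_weight, local_samples)

# train: update the global model using local data.
local_weight = train(global_weight, local_samples)

# locally_tuned_model_validation: score the locally trained model.
results["locally_tuned_model_validation"] = evaluate(local_weight, local_samples)

print(results)
```

Comparing the two validation results shows how much local training changed the model's fit to the collaborator's own data.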


Each YAML top-level section contains the following subsections:

- ``template``: The name of the class, including top-level package names. An instance of this class is created when the plan is initialized.
- ``settings``: The arguments that are passed to the class constructor.
- ``defaults``: The file that contains default settings for this subsection.
  Any setting from the defaults file can be overridden in the **plan.yaml** file.
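
How the three subsections fit together can be sketched in a few lines of Python. This is a simplified illustration rather than OpenFL's plan loader: the helper names are hypothetical, the defaults are given as a dictionary instead of a file, and a standard-library class stands in for an OpenFL component.

```python
import importlib


def merge_settings(defaults, overrides):
    """Start from the defaults and let values set in the plan take precedence."""
    merged = dict(defaults)
    merged.update(overrides)
    return merged


def build_from_section(section, defaults):
    """Import the class named by ``template`` and call it with the merged settings."""
    module_name, _, class_name = section["template"].rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**merge_settings(defaults, section.get("settings", {})))


# Hypothetical plan section; fractions.Fraction is only a stand-in class.
section = {
    "template": "fractions.Fraction",
    "settings": {"denominator": 4},  # overrides the default below
}
defaults = {"numerator": 1, "denominator": 2}

component = build_from_section(section, defaults)
print(component)  # 1/4
```

The value from ``settings`` replaces the matching default, and the merged dictionary is passed to the constructor of the class named by ``template``.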

The following is an example of a **plan.yaml**:

.. literalinclude:: ../openfl-workspace/torch_cnn_mnist/plan/plan.yaml
    :language: yaml


Tasks