This document is intended to give users and contributors an idea of OpenFL product team's current priorities, features we plan to incorporate over the short, medium, and long term, and call out opportunities for the community to get involved.
At a minimum once each product release - which we expect to be on a cadence of every 3-4 months.
All interfaces in OpenFL support the standard horizontal FL training workflow today:
- The collaborator downloads the latest model from the aggregator
- The collaborator performs validation with their local validation dataset on the aggregated model, and sends these metrics to the aggregator (aggregated_model_validation task)
- The collaborator trains the model on their local training data set, and sends the local model weights and metrics to the aggregator (train task)
- The collaborator performs validation with their local validation dataset on their locally trained model, and sends their validation metrics to the aggregator (locally_tuned_model_validation task)
- The aggregator applies an aggregation function (weighted average, FedCurv, FedProx, etc.) to the model weights, and reports the aggregate metrics.
The Task Assigner determines the list of collaborator tasks to be performed,
and both in the task runner API as well as the interactive API these tasks can be modified (to varying degrees).
For example, to perform federated evaluation of a model, only the aggregated_model_validation
task would be listed for the assigner's block of the federated plan.
Equivalently for the interactive API, this can be done by only registering a single validation task.
But there are many other types of workflows that can't be easily represented purely by training / validation tasks performed on a collaborator with a single model.
An example is training a Federated Generative Adversarial Network (GAN); because this may be represented by separate generative and discriminator models, and could leak information about a collaborator dataset,
the interface we provide should allow for better control over what gets sent over the network and how.
Another common request we get is for validation with an aggregator's dataset after training. Today there is not a great way to enable this in OpenFL.
For these reasons, we are planning to add experimental support for complex distributed workflows in OpenFL 1.5, with the following goals:
In the process of thinking about federated workflows, and the properties that are important, these are our goals:
- Simplify the federated workflow representation
- Clean separation of workflow from runtime infrastructure
- Help users better understand the steps in federated learning (weight extraction, tensor compression, etc.)
- Interface makes it clear what is sent across the network
- The placement of tasks and how they connect should be straightforward
- Don't reinvent unless absolutely necessary
OpenFL is designed for security and privacy, and soon we will be releasing some exciting extensions that build on running OpenFL experiments within SGX enclaves.
The task runner interface is coupled with the the single experiment aggregator / collaborator infrastructure, and the interactive API is tied to the director / envoy infrastructure. The interactive API was originally designed to be a high-level API for OpenFL, but for the cases when more control is required by users, access to lower level interfaces is necessary.
Today we support three interfaces: TaskRunner, native Python API, and interactive API. These are all distinct APIs, and are not particularly interoperable. By the time we reach OpenFL 2.0, our intention is to deprecate the original native Python API used for simulations, bring consistency to the remaining interfaces with a high level, middle level, and low level API that are fully interoperable. This will result in being able to use the interface you're most comfortable with for a simulation, single experiment, or experiment session (with the director / envoy infrastructure).
Federated Learning is a burgeoning space. Most core FL infrastructure (model weight extraction, network protocols, and serialization designs) must be reimplemented ad hoc by each framework. This causes community fragmentation and distracts from some of the bigger problems to be solved in federated learning. In the short term, we want to collaborate on standards for FL, first at the communication and storage layer, and make these components modular across other frameworks. Our aim is also to provide a library for FL algorithms, compression methods, that can both be applied and interpreted easily.
- Use the OpenFL Workflow Interface on distributed infrastructure with the FederatedRuntime
- LLM Support
- New use cases enabled by custom workflows
- Standard ML Models (i.e. Tree-based algorithms)
- Federated evaluation documentation and examples
- Significantly improved documentation
- Interface Cohesion
- High level interface: Interactive API
- Low level interface: Workflow API
- Decoupling interfaces from infrastructure
- Well defined interfaces intended for building higher level projects on top of OpenFL