[中文]
To make the modeling task more flexible, currently, FATE use its own domain-specific language(DSL) to describe modeling task. With usage of this DSL, modeling components such as data-io, feature-engineering and classification/regression module etc. can be combined as a Directed Acyclic Graph(DAG). Therefore, user can take and combine the algorithm components flexibly according to their needs.
In addition, each component has their own parameters to be configured. Also, the configuration may differ from party to party. For convenience, FATE configure all parameters for all parties and all components in one file. This guide will show you how to create such a configure file.
We use json file which is actually a dict as a dsl config file. The first level of the dict is always "components," which indicates content in the dict are components in your modeling task.
{ "components" : { ... } }
Then each component should be defined on the second level. Here is an example of setting a component:
"dataio_0": { "module": "DataIO", "input": { "data": { "data": [ "args.train_data" ] } }, "output": { "data": ["train"], "model": ["dataio"] }, "need_deploy": true }
As the example shows, user define the component name as key of this module. Note this module name should end up with a "_num" where the num should start with 0.
module: | Specify which component to use. This field should strictly match the file name in federatedml/conf/setting_conf except the .json suffix. |
---|---|
input: | There are two types of input, data and model.
|
Besides the dsl conf, user also need to prepare a submit runtime conf to set the parameters of each component.
initiator: | To begin with, the initiator should be specified in this runtime conf. Here is an exmaple of setting initiator: "initiator": { "role": "guest", "party_id": 10000 } |
---|---|
role: | All the roles involved in this modeling task should be specified. Each element in the role should contain role name and their party ids. The reason for ids are with form of list is that there may exist multiple parties in one role. "role": { "guest": [ 10000 ], "host": [ 10000 ], "arbiter": [ 10000 ] } |
role_parameters: | Those parameters that are differ from party to party, should be indicated here. Please note that each parameters should has the form of list. Inside the role_parameters, party names are used as key and parameters of these parties are values. Take the following structure as an example: "guest": { "args": { "data": { "train_data": [ { "name": "1ca0d9eea77e11e9a84f5254005e961b", "namespace": "arbiter-10000#guest-10000#host-10000#train_input#guest#10000" } ] } }, "dataio_0": { "with_label": [ true ], ... } }, "host": { "args": { "data": { "train_data": [ { "name": "3de22bdaa77e11e99c5d5254005e961b", "namespace": "arbiter-10000#guest-10000#host-10000#train_input#host#10000" } ] } }, "dataio_0": { ... } ... } As this example shows, for each party, the input parameters such as train_data, validate_data and so on should be list in args. The name and namespace above are table indicators for uploaded data. Then, user can config parameters for each components. The component names should match names defined in the dsl config file. The content of each component parameters are defined in Param class located in federatedml/param. |
algorithm_parameters: | If some parameters are the same among all parties, they can be set in algorithm_parameters. Here is an example showing how to do that. "hetero_feature_binning_0": { ... }, "hetero_feature_selection_0": { ... }, "hetero_lr_0": { "penalty": "L2", "optimizer": "rmsprop", "eps": 1e-5, "alpha": 0.01, "max_iter": 10, "converge_func": "diff", "batch_size": 320, "learning_rate": 0.15, "init_param": { "init_method": "random_uniform" }, "cv_param": { "n_splits": 5, "shuffle": false, "random_seed": 103, "need_cv": false, } } Same with the form in role parameters, each key of the parameters are names of components that are defined in dsl config file. |
After setting config files and submitting the task, fate-flow will combine the parameter list in role-parameters and algorithm parameters. If there are still some undefined fields, values in default runtime conf will be used. Then fate-flow will send these config files to their corresponding parties and start the federated modeling task.
For multi-host modeling case, all the host's party ids should be list in the role field.
"role": { "guest": [ 10000 ], "host": [ 10000, 10001, 10002 ], "arbiter": [ 10000 ] }
Each parameter set for host should also be list in a list. The number of elements should match the number of hosts.
"host": { "args": { "data": { "train_data": [ { "name": "hetero_breast_host_1", "namespace": "hetero_breast_host" }, { "name": "hetero_breast_host_2", "namespace": "hetero_breast_host" }, { "name": "hetero_breast_host_3", "namespace": "hetero_breast_host" } ] } }, "dataio_0": { "with_label": [false, false, false], "output_format": ["dense", "dense", "dense"], "outlier_replace": [true, true, true] }
The parameters set in algorithm parameters need not be copied into host role parameters. Algorithm parameters will be copied for every party.