Use the schema below when preparing a config.yaml file for models that use the tool.
Optional parameters are marked as such.
project_id: [project ID]
bucket_id: [GCS bucket ID]
region: [GCP region to train ML Pipeline Generator models in, on AI Platform]
cluster_name: [Name of GKE cluster hosting Kubeflow Pipelines]
cluster_zone: [Zone in which GKE cluster is deployed]
scale_tier: [compute specifications for training the model on AI Platform]
runtime_version: [AI Platform Training runtime version]
python_version: [Python version used in the model code for training]
package_name: [name for the source distribution to be uploaded to GCS]
machine_type_pred: [type of virtual machine that AI Platform Prediction uses for the nodes that serve predictions, defaults to mls1-c1-m2]

data:
    schema:
        - [schema for input & target features in the training data]
    train: [GCS location url to upload preprocessed training data]
    evaluation: [GCS location url to upload preprocessed eval data]
    prediction:
        input_data_paths:
            - [GCS location urls for prediction input data]
        input_format: [prediction input format]
        output_format: [prediction output format]

model:
    name: [unique model name, must start with a letter and only contain letters, numbers, and underscores]
    path: [local dir path to the model.py file]
    target: [target feature in training data]
    metrics: [metrics to evaluate model training on, such as "accuracy"]

model_params:
    input_args: [Any input params to be submitted with the job]
        arg_name:
            type: [data type of the arg, such as int]
            help: [short description of the arg]
            default: [default value of the arg]
    hyperparam_config: [optional; local path to hyperparam tuning config yaml. See schema here for this config file.]
    explanation: [optional; explainability features for the training job]

orchestration:
    kubeflow_url: [for KFP backend; URL of preconfigured Kubeflow instance]
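
Two rules stated in the schema can be checked before a job is submitted: the model name format, and the default for machine_type_pred. The following is a minimal sketch of such a check, not part of the tool itself. It assumes the config has already been parsed into a Python dict (for example with PyYAML's `yaml.safe_load`). The function names `check_config` and `with_defaults` are hypothetical, and the list of required keys is an assumption, since the schema only marks some parameters as optional.

```python
import re

# Assumption: these top-level keys are treated as required by this sketch.
# The schema itself only marks hyperparam_config and explanation as optional.
ASSUMED_REQUIRED = ("project_id", "bucket_id", "region", "data", "model")

# From the schema: the model name must start with a letter and contain
# only letters, numbers, and underscores.
MODEL_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

# From the schema: machine_type_pred defaults to mls1-c1-m2.
DEFAULT_MACHINE_TYPE_PRED = "mls1-c1-m2"


def check_config(config, required=ASSUMED_REQUIRED):
    """Return a list of problems found in a parsed config dict."""
    problems = [f"missing key: {key}" for key in required if key not in config]
    # Tolerate a missing or empty `model` section instead of raising.
    name = (config.get("model") or {}).get("name", "")
    if not MODEL_NAME_RE.fullmatch(str(name)):
        problems.append(f"invalid model name: {name!r}")
    return problems


def with_defaults(config):
    """Return a copy of the config with the documented default filled in."""
    out = dict(config)
    out.setdefault("machine_type_pred", DEFAULT_MACHINE_TYPE_PRED)
    return out


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    config = {
        "project_id": "my-project",
        "bucket_id": "my-bucket",
        "region": "us-central1",
        "data": {},
        "model": {"name": "census_2020"},
    }
    print(check_config(config))                          # []
    print(check_config({"model": {"name": "2fast"}}))    # lists missing keys and the bad name
    print(with_defaults(config)["machine_type_pred"])    # mls1-c1-m2
```

Returning a list of problems rather than raising on the first one lets all issues in a config be reported in a single pass.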