add gcp storage to xgboost-operator #81
base: master
Hi @xfate123. Thanks for your PR. I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Draft updated. I'd appreciate further review.
config/samples/xgboost-dist/utils.py
'feature_importance.json')

gcp_path = gcp_parameters['path']
logger.info('---- export model ----')
Export the model to GCP?
Yes, it's to GCP
Also update the YAML and the README so users know how to use this.
fscore_dict = booster.get_fscore()
with open(feature_importance, 'w') as file:
file.write(json.dumps(fscore_dict))
logger.info('---- chief dump model successfully!')
Dump the model to local storage?
I followed the dump-to-OSS module. I think the logic is to dump the model locally first, then upload it from local storage to the cloud.
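For illustration, the dump-locally-then-upload logic described in this thread could be sketched as below. This is a sketch under assumptions, not the PR's actual `upload_gcp`: the names `dump_and_upload`, `make_gcs_uploader`, and the `upload` callable are hypothetical, and the Google Cloud calls assume the third-party `google-cloud-storage` package with a service-account dictionary.

```python
import json
import os
import tempfile


def dump_and_upload(model_bytes, fscore_dict, remote_prefix, upload):
    """Write artifacts to a local temp dir, then pass each file to `upload`.

    `upload` is any callable taking (local_path, remote_path); it is a
    hypothetical hook so the storage backend (OSS, GCP, ...) can be swapped.
    """
    uploaded = []
    with tempfile.TemporaryDirectory() as tmp:
        model_fname = os.path.join(tmp, "model")
        importance_fname = os.path.join(tmp, "feature_importance.json")

        # Step 1: dump everything to local disk first.
        with open(model_fname, "wb") as f:
            f.write(model_bytes)
        with open(importance_fname, "w") as f:
            f.write(json.dumps(fscore_dict))

        # Step 2: upload each local file under the remote prefix.
        for local in (model_fname, importance_fname):
            remote = remote_prefix.rstrip("/") + "/" + os.path.basename(local)
            upload(local, remote)
            uploaded.append(remote)
    return uploaded


def make_gcs_uploader(service_account_info, bucket_name):
    """Build an `upload` callable backed by Google Cloud Storage (assumed setup)."""
    # Imported lazily so the rest of the sketch runs without the package.
    from google.cloud import storage

    client = storage.Client.from_service_account_info(service_account_info)
    bucket = client.bucket(bucket_name)

    def upload(local_path, remote_path):
        bucket.blob(remote_path).upload_from_filename(local_path)

    return upload
```

Keeping the uploader as a separate callable means the same dump step can serve both the OSS and GCP paths, and a success log line can be added right after the upload loop.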
upload_gcp(gcp_parameters, model_fname, aux_path)
upload_gcp(gcp_parameters, text_model_fname, aux_path)
upload_gcp(gcp_parameters, feature_importance, aux_path)
else:
Add a log message saying that the model was uploaded successfully?
for sure
…_v1alpha1_iris_predict_oss.yaml
…tjob_v1alpha1_iris_predict_gcp.yaml
…ob_v1alpha1_iris_train_gcp.yaml
@merlintang I updated the README for users' convenience and added separate YAML files for OSS users and GCP users. I'd appreciate further review.
@@ -41,15 +46,36 @@ For Eg:
--oss_param=endpoint:http://oss-ap-south-1.aliyuncs.com,access_id:XXXXXXXXXXX,access_key:XXXXXXXXXXXXXXXXXXX,access_bucket:XXXXXX
Similarly, xgboostjob_v1alpha1_iris_predict.yaml is used to configure XGBoost job batch prediction.

**Configure GCP parameter**
For training jobs in GCP , you could configure xgboostjob_v1alpha1_iris_train.yaml and xgboostjob_v1alpha1_iris_predict.yaml
the yaml file name is correct.
For training jobs in GCP , you could configure xgboostjob_v1alpha1_iris_train.yaml and xgboostjob_v1alpha1_iris_predict.yaml
Note, we use [GCP](https://cloud.google.com/) to store the trained model,
thus, you need to specify the GCP parameter in the yaml file. Therefore, remember to fill the GCP parameter in xgboostjob_v1alpha1_iris_train.yaml and xgboostjob_v1alpha1_iris_predict.yaml file.
The oss parameter includes the account information such as type, client_id, client_email,private_key_id,private_key and access_bucket.
oss?
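The README hunk above passes storage settings as a comma-separated list of `key:value` pairs. A minimal sketch of how such a string might be parsed and checked follows. The function name `parse_kv_param` and the constant `GCP_KEYS` are hypothetical, the required keys are taken from the README text under review, and the operator's own parser may behave differently.

```python
# Keys the README lists for the GCP parameter (assumed to all be required).
GCP_KEYS = (
    "type",
    "client_id",
    "client_email",
    "private_key_id",
    "private_key",
    "access_bucket",
)


def parse_kv_param(raw, required=()):
    """Parse 'k1:v1,k2:v2' into a dict, splitting each pair on the first ':'.

    Splitting on the first colon only keeps values such as
    'http://host' intact. Values containing commas are not supported.
    """
    params = {}
    for pair in raw.split(","):
        key, sep, value = pair.partition(":")
        if not sep or not key.strip():
            raise ValueError(f"expected key:value, got {pair!r}")
        params[key.strip()] = value.strip()

    missing = [k for k in required if k not in params]
    if missing:
        raise ValueError("missing keys: " + ", ".join(missing))
    return params
```

With this kind of check, a placeholder value that is not in `key:value` form is rejected at startup instead of failing later during upload.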
spec:
containers:
- name: xgboostjob
image: docker.io/merlintang/xgboost-dist-iris:1.1
The image name is not correct; you need to build a new image with the new code.
For sure, thanks for your advice.
Just to double-check: you mean build a new image with the new Python code and update all the YAML files in this folder to use it.
Do I understand correctly?
Yeah.
… On May 16, 2020, at 4:14 PM, xfate123 ***@***.***> wrote:
@xfate123 commented on this pull request.
In config/samples/xgboost-dist/xgboostjob_v1alpha1_iris_train_gcp.yaml:
> +apiVersion: "xgboostjob.kubeflow.org/v1alpha1"
+kind: "XGBoostJob"
+metadata:
+ name: "xgboost-dist-iris-test-train-gcp"
+spec:
+ xgbReplicaSpecs:
+ Master:
+ replicas: 1
+ restartPolicy: Never
+ template:
+ apiVersion: v1
+ kind: Pod
+ spec:
+ containers:
+ - name: xgboostjob
+ image: docker.io/merlintang/xgboost-dist-iris:1.1
@merlintang I already created the new image and updated all the YAML files to use it. It still needs further testing.
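As a rough illustration of the "update the image in all YAML files in this folder" step discussed above, a small helper along these lines could do the rewrite. The function name `set_image` is hypothetical, and the sketch assumes every `image:` line in the folder should point at the same new image.

```python
import re
from pathlib import Path

# Matches a whole "image: <ref>" line, with or without a leading "- ".
IMAGE_LINE = re.compile(r"^([ \t]*(?:-[ \t]+)?image:[ \t]*)\S+[ \t]*$", re.MULTILINE)


def set_image(folder, new_image, pattern="*.yaml"):
    """Point every `image:` line in matching files at `new_image`.

    Returns the names of the files that were actually modified.
    """
    changed = []
    for path in sorted(Path(folder).glob(pattern)):
        text = path.read_text()
        # A function replacement avoids backslash escapes in `new_image`.
        updated = IMAGE_LINE.sub(lambda m: m.group(1) + new_image, text)
        if updated != text:
            path.write_text(updated)
            changed.append(path.name)
    return changed
```

A line-based rewrite like this leaves comments and formatting untouched, which a full YAML load and dump would not.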
Change the PR title; you still have work in progress.
- --job_type=Predict
- --model_path=autoAI/xgb-opt/2
- --model_storage_type=gcp
- --gcp_param=unknown
Why unknown here?
spec:
containers:
- name: xgboostjob
image: docker.io/xfate123/xgboost-dist-iris:1.1
Use a local image
spec:
containers:
- name: xgboostjob
image: docker.io/xfate123/xgboost-dist-iris:1.1
same here
imagePullPolicy: Always
args:
- --job_type=Predict
- --model_path=autoAI/xgb-opt/2
can we simplify the model path?
@@ -42,7 +42,7 @@ spec:
claimName: xgboostlocal
containers:
- name: xgboostjob
image: docker.io/merlintang/xgboost-dist-iris:1.1
image: docker.io/xfate123/xgboost-dist-iris:1.1
Do we have a Dockerfile for this image in this repo?
We only have the image in this repo.
Thinking about adding another storage option for our xgboost-operator. Still working on it.