Create a notebook to train a model using AutoML #130

jlewi · 2020-05-01T20:59:17Z

* Refactor some of the notebook setup into notebook_setup.py to make it reusable
* For the kpt package to launch a notebook
  * Add kpt setters to properly set most values.
  * Add an application resource so that links show up in the application dashboard.
Related to:
Triage Action Should Use GitHub App (not a personal access token) #112 Train an org wide model
Increase area label predictions to 25% of issues #121 Increase label predictions to 25%
Qualitatively, the AutoML model trained on all issues with either an area or platform label seems to do much better than our current model or an MLP trained on all repositories with the new embeddings.
The new model includes the repo name as a feature (we just add it to the document), so
it's possible that the addition of that feature, and not the model itself, accounts for the improved
performance.
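A minimal sketch of what adding the repo name to the document could look like. The helper name `build_document`, the field order, and the newline separator are assumptions made for illustration, not the notebook's actual code.

```python
def build_document(repo: str, title: str, body: str) -> str:
    """Prepend the repo name to the issue text so the model sees it as a feature.

    Hypothetical helper: the layout (repo, title, body on separate lines)
    is an assumption for illustration only.
    """
    parts = [repo.strip(), title.strip(), body.strip()]
    # Drop empty pieces so a missing body does not leave a trailing blank line.
    return "\n".join(p for p in parts if p)


# Made-up sample issue; only the idea of prefixing the repo name comes from the PR.
doc = build_document("kubeflow/pipelines", "Pipeline run fails", "Steps to reproduce ...")
```

Because the repo name is just more text in the document, the same training pipeline can be rerun without it to check how much of the improvement comes from that feature alone.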
Also, the model is only trained on issues with a platform or area label, as opposed to all issues.
This should help us distinguish unlabeled examples from negative examples.
E.g., if the label area/jupyter is missing from an issue, that could be because the label doesn't
apply or because the issue was never labeled. If an issue has one area or platform label,
then that label was likely added by a human, which increases the likelihood that any missing area or platform labels are missing because they don't apply.
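The selection rule described above can be sketched roughly as follows. The `area/` and `platform/` prefixes, the helper names, and the sample issues are assumptions for illustration; the notebook's real selection logic may differ.

```python
from typing import Dict, Iterable, List

# Assumed label prefixes; the description only says "area or platform label".
TARGET_PREFIXES = ("area/", "platform/")


def has_target_label(labels: Iterable[str]) -> bool:
    """Return True if the issue carries at least one area or platform label."""
    return any(label.startswith(TARGET_PREFIXES) for label in labels)


def select_training_issues(issues: List[Dict]) -> List[Dict]:
    """Keep only issues that someone has probably triaged.

    For these issues, a missing area/platform label is more likely a true
    negative than an issue that was simply never labeled.
    """
    return [issue for issue in issues if has_target_label(issue.get("labels", []))]


# Made-up sample data; only area/jupyter appears in the PR description.
issues = [
    {"title": "Notebook fails to start", "labels": ["area/jupyter", "kind/bug"]},
    {"title": "Question about docs", "labels": []},
    {"title": "Deploy error", "labels": ["platform/gcp"]},
]
training = select_training_issues(issues)
```

Here the unlabeled issue is dropped rather than treated as a negative example for every label.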

review-notebook-app · 2020-05-01T20:59:23Z

Check out this pull request on ReviewNB.

You'll be able to see Jupyter notebook diff and discuss changes. Powered by ReviewNB.

kubeflow-bot · 2020-05-01T20:59:23Z

This change is


jlewi · 2020-05-01T21:04:04Z

/assign @hamelsmu

hamelsmu · 2020-05-01T23:29:18Z

/approve
/lgtm

cc @T-Holland @gregce @inc0 this is a real example of someone using AutoML in the wild (also available on Azure and DataRobot).

It allows someone to focus on productionizing something right off the bat with an extremely strong baseline (one that is often hard to beat; in this case it beats our model).

This code uses ML to automatically label issues

k8s-ci-robot · 2020-05-01T23:29:26Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hamelsmu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [hamelsmu]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

googlebot added the cla: yes label May 1, 2020

k8s-ci-robot requested review from hamelsmu and inc0 May 1, 2020 20:59

k8s-ci-robot added the size/XXL label May 1, 2020

jlewi force-pushed the kf_embeddings branch from c3e6c5c to ba7a52c Compare May 1, 2020 21:02

k8s-ci-robot assigned hamelsmu May 1, 2020

k8s-ci-robot added the lgtm label May 1, 2020

k8s-ci-robot added the approved label May 1, 2020

k8s-ci-robot merged commit 09bc395 into kubeflow:master May 1, 2020
