This repository has been archived by the owner on Jan 31, 2022. It is now read-only.

Create a notebook to train a model using AutoML #130

Merged
merged 1 commit into from
May 1, 2020

Conversation

jlewi
Contributor

@jlewi jlewi commented May 1, 2020

  • Refactor some of the notebook setup into notebook_setup.py to make it reusable

  • For the kpt package that launches a notebook:
    Add kpt setters to properly set most values.
    Add an application resource so that links show up in the application dashboard.

  • Related to:
    Triage Action Should Use GitHub App (not a personal access token) #112 Train an org wide model
    Increase area label predictions to 25% of issues #121 Increase label predictions to 25%

  • Qualitatively, the AutoML model trained on all issues with either an area or platform label seems to do much better than our current model, or than an MLP trained on all repositories with the new embeddings.

  • The new model includes the repo name as a feature (we just add it to the document), so
    it's possible that the addition of that feature, and not the model itself, accounts for the improved
    performance.

  • Also, the model is trained only on issues with a platform or area label, as opposed to all issues.
    This should help us distinguish unlabeled examples from negative examples.
    E.g., if the label area/jupyter is missing from an issue, that could be because the label doesn't
    apply or because the issue was never labeled. If an issue has one area or platform label,
    then it was likely added by a human, which increases the likelihood that any missing area or platform labels are missing because they don't apply.
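
The two data-preparation choices described above (adding the repo name to the document, and training only on issues that already carry an area or platform label) can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the issue dictionary layout, the function names, and the `platform/` prefix are assumptions made for the example; only `area/jupyter` appears in this PR.

```python
from typing import Dict, List, Tuple

# Assumed label prefixes; "area/" matches the area/jupyter example above,
# "platform/" is a guess at how platform labels are named.
LABEL_PREFIXES = ("area/", "platform/")


def target_labels(labels: List[str]) -> List[str]:
    """Return only the area/platform labels of an issue."""
    return [label for label in labels if label.startswith(LABEL_PREFIXES)]


def build_document(issue: Dict) -> str:
    """Prepend the repo name to the issue text so the model sees it as a feature."""
    return f"{issue['repo']}\n{issue['title']}\n{issue['body']}"


def build_training_examples(issues: List[Dict]) -> List[Tuple[str, List[str]]]:
    """Keep only issues that have at least one area or platform label.

    An issue with no such label might simply never have been triaged,
    so it is dropped rather than treated as a negative example.
    """
    examples = []
    for issue in issues:
        targets = target_labels(issue["labels"])
        if not targets:
            continue  # unlabeled, not necessarily negative
        examples.append((build_document(issue), targets))
    return examples
```

With this filter, an issue labeled only `kind/bug` is excluded from training, while an issue labeled `area/jupyter` is kept and its other area and platform labels are treated as absent.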


@jlewi
Contributor Author

jlewi commented May 1, 2020

/assign @hamelsmu

@hamelsmu
Member

hamelsmu commented May 1, 2020

/approve
/lgtm

cc @T-Holland @gregce @inc0 this is a real example of someone utilizing AutoML in the wild (also available on Azure and DataRobot).

It allows someone to focus on productionizing something right off the bat with an extremely strong baseline (one that is often hard to beat; in this case, it beats our model).

This code uses ML to automatically label issues.

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hamelsmu


@k8s-ci-robot k8s-ci-robot merged commit 09bc395 into kubeflow:master May 1, 2020