Remove MLFlow framework #4

Closed
jatkinson1000 opened this issue Feb 16, 2023 · 6 comments · Fixed by #97
Labels
enhancement New feature or request

Comments

@jatkinson1000
Member

jatkinson1000 commented Feb 16, 2023

We probably don't want MLFlow for distribution?

Easy to remove the entry point.
Harder to deal with the way datasets are currently saved and then re-opened.

@jatkinson1000 jatkinson1000 added the enhancement New feature or request label Feb 16, 2023
@jatkinson1000
Member Author

Perhaps provide documentation for how a user could build an MLFlow framework around the final package.

Following discussion with @arthurBarthe: we will need to change where processed datasets are saved in the code.

@arthurBarthe
Collaborator

I agree that the MLflow component should not be compulsory. We might want to make it easy to interact with it, possibly through some additional API, but we can leave that for later.
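
One way the optional interaction could look is a plain callback hook, so the package itself never imports MLflow. This is only a sketch: `train`, `ParamLogger`, and the `log_params` argument are hypothetical names for illustration, not part of the existing code, and the training loop is elided.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical type for a user-supplied parameter logger.
ParamLogger = Callable[[Dict[str, Any]], None]


def train(params: Dict[str, Any], log_params: Optional[ParamLogger] = None) -> Dict[str, Any]:
    """Hypothetical training entry point; the real training loop is elided."""
    if log_params is not None:
        # Hand the run parameters to whatever tracking backend the user chose.
        log_params(params)
    # Placeholder result so the sketch runs end to end.
    return {"status": "finished", "n_epochs": params["n_epochs"]}


# Without any tracking framework installed:
result = train({"n_epochs": 5})

# With a user-supplied hook; a list stands in for a tracking backend here.
recorded = []
result_logged = train({"n_epochs": 5}, log_params=recorded.append)
```

A user who wants MLflow could pass `mlflow.log_params` as the hook from inside an active run, which would keep the dependency entirely on their side.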

@jatkinson1000
Member Author

Files that deal with MLFlow:

  • cmip26.py - saves the dataset within the MLFlow framework; this needs to be saved elsewhere.
  • trainscript.py - uses MLFlow to log run parameters. Can probably be removed easily, but consider logging elsewhere?
  • data/ - utils.py and readData.py both use MLFlow to load data and log?
  • listdata.py - used to display MLFlow runs. Can probably be removed.
  • MLproject - defines entry points to scripts. Rewrite to use JSON inputs and reduce argument parsing?
  • analysis/ - contains some files that load MLFlow data, so would need refactoring.
  • mlruns - stores runs and associated data. Not currently part of the repo as it is in .gitignore (which also means various Jupyter notebooks are unusable, as the data is not there).
  • Various Jupyter notebooks use MLFlow to load data.
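
As a rough illustration of the "JSON inputs" and "logging elsewhere" ideas in the list above, the sketch below reads run parameters from a JSON file and writes them next to the run outputs using only the standard library. The function names, the `params.json` file name, the directory layout, and the example parameters are all assumptions made for illustration; none of them come from the repository.

```python
import json
import tempfile
from pathlib import Path


def load_config(path):
    """Read run parameters from a JSON file (hypothetical replacement for MLproject arguments)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def log_run_params(params, run_dir):
    """Write the parameters alongside the run outputs, in place of an MLflow parameter log."""
    run_dir = Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    out_path = run_dir / "params.json"
    with open(out_path, "w", encoding="utf-8") as handle:
        json.dump(params, handle, indent=2, sort_keys=True)
    return out_path


# Usage, with a temporary directory standing in for a real output location.
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "train_config.json"
    # Example parameters are made up for this sketch.
    config_path.write_text(json.dumps({"learning_rate": 0.001, "n_epochs": 20}))
    params = load_config(config_path)
    saved = log_run_params(params, Path(tmp) / "run_001")
    roundtrip = json.loads(saved.read_text())
```

Keeping the parameters in a plain file beside the outputs would also let notebooks load a run without needing an `mlruns` directory, though where processed datasets should live is a separate decision.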

@mondus
Contributor

mondus commented May 9, 2023

For a first release it is suggested that we leave MLFlow within the code and this can be refactored later if required.

@MarionBWeinzierl
Collaborator

#85 and #95 introduce a CLI for the data and training steps, respectively. The inference step is still to be done, and the Jupyter notebooks have not yet been updated to use those changes (see #82).

The refactoring work is done/merged on the dev branch.

@raehik
Contributor

raehik commented Dec 6, 2023

#97 removes MLflow integration.

5 participants