From f5c970137529393eed11452bbf51b9aaa987e6be Mon Sep 17 00:00:00 2001
From: Mirko Bronzi
Date: Fri, 20 Aug 2021 14:30:37 -0400
Subject: [PATCH] release 2.1 (#41)

* feat: add option to specify a temporary folder for the experiment. (#5)
* added option to rsync input and output data
* added docstring
* logging to stdout now
* fixed script for clusters - now using slurm tmpdir to write temp results
* fixing travis
* added missing docstring
* fixed tensorflow part (method signature change)
* renamed variables
* Seed for reproducibility (#6)
* added option to rsync input and output data
* added docstring
* logging to stdout now
* fixed script for clusters - now using slurm tmpdir to write temp results
* fixing travis
* added missing docstring
* fixed tensorflow part (method signature change)
* added seed for pytorch
* fixed typo
* added comment on how to use seed
* fixed flake8
* added test on reproducibility
* removed pytorch part from tensorflow
* fixed cookiecutter syntax
* added check for tensorflow
* fixed typo in test file
* added command to set the seed in tensorflow
* fixed flake8 error
* fixed typos
* removed duplicate log
* typo in docstring
* better error message in test
* added test to check repro using Orion (#8)
* added test to check repro using Orion
* more log into travis
* more info to debug travis
* running two trials for orion
* added seed to orion
* added orion test to tensorflow part
* better log messages in travis
* Add support for keras and Pytorch Lightning (#12)
* added code for keras - still need to complete all tests
* fixed flake8
* started adding PyTorch Lightning support - note that mlflow and loading/saving model still does not work
* fixed api change
* fixed pytorch early stopping
* fixed flake8
* fixed flake8 for pytorch version
* fixed keras part for flake8
* added code to resume a model - for pytorch lightning
* removed forgotten diff
* fixed start_from_scratch (not loading a model even if present) / now printing the val loss in the logs
* pytorch lightning now correctly logging under the same run
* now pytorch is correctly resuming training and continues to plot in the same mlflow run
* added github actions
* using a different ubuntu image
* printing folder - trying to fix github actions
* telling git who I am..
* removed not useful test
* fixed typo in test folder
* removed travis configuration - using github actions from now on
* correctly handling the saved models in pytorch
* now passing the full hyper-parameter object to train_impl method (for more flexibility).
* added option to ask for gpus in pytorch
* improved error message
* Fixups for the lightning_and_keras PR (#12) (#22)
* Update torch model to pl-lightning model
* Refactor train+model impl w/ optim module
* Refactor data loader w/ data module for plightning
* removing codecov from cookiecutter. (#24)
* moving to github actions (#25)
* removing coverage computation
* moving from travis to github actions.
* setting fake name/email for git.
* removed (not-correct) duplicate for github actions config file.
* fixing tests.
* refactored pytorch models. (#26)

Co-authored-by: Mirko Bronzi

* running CI also on develop.

Co-authored-by: Pierre-Luc St-Charles

* Adding more CI backends. (#27)
* added github actions.
* moved python version to 3.9 - by default.
* added support for azure continuous integration.
* updated mlflow/orion dependencies.
* now correctly restoring models for pytorch. (#28)
* Now running test-coverage locally. (#30)
* running test coverage locally.
* fixed project name.
* correctly allowing mlflow to work in any folder. (#29)
* removed duplicate CI.
* Update cookiecutter doc url (#37)
* made the template generic by default - will add mila-specific aspects only if enabled at template instantiation time (#38)
* default branch is now main (#39)
* made the template generic by default - will add mila-specific aspects only if enabled at template instantiation time
* now using main as the default branch for github
* Fixed typo

Co-authored-by: Pierre-Luc St-Charles
Co-authored-by: Mathieu Germain
---
 README.md                                    |  2 +-
 cookiecutter.json                            |  1 +
 .../.github/workflows/tests.yml              |  4 +--
 {{cookiecutter.project_slug}}/README.md      | 29 ++++++++++---------
 .../examples/{slurm_cc => slurm}/config.yaml |  0
 .../examples/{slurm_cc => slurm}/run.sh      |  0
 .../{slurm_mila => slurm}/to_submit.sh       | 13 ++++++++-
 .../examples/slurm_cc/to_submit.sh           | 16 ----------
 .../examples/slurm_cc_orion/to_submit.sh     | 23 ---------------
 .../examples/slurm_mila/config.yaml          | 14 ---------
 .../examples/slurm_mila/run.sh               |  2 --
 .../examples/slurm_mila_orion/config.yaml    | 14 ---------
 .../slurm_mila_orion/orion_config.yaml       | 16 ----------
 .../examples/slurm_mila_orion/run.sh         |  2 --
 .../config.yaml                              |  0
 .../orion_config.yaml                        |  0
 .../{slurm_cc_orion => slurm_orion}/run.sh   |  0
 .../to_submit.sh                             | 17 +++++++----
 18 files changed, 43 insertions(+), 110 deletions(-)
 rename {{cookiecutter.project_slug}}/examples/{slurm_cc => slurm}/config.yaml (100%)
 rename {{cookiecutter.project_slug}}/examples/{slurm_cc => slurm}/run.sh (100%)
 rename {{cookiecutter.project_slug}}/examples/{slurm_mila => slurm}/to_submit.sh (55%)
 delete mode 100644 {{cookiecutter.project_slug}}/examples/slurm_cc/to_submit.sh
 delete mode 100644 {{cookiecutter.project_slug}}/examples/slurm_cc_orion/to_submit.sh
 delete mode 100644 {{cookiecutter.project_slug}}/examples/slurm_mila/config.yaml
 delete mode 100644 {{cookiecutter.project_slug}}/examples/slurm_mila/run.sh
 delete mode 100644 {{cookiecutter.project_slug}}/examples/slurm_mila_orion/config.yaml
 delete mode 100644 {{cookiecutter.project_slug}}/examples/slurm_mila_orion/orion_config.yaml
 delete mode 100644 {{cookiecutter.project_slug}}/examples/slurm_mila_orion/run.sh
 rename {{cookiecutter.project_slug}}/examples/{slurm_cc_orion => slurm_orion}/config.yaml (100%)
 rename {{cookiecutter.project_slug}}/examples/{slurm_cc_orion => slurm_orion}/orion_config.yaml (100%)
 rename {{cookiecutter.project_slug}}/examples/{slurm_cc_orion => slurm_orion}/run.sh (100%)
 rename {{cookiecutter.project_slug}}/examples/{slurm_mila_orion => slurm_orion}/to_submit.sh (70%)

diff --git a/README.md b/README.md
index 2674d52..b3b6e0f 100644
--- a/README.md
+++ b/README.md
@@ -13,7 +13,7 @@ A cookiecutter is a generic project template that will instantiate a new project
 * Flake8
 * Pytest
 
-More information on what a cookiecutter is [here.](https://cookiecutter.readthedocs.io/en/)
+More information on what a cookiecutter is [here.](https://cookiecutter.readthedocs.io)
 
 Quickstart
 ----------
diff --git a/cookiecutter.json b/cookiecutter.json
index 3e6909c..67f79ec 100644
--- a/cookiecutter.json
+++ b/cookiecutter.json
@@ -7,6 +7,7 @@
     "project_short_description": "{{ cookiecutter.project_name }} is wonderful!",
     "python_version": "3.8",
     "dl_framework": ["pytorch", "tensorflow_cpu", "tensorflow_gpu"],
+    "environment": ["generic", "mila"],
     "pypi_username": "{{ cookiecutter.github_username }}",
     "version": "0.0.1",
     "open_source_license": ["MIT license", "BSD license", "ISC license", "Apache Software License 2.0", "GNU General Public License v3", "Not open source"]
diff --git a/{{cookiecutter.project_slug}}/.github/workflows/tests.yml b/{{cookiecutter.project_slug}}/.github/workflows/tests.yml
index 7f65188..6800e41 100644
--- a/{{cookiecutter.project_slug}}/.github/workflows/tests.yml
+++ b/{{cookiecutter.project_slug}}/.github/workflows/tests.yml
@@ -4,11 +4,11 @@ on:
   # but only for the main/develop branch
   push:
     branches:
-      - master
+      - main
       - develop
   pull_request:
     branches:
-      - master
+      - main
       - develop
 jobs:
   build:
diff --git a/{{cookiecutter.project_slug}}/README.md b/{{cookiecutter.project_slug}}/README.md
index 4180275..9a594a6 100644
--- a/{{cookiecutter.project_slug}}/README.md
+++ b/{{cookiecutter.project_slug}}/README.md
@@ -1,5 +1,3 @@
-[![Build Status](https://travis-ci.com/{{ cookiecutter.github_username }}/{{ cookiecutter.project_slug }}.png?branch=master)](https://travis-ci.com/{{ cookiecutter.github_username }}/{{ cookiecutter.project_slug }})
-
 {% set is_open_source = cookiecutter.open_source_license != 'Not open source' -%}
 
 # {{ cookiecutter.project_name }}
@@ -46,9 +44,12 @@ These hooks will:
 Go on github and follow the instructions to create a new project.
 When done, do not add any file, and follow the instructions to
 link your local git to the remote project, which should look like this:
+(PS: these instructions are reproduced here for your convenience.
+We suggest also checking the GitHub project page for more up-to-date info.)
 
     git remote add origin git@github.com:{{ cookiecutter.github_username }}/{{ cookiecutter.project_slug }}.git
-    git push -u origin master
+    git branch -M main
+    git push -u origin main
 
 ### Setup Continuous Integration
 
@@ -66,7 +67,7 @@ Check the following instructions for more details.
 Github actions are already configured in `.github/workflows/tests.yml`.
 Github actions are already enabled by default when using Github, so, when
 pushing to github, they will be executed automatically for pull requests to
-`master` and to `develop`.
+`main` and to `develop`.
 
 #### Travis
 
@@ -120,12 +121,10 @@ Note you have two new folders now:
 You can run mlflow from this folder (`examples/local`) by running
 `mlflow ui`.
 
-#### Run on the Mila cluster
-(NOTE: this example also apply to Compute Canada - use the folders
-`slurm_cc` and `slurm_cc_orion` instead of `slurm_mila` and `slurm_mila_orion`.)
+#### Run on a remote cluster (with Slurm)
 
-First, bring you project on the Mila cluster (assuming you didn't create your
-project directly there). To do so, simply login on the Mila cluster and git
+First, bring your project onto the cluster (assuming you didn't create your
+project directly there). To do so, simply log in to the cluster and git
 clone your project:
 
     git clone git@github.com:{{ cookiecutter.github_username }}/{{ cookiecutter.project_slug }}.git
@@ -135,12 +134,13 @@ Then activate your virtual env, and install the dependencies:
     cd {{ cookiecutter.project_slug }}
     pip install -e .
 
-To run with SLURM, just:
+To run with Slurm, just:
 
-    cd examples/slurm_mila
+    cd examples/slurm
     sh run.sh
 
 Check the log to see that you got an almost perfect loss (i.e., 0).
+{%- if cookiecutter.environment == 'mila' %}
 
 #### Measure GPU time (and others) on the Mila cluster
 
@@ -184,11 +184,12 @@ In a separate shell on your local computer, run the following command:
 where `` is your user name on the Mila cluster and `` is the name of the machine your job is currenty running on (`leto35` in our example). You can then navigate your local browser to `http://localhost:19999/` to view the ressources being used on the cluster and monitor your job. You should see something like this:
 
 ![image](https://user-images.githubusercontent.com/18450628/88088807-fe2acd80-cb58-11ea-8ab2-bd090e8a826c.png)
+{%- endif %}
 
-#### Run with Orion on the Mila cluster
+#### Run with Orion on a Slurm cluster
 
 This example will run orion for 2 trials (see the orion config file).
-To do so, go into `examples/slurm_mila_orion`.
+To do so, go into `examples/slurm_orion`.
 Here you can find the orion config file (`orion_config.yaml`), as well as the
 config file (`config.yaml`) for your project (that contains the hyper-parameters).
 
@@ -204,7 +205,7 @@ Inside these folders, you can find the models (the best one and the last one), t
 the hyper-parameters for this trial, and the log file.
 
 You can check orion status with the following commands:
-(to be run from `examples/slurm_mila_orion`)
+(to be run from `examples/slurm_orion`)
 
     export ORION_DB_ADDRESS='orion_db.pkl'
     export ORION_DB_TYPE='pickleddb'
diff --git a/{{cookiecutter.project_slug}}/examples/slurm_cc/config.yaml b/{{cookiecutter.project_slug}}/examples/slurm/config.yaml
similarity index 100%
rename from {{cookiecutter.project_slug}}/examples/slurm_cc/config.yaml
rename to {{cookiecutter.project_slug}}/examples/slurm/config.yaml
diff --git a/{{cookiecutter.project_slug}}/examples/slurm_cc/run.sh b/{{cookiecutter.project_slug}}/examples/slurm/run.sh
similarity index 100%
rename from {{cookiecutter.project_slug}}/examples/slurm_cc/run.sh
rename to {{cookiecutter.project_slug}}/examples/slurm/run.sh
diff --git a/{{cookiecutter.project_slug}}/examples/slurm_mila/to_submit.sh b/{{cookiecutter.project_slug}}/examples/slurm/to_submit.sh
similarity index 55%
rename from {{cookiecutter.project_slug}}/examples/slurm_mila/to_submit.sh
rename to {{cookiecutter.project_slug}}/examples/slurm/to_submit.sh
index fed5b77..c87cc12 100644
--- a/{{cookiecutter.project_slug}}/examples/slurm_mila/to_submit.sh
+++ b/{{cookiecutter.project_slug}}/examples/slurm/to_submit.sh
@@ -1,5 +1,16 @@
 #!/bin/bash
-#SBATCH --partition=long
+{%- if cookiecutter.environment == 'mila' %}
+## this is for Compute Canada (uncomment it if you need it):
+##SBATCH --account=rrg-bengioy-ad
+## this instead is for the Mila cluster (uncomment it if you need it):
+##SBATCH --partition=long
+# to attach a tag to your run (e.g., used to track the GPU time)
+# uncomment the following line and replace `my_tag` with the proper tag:
+##SBATCH --wckey=my_tag
+{%- endif %}
+{%- if cookiecutter.environment == 'generic' %}
+## set --account=... or --partition=... as needed.
+{%- endif %}
 #SBATCH --cpus-per-task=2
 #SBATCH --gres=gpu:1
 #SBATCH --mem=5G
diff --git a/{{cookiecutter.project_slug}}/examples/slurm_cc/to_submit.sh b/{{cookiecutter.project_slug}}/examples/slurm_cc/to_submit.sh
deleted file mode 100644
index 5e33447..0000000
--- a/{{cookiecutter.project_slug}}/examples/slurm_cc/to_submit.sh
+++ /dev/null
@@ -1,16 +0,0 @@
-#!/bin/bash
-#SBATCH --account=rrg-bengioy-ad
-#SBATCH --cpus-per-task=2
-#SBATCH --gres=gpu:1
-#SBATCH --mem=5G
-#SBATCH --time=0:05:00
-#SBATCH --job-name={{ cookiecutter.project_slug }}
-#SBATCH --output=logs/%x__%j.out
-#SBATCH --error=logs/%x__%j.err
-# remove one # if you prefer receiving emails
-##SBATCH --mail-type=all
-##SBATCH --mail-user={{ cookiecutter.email }}
-
-export MLFLOW_TRACKING_URI='mlruns'
-
-main --data ../data --output output --config config.yaml --tmp-folder ${SLURM_TMPDIR} --disable-progressbar
diff --git a/{{cookiecutter.project_slug}}/examples/slurm_cc_orion/to_submit.sh b/{{cookiecutter.project_slug}}/examples/slurm_cc_orion/to_submit.sh
deleted file mode 100644
index 209fecb..0000000
--- a/{{cookiecutter.project_slug}}/examples/slurm_cc_orion/to_submit.sh
+++ /dev/null
@@ -1,23 +0,0 @@
-#!/bin/bash
-# __TODO__ fix options if needed
-#SBATCH --job-name={{ cookiecutter.project_slug }}
-#SBATCH --account=rrg-bengioy-ad
-#SBATCH --cpus-per-task=2
-#SBATCH --gres=gpu:1
-#SBATCH --mem=5G
-#SBATCH --time=0:05:00
-#SBATCH --output=logs/%x__%j.out
-#SBATCH --error=logs/%x__%j.err
-# remove one # if you prefer receiving emails
-##SBATCH --mail-type=all
-##SBATCH --mail-user={{ cookiecutter.email }}
-
-export MLFLOW_TRACKING_URI='mlruns'
-export ORION_DB_ADDRESS='orion_db.pkl'
-export ORION_DB_TYPE='pickleddb'
-
-orion -v hunt --config orion_config.yaml \
-    main --data ../data --config config.yaml --disable-progressbar \
-    --output '{exp.working_dir}/{exp.name}_{trial.id}/' \
-    --log '{exp.working_dir}/{exp.name}_{trial.id}/exp.log' \
-    --tmp-folder ${SLURM_TMPDIR}
diff --git a/{{cookiecutter.project_slug}}/examples/slurm_mila/config.yaml b/{{cookiecutter.project_slug}}/examples/slurm_mila/config.yaml
deleted file mode 100644
index 2e58acc..0000000
--- a/{{cookiecutter.project_slug}}/examples/slurm_mila/config.yaml
+++ /dev/null
@@ -1,14 +0,0 @@
-# general
-batch_size: 32
-optimizer: adam
-loss: L1
-patience: 5
-architecture: my_model
-max_epoch: 99
-exp_name: my_exp_1
-# set to null to avoid setting a seed (can speed up GPU computation, but
-# results will not be reproducible)
-seed: 1234
-
-# architecture
-size: 10
diff --git a/{{cookiecutter.project_slug}}/examples/slurm_mila/run.sh b/{{cookiecutter.project_slug}}/examples/slurm_mila/run.sh
deleted file mode 100644
index 9370362..0000000
--- a/{{cookiecutter.project_slug}}/examples/slurm_mila/run.sh
+++ /dev/null
@@ -1,2 +0,0 @@
-mkdir -p logs
-sbatch to_submit.sh
diff --git a/{{cookiecutter.project_slug}}/examples/slurm_mila_orion/config.yaml b/{{cookiecutter.project_slug}}/examples/slurm_mila_orion/config.yaml
deleted file mode 100644
index 5c0028c..0000000
--- a/{{cookiecutter.project_slug}}/examples/slurm_mila_orion/config.yaml
+++ /dev/null
@@ -1,14 +0,0 @@
-# general
-batch_size: 32
-optimizer: adam
-loss: L1
-patience: 5
-architecture: my_model
-max_epoch: 99
-exp_name: my_exp_1
-# set to null to avoid setting a seed (can speed up GPU computation, but
-# results will not be reproducible)
-seed: 1234
-
-# architecture
-size: 'orion~uniform(1,100,discrete=True)'
diff --git a/{{cookiecutter.project_slug}}/examples/slurm_mila_orion/orion_config.yaml b/{{cookiecutter.project_slug}}/examples/slurm_mila_orion/orion_config.yaml
deleted file mode 100644
index f6bd2e1..0000000
--- a/{{cookiecutter.project_slug}}/examples/slurm_mila_orion/orion_config.yaml
+++ /dev/null
@@ -1,16 +0,0 @@
-experiment:
-  name:
-    my_exp
-  max_trials: 2
-  working_dir:
-    orion_working_dir
-  algorithms:
-    random:
-      seed: 1234
-evc:
-  non_monitored_arguments:
-    - output
-    - data
-    - tmp-folder
-  ignore_code_changes:
-    true
diff --git a/{{cookiecutter.project_slug}}/examples/slurm_mila_orion/run.sh b/{{cookiecutter.project_slug}}/examples/slurm_mila_orion/run.sh
deleted file mode 100644
index 9370362..0000000
--- a/{{cookiecutter.project_slug}}/examples/slurm_mila_orion/run.sh
+++ /dev/null
@@ -1,2 +0,0 @@
-mkdir -p logs
-sbatch to_submit.sh
diff --git a/{{cookiecutter.project_slug}}/examples/slurm_cc_orion/config.yaml b/{{cookiecutter.project_slug}}/examples/slurm_orion/config.yaml
similarity index 100%
rename from {{cookiecutter.project_slug}}/examples/slurm_cc_orion/config.yaml
rename to {{cookiecutter.project_slug}}/examples/slurm_orion/config.yaml
diff --git a/{{cookiecutter.project_slug}}/examples/slurm_cc_orion/orion_config.yaml b/{{cookiecutter.project_slug}}/examples/slurm_orion/orion_config.yaml
similarity index 100%
rename from {{cookiecutter.project_slug}}/examples/slurm_cc_orion/orion_config.yaml
rename to {{cookiecutter.project_slug}}/examples/slurm_orion/orion_config.yaml
diff --git a/{{cookiecutter.project_slug}}/examples/slurm_cc_orion/run.sh b/{{cookiecutter.project_slug}}/examples/slurm_orion/run.sh
similarity index 100%
rename from {{cookiecutter.project_slug}}/examples/slurm_cc_orion/run.sh
rename to {{cookiecutter.project_slug}}/examples/slurm_orion/run.sh
diff --git a/{{cookiecutter.project_slug}}/examples/slurm_mila_orion/to_submit.sh b/{{cookiecutter.project_slug}}/examples/slurm_orion/to_submit.sh
similarity index 70%
rename from {{cookiecutter.project_slug}}/examples/slurm_mila_orion/to_submit.sh
rename to {{cookiecutter.project_slug}}/examples/slurm_orion/to_submit.sh
index a6e669e..1143f83 100644
--- a/{{cookiecutter.project_slug}}/examples/slurm_mila_orion/to_submit.sh
+++ b/{{cookiecutter.project_slug}}/examples/slurm_orion/to_submit.sh
@@ -1,16 +1,23 @@
 #!/bin/bash
-# __TODO__ fix options if needed
 #SBATCH --job-name={{ cookiecutter.project_slug }}
-#SBATCH --partition=long
+{%- if cookiecutter.environment == 'mila' %}
+## this is for Compute Canada (uncomment it if you need it):
+##SBATCH --account=rrg-bengioy-ad
+## this instead is for the Mila cluster (uncomment it if you need it):
+##SBATCH --partition=long
+# to attach a tag to your run (e.g., used to track the GPU time)
+# uncomment the following line and replace `my_tag` with the proper tag:
+##SBATCH --wckey=my_tag
+{%- endif %}
+{%- if cookiecutter.environment == 'generic' %}
+## set --account=... or --partition=... as needed.
+{%- endif %}
 #SBATCH --cpus-per-task=2
 #SBATCH --gres=gpu:1
 #SBATCH --mem=5G
 #SBATCH --time=0:05:00
 #SBATCH --output=logs/%x__%j.out
 #SBATCH --error=logs/%x__%j.err
-# to attach a tag to your run (e.g., used to track the GPU time)
-# uncomment the following line and add replace `my_tag` with the proper tag:
-##SBATCH --wckey=my_tag
 # remove one # if you prefer receiving emails
 ##SBATCH --mail-type=all
 ##SBATCH --mail-user={{ cookiecutter.email }}
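
A note on the commented-out directives in these job scripts: `sbatch` only reads options from lines that begin exactly with `#SBATCH`, and it stops looking once it reaches the first line that is neither a comment nor blank, so the `##SBATCH` lines stay inert until one `#` is removed. The bash sketch below mimics that scan to list the options a job script would actually request. The embedded script is an abbreviated, hypothetical example in the style of `examples/slurm/to_submit.sh` (not the template's real output), and `active_directives` is an illustrative helper, not part of Slurm or of this template.

```bash
#!/bin/bash
# Sketch: list the #SBATCH directives that sbatch would honour in a job script.
# Assumption: the heredoc is an abbreviated, hypothetical script written in the
# style of examples/slurm/to_submit.sh; it is not the template's actual output.
script=$(cat <<'EOF'
#!/bin/bash
##SBATCH --partition=long
##SBATCH --wckey=my_tag
#SBATCH --cpus-per-task=2
#SBATCH --gres=gpu:1
#SBATCH --mem=5G
#SBATCH --time=0:05:00
##SBATCH --mail-type=all

export MLFLOW_TRACKING_URI='mlruns'
#SBATCH --job-name=not_read_after_first_command
EOF
)

# Hypothetical helper (not part of Slurm or of the template): mimics how sbatch
# scans a script. Only lines starting exactly with "#SBATCH" count, and the
# scan stops at the first line that is neither a comment nor blank.
active_directives() {
  local line prefix='#SBATCH '
  while IFS= read -r line; do
    # Skip empty and whitespace-only lines, as sbatch does.
    [[ -z "${line//[[:space:]]/}" ]] && continue
    case "$line" in
      '#SBATCH'*) printf '%s\n' "${line#"$prefix"}" ;;
      '#'*) ;;     # "##SBATCH", the shebang and plain comments are ignored
      *) break ;;  # first real command: no further directives are read
    esac
  done
}

active=$(printf '%s\n' "$script" | active_directives)
printf '%s\n' "$active"
# Prints --cpus-per-task=2, --gres=gpu:1, --mem=5G and --time=0:05:00,
# one per line.
```

In a generated project, the same helper can be pointed at a real script instead, e.g. `active_directives < examples/slurm/to_submit.sh`.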