Commit
* feat: add option to specify a temporary folder for the experiment. (#5)
* added option to rsync input and output data
* added docstring
* logging to stdout now
* fixed script for clusters - now using slurm tmpdir to write temp results
* fixing travis
* added missing docstring
* fixed tensorflow part (method signature change)
* renamed variables
* Seed for reproducibility (#6)
* added option to rsync input and output data
* added docstring
* logging to stdout now
* fixed script for clusters - now using slurm tmpdir to write temp results
* fixing travis
* added missing docstring
* fixed tensorflow part (method signature change)
* added seed for pytorch
* fixed typo
* added comment on how to use seed
* fixed flake8
* added test on reproducibility
* removed pytorch part from tensorflow
* fixed cookiecutter syntax
* added check for tensorflow
* fixed typo in test file
* added command to set the seed in tensorflow
* fixed flake8 error
* fixed typos
* removed duplicate log
* typo in docstring
* better error message in test
* added test to check repro using Orion (#8)
* added test to check repro using Orion
* more log into travis
* more info to debug travis
* running two trials for orion
* added seed to orion
* added orion test to tensorflow part
* better log messages in travis
* Add support for keras and Pytorch Lighning (#12)
* added code for keras - still need to complete all tests
* fixed flake8
* started adding PyTorch Lightning support - note that mlflow and loading/saving model still does not work
* fixed api change
* fixed pytorch early stopping
* fixed flake8
* fixed flake8 for pytorch version
* fixed keras part for flake8
* added code to resume a model - for pytorch lightning
* removed forgotten diff
* fixed start_from_scratch (not loading a model even if present) / now printing the val loss in the logs
* pytorch lightning now correctly logging under the same run
* now pytorch is correctly resuming training and continues to plot in the same mlflow run
* added github actions
* using a different ubuntu image
* printing folder - trying to fix github actions
* telling git who I am..
* removed not useful test
* fixed typo in test folder
* removed travis configuration - using github actions from now on
* correctly handling the saved models in pytorch
* now passing the full hyper-parameter object to train_impl method (for more flexibility).
* added option to ask for gpus in pytorch
* improved error message
* Fixups for the lightning_and_keras PR (#12) (#22)
* Update torch model to pl-lightning model
* Refactor train+model impl w/ optim module
* Refactor data loader w/ data module for plightning
* removing codecov from cookeicutter. (#24)
* moving to github actions (#25)
* removing coverage computation
* moving from travis to gitbug actions.
* setting fake name/email for git.
* removed (not-correct) duplicate for github actions config file.
* fixing tests.
* refactored pytorch models. (#26)

Co-authored-by: Mirko Bronzi <[email protected]>

* running CI also on develop.

Co-authored-by: Pierre-Luc St-Charles <[email protected]>

* Adding more CI backends. (#27)
* added github actions.
* moved python version to 3.9 - by default.
* added support for azure continuous integration.
* updated mlflox/orion dependencies.
* now correctly restoring models for pytorch. (#28)
* Now running test-coverage locally. (#30)
* running test coverage locally.
* fixed project name.
* correctly allowing mlflow to work in any folder. (#29)
* removed duplicate CI.
* Update cookiecutter doc url (#37)
* made the template generic by default - will add mila-specific aspects only if enabled at template instantiation time (#38)
* default branch is now main (#39)
* made the template generic by default - will add mila-specific aspects only if enabled at template instantiation time
* now using main as the default branch for github
* Fixed typo

Co-authored-by: Pierre-Luc St-Charles <[email protected]>
Co-authored-by: Mathieu Germain <[email protected]>
1 parent f139850, commit f5c9701
Showing 18 changed files with 43 additions and 110 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,3 @@ | ||
[![Build Status](https://travis-ci.com/{{ cookiecutter.github_username }}/{{ cookiecutter.project_slug }}.png?branch=master)](https://travis-ci.com/{{ cookiecutter.github_username }}/{{ cookiecutter.project_slug }}) | ||
|
||
{% set is_open_source = cookiecutter.open_source_license != 'Not open source' -%} | ||
|
||
# {{ cookiecutter.project_name }} | ||
|
@@ -46,9 +44,12 @@ These hooks will: | |
Go on github and follow the instructions to create a new project. | ||
When done, do not add any file, and follow the instructions to | ||
link your local git to the remote project, which should look like this: | ||
(PS: these instructions are reported here for your convenience. | ||
We suggest to also look at the GitHub project page for more up-to-date info) | ||
|
||
git remote add origin [email protected]:{{ cookiecutter.github_username }}/{{ cookiecutter.project_slug }}.git | ||
git push -u origin master | ||
git branch -M main | ||
git push -u origin main | ||
|
||
### Setup Continuous Integration | ||
|
||
|
@@ -66,7 +67,7 @@ Check the following instructions for more details. | |
Github actions are already configured in `.github/workflows/tests.yml`. | ||
Github actions are already enabled by default when using Github, so, when | ||
pushing to github, they will be executed automatically for pull requests to | ||
`master` and to `develop`. | ||
`main` and to `develop`. | ||
|
||
#### Travis | ||
|
||
|
@@ -120,12 +121,10 @@ Note you have two new folders now: | |
You can run mlflow from this folder (`examples/local`) by running | ||
`mlflow ui`. | ||
|
||
#### Run on the Mila cluster | ||
(NOTE: this example also apply to Compute Canada - use the folders | ||
`slurm_cc` and `slurm_cc_orion` instead of `slurm_mila` and `slurm_mila_orion`.) | ||
#### Run on a remote cluster (with Slurm) | ||
|
||
First, bring you project on the Mila cluster (assuming you didn't create your | ||
project directly there). To do so, simply login on the Mila cluster and git | ||
First, bring you project on the cluster (assuming you didn't create your | ||
project directly there). To do so, simply login on the cluster and git | ||
clone your project: | ||
|
||
git clone [email protected]:{{ cookiecutter.github_username }}/{{ cookiecutter.project_slug }}.git | ||
|
@@ -135,12 +134,13 @@ Then activate your virtual env, and install the dependencies: | |
cd {{ cookiecutter.project_slug }} | ||
pip install -e . | ||
|
||
To run with SLURM, just: | ||
To run with Slurm, just: | ||
|
||
cd examples/slurm_mila | ||
cd examples/slurm | ||
sh run.sh | ||
|
||
Check the log to see that you got an almost perfect loss (i.e., 0). | ||
{%- if cookiecutter.environment == 'mila' %} | ||
|
||
#### Measure GPU time (and others) on the Mila cluster | ||
|
||
|
@@ -184,11 +184,12 @@ In a separate shell on your local computer, run the following command: | |
where `<username>` is your user name on the Mila cluster and `<hostname>` is the name of the machine your job is currenty running on (`leto35` in our example). You can then navigate your local browser to `http://localhost:19999/` to view the ressources being used on the cluster and monitor your job. You should see something like this: | ||
|
||
![image](https://user-images.githubusercontent.com/18450628/88088807-fe2acd80-cb58-11ea-8ab2-bd090e8a826c.png) | ||
{%- endif %} | ||
|
||
#### Run with Orion on the Mila cluster | ||
#### Run with Orion on the Slurm cluster | ||
|
||
This example will run orion for 2 trials (see the orion config file). | ||
To do so, go into `examples/slurm_mila_orion`. | ||
To do so, go into `examples/slurm_orion`. | ||
Here you can find the orion config file (`orion_config.yaml`), as well as the config | ||
file (`config.yaml`) for your project (that contains the hyper-parameters). | ||
|
||
|
@@ -204,7 +205,7 @@ Inside these folders, you can find the models (the best one and the last one), t | |
the hyper-parameters for this trial, and the log file. | ||
|
||
You can check orion status with the following commands: | ||
(to be run from `examples/slurm_mila_orion`) | ||
(to be run from `examples/slurm_orion`) | ||
|
||
export ORION_DB_ADDRESS='orion_db.pkl' | ||
export ORION_DB_TYPE='pickleddb' | ||
|
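The remote-setup hunk above replaces `git push -u origin master` with a rename to `main` followed by a push. As a minimal sketch of what those commands do, the script below runs the same sequence against a throwaway local bare repository instead of GitHub. The scratch directory, the placeholder identity, and the empty initial commit are assumptions made for illustration and are not part of the template.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch locations standing in for the GitHub remote and the local project.
workdir=$(mktemp -d)
git init --quiet --bare "$workdir/remote.git"
git init --quiet "$workdir/project"
cd "$workdir/project"

# Throwaway identity so the commit succeeds on a machine without a global git config.
git config user.name "Example User"
git config user.email "user@example.com"
git commit --quiet --allow-empty -m "initial commit"

# The three commands from the updated README, with a local path instead of the GitHub URL.
git remote add origin "$workdir/remote.git"
git branch -M main
git push --quiet -u origin main

# Report the branch that is now checked out and tracking the remote.
current_branch=$(git rev-parse --abbrev-ref HEAD)
echo "current branch: $current_branch"
```

`git branch -M main` renames the current branch whatever it was called before, so the sequence behaves the same whether the local default was `master` or already `main`.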
File renamed without changes.
File renamed without changes.
13 changes: 12 additions & 1 deletion
...t_slug}}/examples/slurm_mila/to_submit.sh → ...roject_slug}}/examples/slurm/to_submit.sh
16 changes: 0 additions & 16 deletions
{{cookiecutter.project_slug}}/examples/slurm_cc/to_submit.sh
This file was deleted.
23 changes: 0 additions & 23 deletions
{{cookiecutter.project_slug}}/examples/slurm_cc_orion/to_submit.sh
This file was deleted.
14 changes: 0 additions & 14 deletions
{{cookiecutter.project_slug}}/examples/slurm_mila/config.yaml
This file was deleted.
This file was deleted.
14 changes: 0 additions & 14 deletions
{{cookiecutter.project_slug}}/examples/slurm_mila_orion/config.yaml
This file was deleted.
16 changes: 0 additions & 16 deletions
{{cookiecutter.project_slug}}/examples/slurm_mila_orion/orion_config.yaml
This file was deleted.
2 changes: 0 additions & 2 deletions
{{cookiecutter.project_slug}}/examples/slurm_mila_orion/run.sh
This file was deleted.
File renamed without changes.
File renamed without changes.
File renamed without changes.
17 changes: 12 additions & 5 deletions
...}}/examples/slurm_mila_orion/to_submit.sh → ..._slug}}/examples/slurm_orion/to_submit.sh