Simplify the algorithm tests, setup incremental testing (#35)
* Add parametrize_when_used mark to simplify tests

Signed-off-by: Fabrice Normandin <[email protected]>

* Add a new, simpler test suite (that works!)

Signed-off-by: Fabrice Normandin <[email protected]>

* Rename the algorithm tests class (wip)

Signed-off-by: Fabrice Normandin <[email protected]>

* Further simplify the typing in the example

Signed-off-by: Fabrice Normandin <[email protected]>

* Remove the older (uglier) test suite for algos

Signed-off-by: Fabrice Normandin <[email protected]>

* Remove the unused classification test suite

Signed-off-by: Fabrice Normandin <[email protected]>

* Add missing config

Signed-off-by: Fabrice Normandin <[email protected]>

* Add the badges in the README

Signed-off-by: Fabrice Normandin <[email protected]>

* Remove outdated Protocol

Signed-off-by: Fabrice Normandin <[email protected]>

* Set JAX_PLATFORMS=cpu when no GPU is found

Signed-off-by: Fabrice Normandin <[email protected]>

* Debugging weird xpass/xfails

Signed-off-by: Fabrice Normandin <[email protected]>

* (ugly commit) remove unused code, add doctests

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix some issues in tests

Signed-off-by: Fabrice Normandin <[email protected]>

* Add missing __init__.py

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix more issues in tests

Signed-off-by: Fabrice Normandin <[email protected]>

* Add a docstring in TestJaxExample

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix weird XPASS in tests

Signed-off-by: Fabrice Normandin <[email protected]>

* Add batch size fix from Lightning-Hydra-Template

Signed-off-by: Fabrice Normandin <[email protected]>

* [ugly] Add regression files to check if CI works

Signed-off-by: Fabrice Normandin <[email protected]>

* Add nice docstrings for env_vars.py

Signed-off-by: Fabrice Normandin <[email protected]>

* Revert "[ugly] Add regression files to check if CI works"

This reverts commit ad1e630.

* Use --gen-missing flag in CI for now

Signed-off-by: Fabrice Normandin <[email protected]>

* Slightly simplify main.py objective calculation

Signed-off-by: Fabrice Normandin <[email protected]>

* Remove broken test for code blocks in docstrings

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix test for jax on CPU

Signed-off-by: Fabrice Normandin <[email protected]>

* Simplify main_test.py

Signed-off-by: Fabrice Normandin <[email protected]>

* Save regression files in subfolder based on device

Signed-off-by: Fabrice Normandin <[email protected]>

* Change README and fix link

Signed-off-by: Fabrice Normandin <[email protected]>

* Trim down docs generation script, minor doc fixes

Signed-off-by: Fabrice Normandin <[email protected]>

* Skip regression check when files are missing

Signed-off-by: Fabrice Normandin <[email protected]>

* Reduce amount of warnings generated in tests

Signed-off-by: Fabrice Normandin <[email protected]>

* Remove unused code in project.utils.utils.py

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix command-line flag used to skip checks

Signed-off-by: Fabrice Normandin <[email protected]>

* Tweak README.md

Signed-off-by: Fabrice Normandin <[email protected]>

* Simplify the actions-runner-job.sh

Signed-off-by: Fabrice Normandin <[email protected]>

* Add missing flag in build.yml

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix the `example.yaml` config in example group

Signed-off-by: Fabrice Normandin <[email protected]>

* Add todos for generating reference docs

Signed-off-by: Fabrice Normandin <[email protected]>

* Simplify example.py

Signed-off-by: Fabrice Normandin <[email protected]>

* Add a small docstring to project.configs

Signed-off-by: Fabrice Normandin <[email protected]>

* Remove overrides in top-level config, fix 'name'

Signed-off-by: Fabrice Normandin <[email protected]>

* Use hydra_zen.instantiate by default (no pydantic)

Signed-off-by: Fabrice Normandin <[email protected]>

* Add useful callbacks to defaults

Signed-off-by: Fabrice Normandin <[email protected]>

* Add/tweak config files

Signed-off-by: Fabrice Normandin <[email protected]>

* Simplify the network / layers

Signed-off-by: Fabrice Normandin <[email protected]>

* Don't dynamically create algo configs

Signed-off-by: Fabrice Normandin <[email protected]>

* Add tensorboard logger config from hydra-template

Signed-off-by: Fabrice Normandin <[email protected]>

* Rename `optimizer` arg to `optimizer_config`

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix test_defaults

Signed-off-by: Fabrice Normandin <[email protected]>

* Update tensor-regression dependency

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix missing python in actions-runner-job.sh

Signed-off-by: Fabrice Normandin <[email protected]>

---------

Signed-off-by: Fabrice Normandin <[email protected]>
lebrice authored Aug 7, 2024
1 parent e777ca5 commit 264b5a1
Showing 54 changed files with 1,280 additions and 2,225 deletions.
44 changes: 12 additions & 32 deletions .github/actions-runner-job.sh
@@ -11,9 +11,9 @@

set -euo pipefail

# todo: load modules here? or in the job steps?
# module --quiet purge
# module load cuda/12.2.2

# module load cuda/12.0


archive="actions-runner-linux-x64-2.317.0.tar.gz"
@@ -27,54 +27,34 @@ ln --symbolic --force $SCRATCH/$archive $SLURM_TMPDIR/$archive

cd $SLURM_TMPDIR

# Check the archive integrity.
echo "9e883d210df8c6028aff475475a457d380353f9d01877d51cc01a17b2a91161d $archive" | shasum -a 256 -c

# Extract the installer
tar xzf ./actions-runner-linux-x64-2.317.0.tar.gz

# NOTE: Could use this to get a token programmatically!
# https://docs.github.com/en/rest/actions/self-hosted-runners?apiVersion=2022-11-28#create-a-registration-token-for-an-organization

# cluster=${SLURM_CLUSTER_NAME:-local}
cluster=${SLURM_CLUSTER_NAME:-`hostname`}

# Use the GitHub API to get a registration token for a self-hosted runner.
# This requires you to be an admin of the repository and to have the $SH_TOKEN secret set to your
# github token.
# https://docs.github.com/en/rest/actions/self-hosted-runners?apiVersion=2022-11-28#create-a-registration-token-for-a-repository
# curl -L \
# -X POST \
# -H "Accept: application/vnd.github+json" \
# -H "Authorization: Bearer <YOUR-TOKEN>" \
# -H "X-GitHub-Api-Version: 2022-11-28" \
# https://api.github.com/repos/OWNER/REPO/actions/runners/registration-token

# Example output:
# {
# "token": "XXXXX",
# "expires_at": "2020-01-22T12:13:35.123-08:00"
# }


if ! command -v jq &> /dev/null; then
echo "the jq command doesn't seem to be installed."

if ! test -f ~/.local/bin/jq; then
echo "jq is not found at ~/.local/bin/jq, downloading it."
# TODO: this assumes that ~/.local/bin is in $PATH, I'm not 100% sure that this is standard.
mkdir -p ~/.local/bin
wget https://github.com/jqlang/jq/releases/download/jq-1.7.1/jq-linux-amd64 -O ~/.local/bin/jq
chmod +x ~/.local/bin/jq
fi
fi

source ~/.bash_aliases
module load python/3.10

TOKEN=`curl -L \
-X POST \
-H "Accept: application/vnd.github+json" \
-H "Authorization: Bearer ${SH_TOKEN:?The SH_TOKEN env variable is not set}" \
-H "Authorization: Bearer $SH_TOKEN" \
-H "X-GitHub-Api-Version: 2022-11-28" \
https://api.github.com/repos/mila-iqia/ResearchTemplate/actions/runners/registration-token | ~/.local/bin/jq -r .token`
https://api.github.com/repos/mila-iqia/ResearchTemplate/actions/runners/registration-token | \
python -c "import sys, json; print(json.load(sys.stdin)['token'])"`

# Create the runner and configure it programmatically
# Create the runner and configure it programmatically with the token we just got from the GitHub API.
cluster=$SLURM_CLUSTER_NAME
./config.sh --url https://github.com/mila-iqia/ResearchTemplate --token $TOKEN \
--unattended --replace --name $cluster --labels $cluster $SLURM_JOB_ID --ephemeral

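The updated pipeline replaces `jq` with a Python one-liner to pull the `token` field out of the API response. A minimal sketch of that parsing step, run against the example response shape quoted in the script's comments (the function name `extract_token` and the sample payload are illustrative assumptions, not part of the script):

```python
import json


def extract_token(response_text: str) -> str:
    """Return the `token` field of a registration-token API response."""
    payload = json.loads(response_text)
    # Raises KeyError if the response has no "token" key,
    # which is likely what an error response would look like.
    return payload["token"]


# Sample payload mirroring the "Example output" comment in the script.
sample = '{"token": "XXXXX", "expires_at": "2020-01-22T12:13:35.123-08:00"}'
print(extract_token(sample))  # XXXXX
```

Using the standard library here avoids downloading a `jq` binary on the cluster, at the cost of needing a Python interpreter on the `PATH` (hence the `module load python/3.10` line).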
9 changes: 4 additions & 5 deletions .github/workflows/build.yml
@@ -54,11 +54,11 @@ jobs:
- name: Test with pytest (very fast)
env:
JAX_PLATFORMS: cpu
run: pdm run pytest -v --shorter-than=1.0 --cov=project --cov-report=xml --cov-append
run: pdm run pytest -v --shorter-than=1.0 --cov=project --cov-report=xml --cov-append --skip-if-files-missing
- name: Test with pytest (fast)
env:
JAX_PLATFORMS: cpu
run: pdm run pytest -v --cov=project --cov-report=xml --cov-append
run: pdm run pytest -v --cov=project --cov-report=xml --cov-append --skip-if-files-missing

- name: Store coverage report as an artifact
uses: actions/upload-artifact@v4
@@ -84,8 +84,7 @@ jobs:
run: pdm config install.cache true && pdm install

- name: Test with pytest
run: pdm run pytest -v --cov=project --cov-report=xml --cov-append

run: pdm run pytest -v --cov=project --cov-report=xml --cov-append --skip-if-files-missing
# TODO: this is taking too long to run, and is failing consistently. Need to debug this before making it part of the CI again.
# - name: Test with pytest (only slow tests)
# run: pdm run pytest -v -m slow --slow --cov=project --cov-report=xml --cov-append
@@ -142,7 +141,7 @@ jobs:
run: pdm install

- name: Test with pytest
run: pdm run pytest -v --cov=project --cov-report=xml --cov-append
run: pdm run pytest -v --cov=project --cov-report=xml --cov-append --gen-missing

# TODO: Re-enable this later
# - name: Test with pytest (only slow tests)
63 changes: 60 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,8 +1,65 @@
# Research Project Template

![Build](https://github.com/mila-iqia/ResearchTemplate/workflows/build.yml/badge.svg)
[![Build](https://github.com/mila-iqia/ResearchTemplate/actions/workflows/build.yml/badge.svg?branch=master)](https://github.com/mila-iqia/ResearchTemplate/actions/workflows/build.yml)
[![codecov](https://codecov.io/gh/mila-iqia/ResearchTemplate/graph/badge.svg?token=I2DYLK8NTD)](https://codecov.io/gh/mila-iqia/ResearchTemplate)
[![hydra](https://img.shields.io/badge/Config-Hydra_1.3-89b8cd)](https://hydra.cc/)
[![license](https://img.shields.io/badge/License-MIT-green.svg?labelColor=gray)](https://github.com/mila-iqia/ResearchTemplate#license)

Please note: This is a **Work-in-Progress**. The goal is to make a first release by the end of summer 2024.
Please note: This is a Work-in-Progress. The goal is to make a first release by the end of summer 2024.

For now, feel free to take a look at the [documentation page](https://mila-iqia.github.io/ResearchTemplate/) if you want more information about this project.
This is a template repository for a research project in machine learning. It is meant to be a starting point for new ML researchers that run jobs on SLURM clusters.
The main target audience is [Mila](https://mila.quebec/en) researchers and students, but this should still be useful to anyone that uses PyTorch-Lightning with Hydra.

For more context, see [this introduction to the project](https://mila-iqia.github.io/ResearchTemplate/overview/intro).

## Overview

This project makes use of the following libraries:

- [Hydra](https://hydra.cc/) is used to configure the project. It allows you to define configuration files and override them from the command line.
- [PyTorch Lightning](https://lightning.ai/docs/pytorch/stable/) is used as the training framework. It provides a high-level interface to organize ML research code.
- 🔥 Please note: You can also use [Jax](https://jax.readthedocs.io/en/latest/) with this repo, as is shown in the [Jax example](https://mila-iqia.github.io/ResearchTemplate/examples/jax) 🔥
- [Weights & Biases](https://wandb.ai) is used to log metrics and visualize results.
- [pytest](https://docs.pytest.org/en/stable/) is used for testing.

## Why use this template?

Why should you use this template (instead of another)?

Here are some of the advantages to using this template compared to [some of the other templates out there](https://mila-iqia.github.io/ResearchTemplate/related):

- ❗Support for both Jax and Torch with PyTorch-Lightning ❗
- Easy development inside a [Development Container](https://code.visualstudio.com/docs/remote/containers) with [VS Code](https://code.visualstudio.com/)
- Tailor-made for ML researchers that run their jobs on SLURM clusters (with default configurations for the [Mila](https://docs.mila.quebec) and [DRAC](https://docs.alliancecan.ca) clusters.)
- Rich typing and documentation of all parts of the source code using Python 3.12's new type annotation syntax
- A comprehensive suite of automated tests for all algorithms, datasets and networks that are easy to reuse and extend
- Automatically creates Yaml Schemas for your Hydra config files (as soon as #7 is merged)

## Usage

To see all available options:

```bash
python project/main.py --help
```

For a detailed list of examples, see the [examples page](https://mila-iqia.github.io/ResearchTemplate/examples/examples).

<!-- * `mkdocs new [dir-name]` - Create a new project.
* `mkdocs serve` - Start the live-reloading docs server.
* `mkdocs build` - Build the documentation site.
* `mkdocs -h` - Print help message and exit. -->

## Project layout

```
pyproject.toml # Project metadata and dependencies
project/
main.py # main entry-point
algorithms/ # learning algorithms
datamodules/ # datasets, processing and loading
networks/ # Neural networks used by algorithms
configs/ # configuration files
docs/ # documentation
conftest.py # Test fixtures and utilities
```
5 changes: 5 additions & 0 deletions conftest.py
@@ -1,6 +1,11 @@
import os
from pathlib import Path

import pytest
import torch

if not torch.cuda.is_available():
    os.environ["JAX_PLATFORMS"] = "cpu"


def pytest_addoption(parser: pytest.Parser):
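The `JAX_PLATFORMS` fallback added at the top of `conftest.py` can be sketched without a `torch` dependency as follows. This is only an illustration: the helper name `force_jax_cpu_if_no_gpu` and its `env` parameter are assumptions introduced here so the logic can be exercised on a plain dictionary, and the actual file sets `os.environ` directly at import time, presumably so the variable is in place before JAX initializes its backends.

```python
import os
from collections.abc import MutableMapping


def force_jax_cpu_if_no_gpu(
    cuda_available: bool, env: MutableMapping[str, str] = os.environ
) -> None:
    """Set JAX_PLATFORMS=cpu in `env` when no GPU is available."""
    if not cuda_available:
        env["JAX_PLATFORMS"] = "cpu"


# Exercise the helper on plain dicts instead of the real environment.
no_gpu_env: dict[str, str] = {}
force_jax_cpu_if_no_gpu(cuda_available=False, env=no_gpu_env)
print(no_gpu_env)  # {'JAX_PLATFORMS': 'cpu'}

gpu_env: dict[str, str] = {}
force_jax_cpu_if_no_gpu(cuda_available=True, env=gpu_env)
print(gpu_env)  # {}
```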
35 changes: 35 additions & 0 deletions docs/docs_test.py
@@ -0,0 +1,35 @@
import pathlib

import pytest
from mktestdocs import check_md_file

# This retrieves all methods/properties that have a docstring.
# todo: Brittle. We'd like something like griffe, that gets all functions / classes / etc in our module.
# members = get_codeblock_members(*[v for k, v in vars(project).items() if k != "__all__"])


def get_pretty_id(obj):
    if hasattr(obj, "__qualname__"):
        return obj.__qualname__
    if hasattr(obj, "__name__"):
        return obj.__name__
    return str(obj)


# todo: do we want to run the tests here? or do we just test the doc pages?
# @pytest.mark.parametrize(
# "obj",
# list(itertools.chain(map(getmembers, [project, project.configs, project.algorithms]))),
# ids=get_pretty_id,
# )
# def test_member(obj):
# check_docstring(obj)


docs_folder = pathlib.Path(__file__).parent


# Note the use of `str`, makes for pretty output
@pytest.mark.parametrize("fpath", docs_folder.rglob("*.md"), ids=str)
def test_documentation_file(fpath):
    check_md_file(fpath=fpath)
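In this file, `get_pretty_id` is only referenced by the commented-out parametrization, while the active test uses `ids=str`. A small sketch of what the helper returns for a few kinds of objects (the sample inputs are chosen here purely for illustration):

```python
import pathlib


def get_pretty_id(obj):
    # Same fallback chain as in docs_test.py:
    # qualified name, then plain name, then the string form.
    if hasattr(obj, "__qualname__"):
        return obj.__qualname__
    if hasattr(obj, "__name__"):
        return obj.__name__
    return str(obj)


print(get_pretty_id(len))  # len
print(get_pretty_id(str.upper))  # str.upper
# Path instances have neither attribute, so the string form is used.
print(get_pretty_id(pathlib.PurePosixPath("docs/index.md")))  # docs/index.md
```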
2 changes: 1 addition & 1 deletion docs/examples/examples.md
@@ -12,7 +12,7 @@ TODOs:
## Simple run

```bash
python project/main.py algorithm=example_algo datamodule=mnist network=fcnet
python project/main.py algorithm=example datamodule=mnist network=fcnet
```

## Running a Hyper-Parameter sweep on a SLURM cluster
76 changes: 37 additions & 39 deletions docs/generate_reference_docs.py
@@ -3,80 +3,78 @@


import textwrap
from logging import getLogger as get_logger
from pathlib import Path

import mkdocs_gen_files
import mkdocs_gen_files.nav

from project.utils.env_vars import REPO_ROOTDIR

module = "project"
modules = [
"project/main.py",
"project/experiment.py",
]
submodules = [
"project.algorithms",
"project.configs",
"project.datamodules",
"project.networks",
"project.utils",
]
logger = get_logger(__name__)


def _get_import_path(module_path: Path) -> str:
"""Returns the path to use to import a given (internal) module."""
return ".".join(module_path.relative_to(REPO_ROOTDIR).with_suffix("").parts)
def main():
add_doc_for_module(REPO_ROOTDIR / "project")


def main():
nav = mkdocs_gen_files.nav.Nav()
def add_doc_for_module(module_path: Path) -> None:
"""Creates a markdown file in the "reference" section for this module and its submodules
recursively.
add_doc_for_module(REPO_ROOTDIR / "project", nav)
## TODOs:
- [ ] We don't currently see the docs from the docstrings of __init__.py files.
- [ ] Might be nice to show the config files also?
"""

# with mkdocs_gen_files.open("reference/SUMMARY.md", "w") as nav_file:
# # assert False, "\n".join(nav.build_literate_nav())
# nav_file.writelines(nav.build_literate_nav())
assert module_path.is_dir() # and (module_path / "__init__.py").exists(), module_path

# module_import_path = _get_import_path(module_path)
# doc_file = module_path.relative_to(REPO_ROOTDIR).with_suffix(".md")
# write_doc_file = "reference" / doc_file
# with mkdocs_gen_files.editor.FilesEditor.current().open(str(write_doc_file), "w") as f:
# print(
# textwrap.dedent(f"""\
# ::: {module_import_path}

def add_doc_for_module(module_path: Path, nav: mkdocs_gen_files.nav.Nav) -> None:
"""TODO."""
# """),
# file=f,
# )

assert module_path.is_dir() and (module_path / "__init__.py").exists(), module_path
def is_module(p: Path) -> bool:
return (
p.suffix == ".py" and not p.name.startswith("__") and not p.name.endswith("_test.py")
)

children = list(
p
for p in module_path.glob("*.py")
if not p.name.startswith("__") and not p.name.endswith("_test.py")
)
children = list(p for p in module_path.glob("*.py") if is_module(p))
for child_module_path in children:
child_module_import_path = _get_import_path(child_module_path)
doc_file = child_module_path.relative_to(REPO_ROOTDIR).with_suffix(".md")
write_doc_file = f"reference/{doc_file}"
write_doc_file = "reference" / doc_file

nav[tuple(child_module_import_path.split("."))] = f"{doc_file}"

with mkdocs_gen_files.open(write_doc_file, "w") as f:
with mkdocs_gen_files.editor.FilesEditor.current().open(str(write_doc_file), "w") as f:
print(
textwrap.dedent(f"""\
::: {child_module_import_path}
"""),
file=f,
)
docs_dir = REPO_ROOTDIR / "docs"
module_path_relative_to_docs_dir = child_module_path.relative_to(docs_dir, walk_up=True)
mkdocs_gen_files.set_edit_path(write_doc_file, str(module_path_relative_to_docs_dir))

submodules = list(
p
for p in module_path.iterdir()
if p.is_dir()
and (p / "__init__.py").exists()
and ((p / "__init__.py").exists() or len(list(p.glob("*.py"))) > 0)
and not p.name.endswith("_test")
and not p.name.startswith((".", "__"))
)
for submodule in submodules:
add_doc_for_module(submodule, nav)
logger.info(f"Creating doc for {submodule}")
add_doc_for_module(submodule)


def _get_import_path(module_path: Path) -> str:
"""Returns the path to use to import a given (internal) module."""
return ".".join(module_path.relative_to(REPO_ROOTDIR).with_suffix("").parts)


if __name__ in ["__main__", "<run_path>"]:
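The two path helpers in this script, `is_module` and `_get_import_path`, can be tried in isolation. The sketch below copies their logic but uses a hypothetical `REPO_ROOTDIR` of `/repo` and pure paths, so nothing needs to exist on disk. In the real script `REPO_ROOTDIR` comes from `project.utils.env_vars`.

```python
from pathlib import PurePosixPath

# Hypothetical repository root, for illustration only.
REPO_ROOTDIR = PurePosixPath("/repo")


def is_module(p: PurePosixPath) -> bool:
    """True for regular modules; skips dunder files and `*_test.py` files."""
    return (
        p.suffix == ".py" and not p.name.startswith("__") and not p.name.endswith("_test.py")
    )


def get_import_path(module_path: PurePosixPath) -> str:
    """Dotted import path of a module, relative to the repository root."""
    return ".".join(module_path.relative_to(REPO_ROOTDIR).with_suffix("").parts)


print(is_module(REPO_ROOTDIR / "project" / "main.py"))  # True
print(is_module(REPO_ROOTDIR / "project" / "main_test.py"))  # False
print(is_module(REPO_ROOTDIR / "project" / "__init__.py"))  # False
print(get_import_path(REPO_ROOTDIR / "project" / "utils" / "env_vars.py"))
# project.utils.env_vars
```

The dotted path is what ends up after the `:::` marker in each generated reference page.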