Notebooks cleanup #82

Merged · 33 commits · Oct 3, 2023

Commits
642475d
move Jupyter notebooks
raehik Mar 20, 2023
4fecaa5
delete empty Jupyter notebook
raehik Mar 20, 2023
24783ba
gitignore: global ignore Jupyter cache dirs
raehik Mar 20, 2023
ebffeb1
delete temporary Jupyter notebook
raehik Mar 20, 2023
c673987
rename Jupyter notebooks
raehik Mar 20, 2023
3757a0d
add short note for Jupyter notebooks
raehik Mar 20, 2023
52111b9
fix merge conflict
MarionBWeinzierl Jul 27, 2023
f83a5c8
set path to src in jupyter notebooks
MarionBWeinzierl Jul 27, 2023
9e3dd7e
various fixes to work towards getting the notebooks running #59
MarionBWeinzierl Aug 15, 2023
b9953d8
clean up outputs of notebook
MarionBWeinzierl Aug 15, 2023
c4ae572
Merge remote-tracking branch 'origin/main' into notebooks-cleanup
MarionBWeinzierl Aug 17, 2023
04ad24b
#59 fixed notebook (and related files) to create figure 1b
MarionBWeinzierl Aug 18, 2023
948a071
add some short comments to the notebook
MarionBWeinzierl Aug 21, 2023
8f2e919
Add mlrun within jupyter notebook directory to gitignore
MarionBWeinzierl Aug 23, 2023
dd87510
removed absolute path and make sure you don't need to be in notebook …
MarionBWeinzierl Aug 23, 2023
d9d9821
removed unnecessary import
MarionBWeinzierl Aug 23, 2023
c7321aa
fixed notebook for figure 6
MarionBWeinzierl Aug 24, 2023
d461375
cleanup
MarionBWeinzierl Aug 24, 2023
a0d192c
Update README.md - add comments about data for notebooks etc
MarionBWeinzierl Sep 1, 2023
69cf61d
Update README.md - add paper links
MarionBWeinzierl Sep 1, 2023
ae72c01
delete no longer needed notebooks as identified in #82
MarionBWeinzierl Sep 14, 2023
d693fa1
Merge branch 'notebooks-cleanup' of github.com:m2lines/gz21_ocean_mom…
MarionBWeinzierl Sep 14, 2023
27f55c3
resolved conflicts (by taking new version)
MarionBWeinzierl Sep 21, 2023
3beae32
fixes in README and get test_global_control script to run
MarionBWeinzierl Sep 22, 2023
eacd355
Merge remote-tracking branch 'origin/main' into notebooks-cleanup
MarionBWeinzierl Sep 24, 2023
4736d51
working on test_global_control
MarionBWeinzierl Sep 25, 2023
3fd2a2c
working on test_global_control
MarionBWeinzierl Sep 25, 2023
47b98a5
some clean-up
MarionBWeinzierl Sep 26, 2023
b22cfaf
fixed some things in the docs and notebook
MarionBWeinzierl Sep 27, 2023
9045a07
clean up
MarionBWeinzierl Sep 27, 2023
9b43026
clarified how to run the test_global_control notebook
MarionBWeinzierl Sep 29, 2023
a8d2948
rm output jpgs
MarionBWeinzierl Sep 29, 2023
edbc5f8
adjustments after PR review
MarionBWeinzierl Oct 3, 2023
2 changes: 2 additions & 0 deletions .gitignore
@@ -8,6 +8,8 @@ __pycache__/*

# MLflow output
/mlruns/*
/examples/jupyter-notebooks/mlruns/*

# Jupyter notebook cache files
.ipynb_checkpoints/
/.pytest_cache/
25 changes: 13 additions & 12 deletions README.md
@@ -55,15 +55,15 @@ With `pip` installed, run the following in the root directory:
[Poetry](https://python-poetry.org/). To use, rename `pyproject-poetry.toml` to
`pyproject.toml` (overwriting the existing file) and use Poetry as normal. Note
that the Poetry build is not actively supported-- if it fails, check that the
-dependencies are up to date with the setuptools `pyproject.toml`.)*
+dependencies are up-to-date with the setuptools `pyproject.toml`.)*

#### System
Some graphing code uses cartopy, which requires [GEOS](https://libgeos.org/). To
install on Ubuntu:

sudo apt install libgeos-dev

-On MacOS, via Homebrew:
+On macOS, via Homebrew:

brew install geos

@@ -100,16 +100,18 @@ with `--no-conda`
In order to make sure that data in- and output locations are well-defined, the
environment variable `MLFLOW_TRACKING_URI` must be set to the intended data location:

-> export MLFLOW_TRACKING_URI="/path/to/data/dir"
+export MLFLOW_TRACKING_URI="/path/to/data/dir"

in Linux, or
-> %env MLFLOW_TRACKING_URI /path/to/data/dir
+```
+%env MLFLOW_TRACKING_URI /path/to/data/dir
+```

in a Jupyter Notebook, or

```
import os
-os.environ['MLFLOW_TRACKING_URI] = '/path/to/data/dir'
+os.environ['MLFLOW_TRACKING_URI'] = '/path/to/data/dir'
```
in Python.

@@ -161,7 +163,7 @@ MLflow call example:

```
mlflow run . --experiment-name <name> -e train --env-manager=local \
--P exp_id=692154129919725696 -P run_id=c57b36da385e4fc4a967e7790192ecb2 \
+-P run_id=<run id> \
-P learning_rate=0/5e-4/15/5e-5/30/5e-6 -P n_epochs=200 -P weight_decay=0.00 -P train_split=0.8 \
-P test_split=0.85 -P model_module_name=models.models1 -P model_cls_name=FullyCNN -P batchsize=4 \
-P transformation_cls_name=SoftPlusTransform -P submodel=transform3 \
@@ -175,7 +177,7 @@ Relevant parameters:
* `run_id`: id of the run that generated the forcing data that will be used for
training.
* `loss_cls_name`: name of the class that defines the loss. This class should be
-defined in train/losses.py in order for the script to find it. Currently the
+defined in train/losses.py in order for the script to find it. Currently, the
main available options are:
* `HeteroskedasticGaussianLossV2`: this corresponds to the loss used in the
2021 paper
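A loss of this kind predicts, for each data point, both a mean and a scale, and scores the prediction with the Gaussian negative log-likelihood. The following is an illustrative sketch only, not the repository's actual `HeteroskedasticGaussianLossV2` (which lives in train/losses.py and may differ in parameterisation and reduction):

```python
import math

def heteroskedastic_gaussian_nll(y_true, mean, scale):
    """Mean Gaussian negative log-likelihood with a per-point scale.

    Illustrative sketch: each prediction is a (mean, scale) pair, so the
    network can report higher uncertainty (larger scale) where the data
    are noisier, instead of assuming one global noise level.
    """
    nll = 0.0
    for y, mu, sigma in zip(y_true, mean, scale):
        # NLL of y under N(mu, sigma^2): 0.5*log(2*pi*sigma^2) + (y-mu)^2/(2*sigma^2)
        nll += 0.5 * math.log(2 * math.pi * sigma**2) + (y - mu) ** 2 / (2 * sigma**2)
    return nll / len(y_true)
```

Note how a larger predicted scale is penalised by the log term but forgiven larger residuals by the quadratic term, which is what lets the loss learn spatially varying uncertainty.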
@@ -212,17 +214,16 @@ In this step it is particularly important to set the environment variable `MLFLO
in order for the data to be found and stored in a sensible place.

One can run the inference step by interactively
-running the following project root directory:
+running the following in the project root directory:

->python3 -m gz21_ocean_momentum.inference.main --n_splits=40
+python3 -m gz21_ocean_momentum.inference.main --n_splits=40

with `n_splits` being the number of subsets which the dataset is split
into for the processing, before being put back together for the final output.
This is done in order to avoid memory issues for large datasets.
Other useful arguments for this call would be
-- `to_experiment`: the name of the mlflow experiment used for this run
-n_splits: the number of splits applied to the data
-- `batch_size`: the batch size used in running the neural network on the data
+- `to_experiment`: the name of the mlflow experiment used for this run (default is "test").
+- `batch_size`: the batch size used in running the neural network on the data.
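The split-then-recombine pattern behind `n_splits` can be sketched as follows. This is a hypothetical illustration of the pattern only (the real inference script works on xarray datasets, not plain lists, and `process_in_splits` is not a function in the codebase):

```python
def process_in_splits(data, n_splits, fn):
    """Apply fn to data in roughly n_splits chunks, then concatenate.

    Processing one chunk at a time bounds peak memory use for large
    datasets; the concatenated output is identical to fn(data) when fn
    is applied element-wise.
    """
    out = []
    chunk = max(1, -(-len(data) // n_splits))  # ceiling division
    for i in range(0, len(data), chunk):
        out.extend(fn(data[i:i + chunk]))
    return out
```

For example, `process_in_splits(list(range(10)), 4, lambda xs: [x * 2 for x in xs])` runs the doubling function on four small chunks yet returns the same list as doubling everything at once.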


After the script has started running, it will first require