Add documentation about how to reproduce the figures from the paper. #59

dorchard · 2023-06-28T07:35:36Z

Various scripts generate the figures. We should explain how to regenerate the key figures from the paper, referring to them by figure name. This explanation could live in a separate script or notebook, as long as the README makes it clear where to find it.

MarionBWeinzierl · 2023-08-14T14:47:48Z

@arthurBarthe, can you please identify which of the notebooks can be deleted and which are needed to reproduce the paper, and also how to get the data/setup needed to use them?

For example, there is a mention of a load_paper_net() function, but whatever it loads must already be present somewhere, e.g. produced by an earlier run. Also, if I try running, for example, https://github.com/m2lines/gz21_ocean_momentum/blob/main/examples/jupyter-notebooks/offline_test_SWM.ipynb with the data from a previous mlflow run (it does not work with a run made without mlflow), the data is not found. I think it looks for NetCDF files, but the mlflow run described in the README does not seem to create them (although the run is marked as FINISHED, not FAILED).

arthurBarthe · 2023-08-15T09:08:39Z

Hi Marion,

I think a good notebook to start with would be generate-figure-1.ipynb or generate-figure-2.ipynb. Those should only require data from the mlflow runs of the data processing step. Later on we will need to look at test_global_control.ipynb, but this requires running a test step first (the notebook does not run the test; it only produces figures from the output of the test step).

I would start with generate-figure-2.ipynb, which is probably the simplest one: in the notebook, you select an experiment (which should be the one used in the data processing step) and a run_id from which to load the processed data; the rest just makes the plot from this data.
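Relating to the "data not found" problem described earlier in the thread, one quick check before opening a notebook is whether the selected experiment/run actually produced any data files. The sketch below assumes mlflow's default local file store layout (`mlruns/<experiment_id>/<run_id>/artifacts`) and that the processed data are saved as NetCDF (`.nc`) files or Zarr (`.zarr`) stores; both the suffixes and the helper name `find_run_data` are assumptions, not part of gz21_ocean_momentum.

```python
from pathlib import Path


def find_run_data(mlruns_dir, experiment_id, run_id, suffixes=(".nc", ".zarr")):
    """Return the data files/stores in an mlflow run's artifact directory.

    Hypothetical helper. Assumes mlflow's default local file store layout:
    <mlruns_dir>/<experiment_id>/<run_id>/artifacts
    """
    artifacts = Path(mlruns_dir) / str(experiment_id) / str(run_id) / "artifacts"
    if not artifacts.is_dir():
        raise FileNotFoundError(f"No artifact directory at {artifacts}")
    found = []
    for path in sorted(artifacts.rglob("*")):
        rel = path.relative_to(artifacts)
        # Zarr stores are directories full of chunk files: report the store
        # itself, not the files inside it.
        inside_zarr = any(parent.suffix == ".zarr" for parent in rel.parents)
        if path.suffix in suffixes and not inside_zarr:
            found.append(path)
    if not found:
        raise FileNotFoundError(f"No files matching {suffixes} in {artifacts}")
    return found
```

A FINISHED run that still raises here would point at the data step not writing its outputs where the notebook expects them, rather than at the notebook itself.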

MarionBWeinzierl · 2023-08-15T15:45:05Z

Thank you!

That helps me a bit further.

I just noticed that generate-figure-2 calls plot_training_subdomains at the bottom, but, as you said, we do not do any training with the data here. The function fails at this line (gz21_ocean_momentum/src/gz21_ocean_momentum/analysis/utils.py, line 624 in f83a5c8):

data_ids = run_params["source.run_id"].split("/")

This is potentially because there is only one dataset, or because there is no training. Do you know which of the two it is?
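The failing line assumes that the run's parameters contain a `source.run_id` entry holding one or more run IDs joined by `/`. A defensive variant, sketched below under that same assumption, returns an empty list when the entry is absent or empty instead of failing, so the caller can decide what to do next. The helper name `source_run_ids` is hypothetical.

```python
def source_run_ids(run_params):
    """Return the source run IDs stored under "source.run_id".

    Hypothetical helper: assumes the IDs are joined with "/" (as in the
    failing line) and returns [] if the parameter is missing or empty.
    """
    value = run_params.get("source.run_id")
    if not value:
        return []
    # Drop empty pieces, e.g. produced by a trailing "/".
    return [run_id for run_id in str(value).split("/") if run_id]
```

An empty result would then signal that the subdomain information has to come from somewhere other than the run parameters.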

arthurBarthe · 2023-08-16T14:22:05Z

Ah, I think I know where the issue is coming from. Basically we want to plot the "rectangular" subdomains used for training. The way this information is stored has changed, and can instead be accessed by reading the file training_subdomains.yaml.
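Since the subdomain bounds now live in training_subdomains.yaml, the plotting code could read them from that file. The sketch below assumes the file is a YAML list of mappings with `lat_min`, `lat_max`, `lon_min` and `lon_max` keys; that schema is an assumption and should be checked against the actual file. It turns each entry into the closed outline of a rectangle. `load_subdomains` and `subdomain_outline` are hypothetical names, and PyYAML is only needed for `load_subdomains`.

```python
def load_subdomains(path):
    """Read the list of training subdomains from a YAML file (requires PyYAML)."""
    import yaml  # imported here so subdomain_outline works without PyYAML

    with open(path) as f:
        return yaml.safe_load(f)


def subdomain_outline(subdomain):
    """Return (lons, lats) tracing the closed rectangle of one subdomain.

    Assumed keys: lat_min, lat_max, lon_min, lon_max (in degrees).
    """
    lon0, lon1 = subdomain["lon_min"], subdomain["lon_max"]
    lat0, lat1 = subdomain["lat_min"], subdomain["lat_max"]
    if lon0 >= lon1 or lat0 >= lat1:
        raise ValueError(f"Degenerate subdomain: {subdomain}")
    # Visit the corners counter-clockwise and return to the start to close it.
    lons = [lon0, lon1, lon1, lon0, lon0]
    lats = [lat0, lat0, lat1, lat1, lat0]
    return lons, lats
```

With Matplotlib, each outline could then be drawn with `ax.plot(lons, lats)` for every entry returned by `load_subdomains("training_subdomains.yaml")`.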

As for generate-figure-1.ipynb, if you want to run it, you will need two data runs: one with the CO2 parameter equal to 0, and another with the CO2 parameter equal to 1.

MarionBWeinzierl · 2023-08-18T07:24:31Z

I managed to reproduce this figure.

There are still a couple of workarounds in the code that need cleaning up, some special cases that need catching, and some docs/instructions I will need to write in/for the notebook.

Also, I have only checked whether this is the correct figure by comparing it visually with the (distorted) figure 1b in the paper.

arthurBarthe · 2023-08-18T07:40:46Z

This looks good, thanks @MarionBWeinzierl ! For now "eye checking" should be fine, we will compare numerical metrics in the inference / test step. As for the table, I don't think we need to worry about it, we can probably remove that from the code.

MarionBWeinzierl · 2023-08-18T09:40:11Z

Is it correct that only figure 1b is plotted, or should 1a also be plotted, @arthurBarthe ?

MarionBWeinzierl · 2023-08-21T12:11:40Z

I created #82 for this issue as a draft PR which tracks the branch.

MarionBWeinzierl · 2023-08-24T15:25:45Z

@arthurBarthe, in notebook 'generate-paper-figure-6' (which was 'generate-figure-1' in the comment above, before the renaming), I get one version of the figure when I hand the script the CO2=1 run first and the CO2=0 run second, and a different one when I do it the other way around.

Comparing with figure 6a from the paper, I think that the second version is right, but I am not sure.

MarionBWeinzierl · 2023-08-24T15:33:58Z

I also get a runtime warning, "gz21_ocean_momentum/venv/lib/python3.11/site-packages/dask/array/numpy_compat.py:43: RuntimeWarning: invalid value encountered in divide", raised from x = np.divide(x1, x2, out) (with no information about where in our code it originates), but it does not seem to affect the run.
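NumPy emits this warning when an element-wise division produces an undefined result such as 0/0 (or inf/inf), which it stores as NaN. Where such NaNs are expected, the warning can be silenced locally with `np.errstate` so that it stays visible elsewhere. The sketch below shows that mechanism for a division done directly in NumPy; the helper `safe_ratio` is hypothetical. Since the warning here comes from inside dask, which may evaluate the division in worker threads, it is untested whether wrapping the dask computation in the same context would silence it.

```python
import numpy as np


def safe_ratio(numerator, denominator):
    """Element-wise numerator / denominator, with NaN where the result is undefined.

    0/0 yields NaN; the "invalid value encountered in divide" warning is
    suppressed only inside this function. Division of a non-zero value by
    zero still warns ("divide by zero").
    """
    with np.errstate(invalid="ignore"):
        return np.divide(numerator, denominator)
```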

MarionBWeinzierl · 2023-08-25T08:49:04Z

@arthurBarthe , I am now starting to look at 'test-global-control'. You said that a test step has to be run before that. What do you mean by that - is that an inference run?

arthurBarthe · 2023-08-31T14:07:11Z

Yes, it should be CO2=0 first and CO2=1 second. As for the inference step, I am working on it: I ran the data step and the training step, so now I should be able to try running the inference step.
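Because the notebook expects the CO2=0 run first and the CO2=1 run second, one way to avoid mixing them up is to order the two runs by their CO2 parameter instead of by hand. The sketch assumes each run is available as a dict holding its run ID and its parameters under a `"CO2"` key; mlflow stores parameter values as strings, hence the `float` conversion. The helper name and the dict shape are assumptions.

```python
def order_runs_by_co2(runs):
    """Return (control_run, co2_run): the CO2=0 run first, then the CO2=1 run.

    Each run is assumed to look like {"run_id": ..., "params": {"CO2": "0"}}.
    """
    by_co2 = {}
    for run in runs:
        co2 = float(run["params"]["CO2"])  # mlflow params are strings
        if co2 in by_co2:
            raise ValueError(f"More than one run with CO2={co2:g}")
        by_co2[co2] = run
    if set(by_co2) != {0.0, 1.0}:
        raise ValueError(f"Expected CO2=0 and CO2=1 runs, got {sorted(by_co2)}")
    return by_co2[0.0], by_co2[1.0]
```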

arthurBarthe · 2023-08-31T14:08:24Z

Quick question, in order to produce those figures, what is the size of the data you are using along the time dimension?

MarionBWeinzierl · 2023-08-31T14:14:19Z

Regarding the size of the data along the time dimension: I am using the call from the README, which has ntimes=100.

arthurBarthe · 2023-08-31T14:19:37Z

Ah, we should be using the whole dataset, which I believe has roughly 3000 time points. I ran the data step on the HPC here with that setting, using 4 nodes (8 GB of memory each), and it ran in 30 minutes. This should explain the remaining difference from the figure in the paper. I should also update the README accordingly.

MarionBWeinzierl · 2023-08-31T14:27:46Z

We might not need to update the repository readme, as this is a sufficiently small problem to run on a laptop and a nice starter problem.

Rather, we should add a comment in the notebook, or maybe better the readme in the notebook folder, and define there which calls to run to create the data for the paper figures.

MarionBWeinzierl · 2023-08-31T14:28:58Z

This would be the best place: https://github.com/m2lines/gz21_ocean_momentum/blob/notebooks-cleanup/examples/jupyter-notebooks/README.md

MarionBWeinzierl · 2023-08-31T14:37:36Z

Regarding the inference step you are working on (after running the data step and the training step): is that then all the data that is needed for test-global-control?

Which figure from the paper does that notebook generate?

arthurBarthe · 2023-08-31T14:42:23Z

We need data from the inference step, where we run the trained neural network (from the train step) on both data from CO2=0 and CO2=1 datasets. Then that should be it yes, and it should produce figures 4, 5, 7 I believe, depending on which run ids we request in the notebook.

MarionBWeinzierl · 2023-09-01T10:09:12Z

I added the information about which data to use etc to the jupyter notebook readme on the notebooks-cleanup branch.

MarionBWeinzierl · 2023-09-22T12:45:02Z

I fixed the test_global_control script (will need a bit of clean-up now).

These are the plots run with a reduced set with ntimes=100: plots.zip

Due to the reduced ntimes, I needed to also adapt some parameters in the plots for this.

@arthurBarthe, could you have a look and see whether they roughly make sense?

Also @arthurBarthe , can I delete the parts that say "IGNORE THIS" in test_global_control?

MarionBWeinzierl · 2023-09-22T12:49:21Z

Another question, @arthurBarthe: you said I need the data for CO2=1 and CO2=0 for the test_global_control script. However, it only asks for data once. Does that mean that, to reproduce the paper figures, I have to run this notebook twice, once with each setting?

Also, in the plot I have currently set the "merge" parameter to "none", and it did not ask for the separate data and inference steps. Do I need separately named experiments, so that I can pass them as tuples in the "merge" parameter, for the rest to run through as expected?

MarionBWeinzierl · 2023-09-22T14:29:04Z

@arthurBarthe , how does the data need to be organised for test_global_control?

MarionBWeinzierl · 2023-09-28T08:51:43Z

I noticed that in the notebook pushed to main, the selected experiment is called "data-global". Does that mean that, for test_global_control, the global parameter of the data script should be set to 1?

MarionBWeinzierl · 2023-09-29T11:42:22Z

OK, we have established that the notebook has to be run twice, once with CO2=0 and once with CO2=1. The global parameter can stay at 0. I am updating the README accordingly.
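Pulling the thread together, the mapping from paper figures to notebooks and required runs could be kept in a small, checkable form, for example as the basis for the notebook README, so that figures can be looked up by name. The entries below only restate what was worked out in this issue (notebook names as used in the thread; figures 4, 5 and 7 were given with an "I believe"); the `Recipe` structure and the `recipe_for_figure` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recipe:
    notebook: str
    figures: tuple[str, ...]     # paper figures the notebook produces
    steps: tuple[str, ...]       # pipeline steps whose mlflow runs it reads
    co2_settings: tuple[int, ...]  # CO2 values needed, in the expected order
    executions: int              # how many times the notebook must be run


# Restates the findings of issue #59; the structure itself is illustrative.
RECIPES = (
    Recipe("generate-figure-2.ipynb", ("1b",), ("data",), (), 1),
    # Formerly generate-figure-1; compared against figure 6a of the paper.
    Recipe("generate-paper-figure-6.ipynb", ("6a",), ("data",), (0, 1), 1),
    # Run once per CO2 setting; the global parameter stays at 0.
    Recipe("test_global_control.ipynb", ("4", "5", "7"),
           ("data", "train", "inference"), (0, 1), 2),
)


def recipe_for_figure(figure):
    """Return the recipe that produces the given paper figure, e.g. "6a"."""
    for recipe in RECIPES:
        if figure in recipe.figures:
            return recipe
    raise KeyError(f"No known notebook produces figure {figure!r}")
```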

dorchard changed the title from "Add documentation to the README about how to reproduce the figures from the paper." to "Add documentation about how to reproduce the figures from the paper." Jun 28, 2023

raehik mentioned this issue Jul 5, 2023 in Add code documentation #7 (closed)

dorchard assigned MarionBWeinzierl and arthurBarthe Jul 21, 2023

MarionBWeinzierl added a commit that referenced this issue Aug 15, 2023: 9e3dd7e "various fixes to work towards getting the notebooks running #59"

MarionBWeinzierl added a commit that referenced this issue Aug 18, 2023: 04ad24b "#59 fixed notebook (and related files) to create figure 1b"

MarionBWeinzierl mentioned this issue Aug 18, 2023 in Notebooks cleanup #82 (merged)

MarionBWeinzierl closed this as completed in #82 Oct 3, 2023
