Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove/refactor root fv3net source tree #567

Merged
merged 21 commits into from
Aug 14, 2020

Conversation

nbren12
Copy link
Contributor

@nbren12 nbren12 commented Aug 14, 2020

The only current code in the fv3net source tree is for the dataflow jobs. This PR moves these to a new workflow directory along with the setup.py and the dataflow submission script dataflow.sh. Subsequent PRs can further isolate the build/test environment.

Refactored public API:

  • The submission script is dataflow.sh moved to workflows/dataflow/dataflow.sh
  • Delete extract_tars workflow
  • Delete out of date fv3net/models code.

Significant internal changes:

  • Use a namespaced package for dataflow jobs. When installed, these can still be imported at the old path fv3net.pipelines.

Requirement changes:

  • Relax tensorflow version to tensorflow=2 in the conda environment file. There is no Mac OS tensorflow v2.2 conda package.

@nbren12 nbren12 marked this pull request as ready for review August 14, 2020 16:48
@nbren12 nbren12 changed the title Feature/dataflow to workflow Remove/refactor root fv3net source tree Aug 14, 2020
@mcgibbon
Copy link
Contributor

Do fv3fit tests pass with tensorflow==2.0 with the normalization PR merged? I was having issues getting the model loader to make use of the custom objects keyword argument in that version, so I increased the requirement.

@nbren12
Copy link
Contributor Author

nbren12 commented Aug 14, 2020

That PR was merged before I started this, and all the tests passed, so I think so...

@frodre
Copy link
Contributor

frodre commented Aug 14, 2020

@mcgibbon It appears so
https://app.circleci.com/pipelines/github/VulcanClimateModeling/fv3net/4561/workflows/94da5409-fb05-4478-a355-02e4ded3b986/jobs/15624

external/fv3fit/tests/test_loss.py ..........                            [  4%]
external/fv3fit/tests/test_normalization.py ..................           [  7%]
external/fv3fit/tests/test_packer.py ............                        [  9%]
external/fv3fit/tests/test_regression_sklearn_train.py .                 [  9%]
external/fv3fit/tests/test_sklearn_adapters.py .......                   [ 10%]
external/fv3fit/tests/test_sklearn_wrapper.py ....                       [ 10%]

Copy link
Contributor

@oliverwm1 oliverwm1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I do have a few questions. I also think we should consider only running the dataflow test on master/on approval (or make it faster).

@@ -0,0 +1,18 @@
name: dataflow
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this file being used by anything?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like it to be, but not right now unforunately. It does work for setting up a local development environment for the dataflow jobs.

@@ -81,7 +81,7 @@ test_regression:
coverage run -m pytest -vv -m regression -s

test_dataflow:
coverage run -m pytest -vv tests/dataflow/ -s
coverage run -m pytest -vv workflows/dataflow/tests/integration -s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any ideas on how we could get this test to take less than 9 minutes? Is that just unavoidable given the slowness of spinning up dataflow workers?

Or maybe we could move this test behind a hold/just run on master? It is something that seems unlikely to be broken by the vast majority of our PRs and is slow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any ideas on how we could get this test to take less than 9 minutes?

The num_codecs people could finally push wheels to pypi: zarr-developers/numcodecs#224. I'm not sure what the holdup is.

Or maybe we could move this test behind a hold/just run on master?

This is currently true. I triggered this manually.

COPY workflows $FV3NET/workflows
COPY catalog.yml dataflow.sh $FV3NET/
COPY catalog.yml $FV3NET/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't want to copy in the script from its new location?

Copy link
Contributor Author

@nbren12 nbren12 Aug 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is copied along with everything else in workflows in the line above.

Copy link
Contributor

@frodre frodre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just a couple of visibility suggestions that maybe worthwhile. Thanks for the help on how we're currently submitting jobs.

@@ -156,22 +156,6 @@ should be set to 'gcp_key'. Additional arguments are
available for configuring the kubernetes job and documented in the `run_kubernetes`
docstring.

## Adding new model types
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should remove (or update) the project organization since it's wildly out of date.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also just consider copy pasting the dataflow.sh usage in place of the current dataflow usage section

Copy link
Contributor Author

@nbren12 nbren12 Aug 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this section to workflows/dataflow and now point the user to call ./dataflow.sh -h to see the help. Any copy/pasted usage will likely become out of date, so I don't think it makes sense to do that here.

@nbren12 nbren12 added the automerge The automerge github action will merge PRs with this label once they are ready. label Aug 14, 2020
@github-actions github-actions bot merged commit fdfc0d3 into master Aug 14, 2020
@github-actions github-actions bot deleted the feature/dataflow-to-workflow branch August 14, 2020 22:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automerge The automerge github action will merge PRs with this label once they are ready.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants