Remove/refactor root fv3net source tree #567
Do the fv3fit tests pass with tensorflow==2.0 now that the normalization PR is merged? I was having trouble getting the model loader to use the `custom_objects` keyword argument in that version, so I increased the requirement.
That PR was merged before I started this, and all the tests passed, so I think so.
@mcgibbon It appears so.
LGTM! I do have a few questions. I also think we should consider running the dataflow test only on master or on approval (or making it faster).
```diff
@@ -0,0 +1,18 @@
+name: dataflow
```
Is this file being used by anything?
I would like it to be, but not right now, unfortunately. It does work for setting up a local development environment for the dataflow jobs.
```diff
@@ -81,7 +81,7 @@ test_regression:
 coverage run -m pytest -vv -m regression -s

 test_dataflow:
-coverage run -m pytest -vv tests/dataflow/ -s
+coverage run -m pytest -vv workflows/dataflow/tests/integration -s
```
Any ideas on how we could get this test to take less than 9 minutes? Is that just unavoidable given the slowness of spinning up dataflow workers?
Or maybe we could move this test behind a hold, or only run it on master? It seems unlikely to be broken by the vast majority of our PRs, and it is slow.
> Any ideas on how we could get this test to take less than 9 minutes?

It would help if the numcodecs maintainers finally pushed wheels to PyPI: zarr-developers/numcodecs#224. I'm not sure what the holdup is.

> Or maybe we could move this test behind a hold/just run on master?

This is already the case. I triggered this run manually.
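For reference, here is a minimal sketch of an alternative to a CI hold: a hypothetical `conftest.py` that skips the slow dataflow integration tests unless they are explicitly requested, for example by a CI job that runs only on master. The opt-in variable `RUN_DATAFLOW_TESTS` is an illustrative name, and the prefix check assumes pytest's rootdir is the repository root so that node ids start with `workflows/dataflow/tests/integration/`. Neither is part of this PR.

```python
"""Hypothetical conftest.py sketch: skip slow dataflow integration tests
unless explicitly opted in (e.g. by a master-only CI job)."""
import os

# Assumption: tests under this directory spin up Dataflow workers and are slow.
# Node ids are relative to pytest's rootdir, assumed here to be the repo root.
SLOW_TEST_PREFIX = "workflows/dataflow/tests/integration/"

# Hypothetical opt-in variable; a master-only CI job would set it to "1".
OPT_IN_VAR = "RUN_DATAFLOW_TESTS"


def should_run_slow_tests(environ=None):
    """Return True when the opt-in environment variable is set to "1"."""
    environ = os.environ if environ is None else environ
    return environ.get(OPT_IN_VAR) == "1"


def is_slow_test(nodeid):
    """Return True if a pytest node id points into the slow test directory."""
    return nodeid.startswith(SLOW_TEST_PREFIX)


def pytest_collection_modifyitems(config, items):
    """pytest hook: mark slow dataflow tests as skipped unless opted in."""
    if should_run_slow_tests():
        return
    import pytest  # imported lazily so the helpers above work without pytest

    skip = pytest.mark.skip(
        reason=f"set {OPT_IN_VAR}=1 to run the dataflow integration tests"
    )
    for item in items:
        if is_slow_test(item.nodeid):
            item.add_marker(skip)
```

With this in place, `make test_dataflow` would report the integration tests as skipped locally, while a CI job that exports `RUN_DATAFLOW_TESTS=1` would run them. The trade-off compared with a CI hold is that the gating then lives in the repository rather than in the CI configuration.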
```diff
 COPY workflows $FV3NET/workflows
-COPY catalog.yml dataflow.sh $FV3NET/
+COPY catalog.yml $FV3NET/
```
Don't you want to copy in the script from its new location?
It is copied along with everything else in `workflows` in the line above.
LGTM. Just a couple of visibility suggestions that may be worthwhile. Thanks for the help on how we're currently submitting jobs.
```diff
@@ -156,22 +156,6 @@ should be set to 'gcp_key'. Additional arguments are
 available for configuring the kubernetes job and documented in the `run_kubernetes`
 docstring.

-## Adding new model types
```
Maybe we should remove (or update) the project organization section, since it's wildly out of date.
Also, consider just copy-pasting the `dataflow.sh` usage in place of the current dataflow usage section.
I moved this section to `workflows/dataflow` and now point the user to call `./dataflow.sh -h` to see the help. Any copy-pasted usage will likely become out of date, so I don't think it makes sense to do that here.
The only code currently in the fv3net source tree is for the dataflow jobs. This PR moves it to a new workflow directory, along with the `setup.py` and the dataflow submission script `dataflow.sh`. Subsequent PRs can further isolate the build/test environment.

Refactored public API:

- `dataflow.sh` moved to `workflows/dataflow/dataflow.sh`
- `extract_tars` workflow
- `fv3net/models` code

Significant internal changes:

- `fv3net.pipelines`

Requirement changes:

- `tensorflow=2` in the conda environment file. There is no Mac OS tensorflow v2.2 conda package.