Stages are doing significant work before they are executed #7
Comments
For BPZ this is because |
Re: pzflow, I think this is actually fixed in LSSTDESC/RAIL#201 -- what's in the main branch now was a workaround for not having a way to use RAIL to make that model file. The new version of the creator (sampler) stage ingests the model in its |
The BPZ issue was handled in a rail_bpz PR; John Franklin will make a similar PR here in RAIL to move a few items in the creation Modeler and FlowModeler from the init to the run functions. |
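I believe the flow is being loaded into the DataStore because of these lines in the __init__ of creation.engine.Creator:
self.model = None
if not isinstance(args, dict):
    args = vars(args)
self.open_model(**args)
This code is in the init of creation.engine.Creator, creation.engine.PosteriorCalculator, estimation.estimator.CatEstimator, estimation.summarizer.SZPZSummarizer.
Maybe we want to move opening the models to the run methods? @eacharles would that be okay? I'm not sure what role the args are playing here - maybe you have some insight? |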
1. I guess the question is if ‘open_model’ is doing actual work, or just opening a file. If the former then yes, we could move it to run()
2. The point of args is that different models might require different information to open, e.g., maybe just a filename, but maybe more information.
…-e
On Sep 12, 2022, at 11:41 AM, John Franklin Crenshaw ***@***.***> wrote:
I believe the flow is being loaded into the DataStore because of these lines in the __init__ of creation.engine.Creator:
self.model = None
if not isinstance(args, dict):
    args = vars(args)
self.open_model(**args)
This code is in the init of creation.engine.Creator, creation.engine.PosteriorCalculator, estimation.estimator.CatEstimator, estimation.summarizer.SZPZSummarizer.
Maybe we want to move opening the models to the run methods? @eacharles <https://github.com/eacharles> would that be okay? I'm not sure what role the args are playing here - maybe you have some insight?
|
Maybe we want to keep the check that the models exist in the init. |
It's not wise to do things like that in the init stage, even opening a file, because stages are often instantiated before their inputs are even generated by earlier steps in the pipeline, when we are creating and preparing the pipeline in the first place. |
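The deferral suggested in this thread (moving open_model out of the init and into run) can be sketched as follows. This is a minimal illustration, not RAIL or ceci code: the class `LazyModelStage`, its `model_path` argument, and the pickled dictionary standing in for a model are all hypothetical, chosen only to show that a stage built this way can be instantiated before its input file exists.

```python
import os
import pickle
import tempfile


class LazyModelStage:
    """Hypothetical stage; not an actual RAIL or ceci class."""

    def __init__(self, model_path):
        # Only record configuration here. The file may not exist yet,
        # because an earlier pipeline step may still have to write it.
        self.model_path = model_path
        self.model = None

    def open_model(self):
        # Load the model on first use and cache it for later calls.
        if self.model is None:
            with open(self.model_path, "rb") as f:
                self.model = pickle.load(f)
        return self.model

    def run(self):
        # The file is touched only now, when the stage actually runs.
        model = self.open_model()
        return model["scale"] * 2


# The stage can be constructed while its input is still missing.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
stage = LazyModelStage(path)

# Stand-in for an earlier pipeline step producing the model file.
with open(path, "wb") as f:
    pickle.dump({"scale": 21}, f)

result = stage.run()
```

With this layout, building the whole pipeline costs nothing beyond storing paths, and a missing input only becomes an error at the point where the stage is executed.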
@joezuntz I think what we wanted was to have RAIL check if your outputs already exist before running the pipeline, and refuse to overwrite those files unless explicitly instructed to do so. So we want it to fail fast. Do you think that is okay to have in the init? |
That's interesting - is this just for specific files, or is it a pipeline-wide setting that no files should be overwritten? |
I think we wanted it for all files. This is because we plan on having a central location for all of our RAIL experiments on NERSC, and want to build in some protection against accidentally overwriting past experiments. We will have an OVERWRITE flag that will allow you to overwrite files if you want to. |
That sounds like a feature we should have upstream in ceci I think. I'll think about where it could go. It would definitely make sense in the pipeline section but then it wouldn't apply if you manually ran a stage. |
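The overwrite protection discussed above could look roughly like the sketch below. The helper `check_outputs` and its `overwrite` argument are hypothetical names for illustration and are not an existing RAIL or ceci API. The check only tests whether paths exist, so it reads no input data and could run before any stage does real work.

```python
import os
import tempfile


def check_outputs(paths, overwrite=False):
    """Fail fast if any output path exists, unless overwrite is set.

    Returns the list of paths that already exist.
    """
    existing = [p for p in paths if os.path.exists(p)]
    if existing and not overwrite:
        raise FileExistsError(
            "Refusing to overwrite: " + ", ".join(existing)
        )
    return existing


# Set up one pre-existing output and one new output (names are made up).
workdir = tempfile.mkdtemp()
old_output = os.path.join(workdir, "pz_estimates.hdf5")
new_output = os.path.join(workdir, "summary.hdf5")
open(old_output, "w").close()

# Default behaviour: the existing file blocks the run.
try:
    check_outputs([old_output, new_output])
    blocked = False
except FileExistsError:
    blocked = True

# With the flag set, the check passes and reports what would be replaced.
allowed = check_outputs([old_output, new_output], overwrite=True)
```

Whether such a check belongs in a stage's init, in the pipeline setup, or in both is the open question in this thread; the sketch takes no position on that.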
I'm going to audit all of the rail subpackages that have subclasses of the RAIL stages looking for any that are doing substantial work in the |
The only stage that I found that seemed like it might be doing extraneous work in the Created issue: LSSTDESC/rail_pzflow#8 to track it. |
For those of you following this issue: given that we've audited the subclasses and created follow-up issues as needed, can we close this issue out now? If there is still work to be done, or there are more thoughts on this one, I would recommend closing it anyway and opening a new issue to track the new work. |
I think I'd like to keep it open just until the subclass issues are resolved, if that's okay. |
BPZ, FlowEngine, and possibly other stages are currently doing significant work when they are instantiated, instead of waiting for when they are run.
FlowEngine does this:
BPZ does this:
This only happens the first time, but it would still be nice to move it later or include the data files in the package, maybe. The directory they are currently being written to is the one I mentioned just now in LSSTDESC/RAIL#238