-
Notifications
You must be signed in to change notification settings - Fork 167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Calling a pipeline without specifying a method should use call() #8130
Comments
Comment by Melanie Clarke on JIRA: We recently discussed this on the NIRSpec team as well. Most of our internal users were surprised to hear that the .run method is not advised, since many demonstration notebooks use that syntax. It is also the more natural syntax – most users preferred setting parameters in a Step or Pipeline object via attributes to setting them in a dictionary to provide to the .call method. It was very surprising to users both that the run method works without warning, and that the call method quietly ignores any parameters previously set in Step or Pipeline attributes. I think this is a design flaw in the pipeline infrastructure that can't be addressed with documentation. It is already documented in several places that .call should be used in place of .run, but having both available continues to confuse users and cause subtle problems with data reductions. I can think of several possible approaches to fix this:
I think the last approach is preferable, since it would be least disruptive and least surprising to users. |
There is a way to do the last approach. David Law's MIRI MRS notebook gets the CRDS information, and then instantiates the pipeline, sets the parameters as attributes, and then calls the run method. He has this wrapped up in a function called rundet1().
|
While the get_config_from_reference function works for our own notebooks, I'd be interested to see if there's a way to make Melanie Clarke 's third approach work more generally. Is it possible and/or desirable for both .call() and .run() to check for parameter defaults automatically? |
Comment by Howard Bushouse on JIRA: Theoretically pretty much anything is possible, e.g. updating the .run() method to look for pars ref files, but I think that would then be defeating the original purpose of having 2 different methods in the first place and would essentially make them redundant with one another. The .run method is preferable when you want to execute something multiple times using the same param settings, while .call starts over from scratch each time it's called. IMHO, it would be nice to simply update .call so that you can give it param/args as easily as you can with .run (i.e. not having to create the cumbersome dictionary structure). Sounds like a good topic for discussion in a DMS WG meeting. |
Adding a note from the July 10 JP meeting that the cleanest solution may be to deprecate usage that doesn't specify either .run() or .call(), adding a deprecation note now and actually removing that functionality some time next year. Potentially add a warning note for .run() as well. |
Followup discussion at the Sep 3 JP meeting; we should print a deprecation warning now (https://jira.stsci.edu/browse/JP-3728) that in a future TBD build usage without specifying either .call() or .run() will be deprecated. |
Comment by Melanie Clarke on JIRA: Just to close the loop here... I know that the current request is just to deprecate the call function, so that you have to choose either call or run, but in case this comes back around again, we did explore a couple other alternative solutions to make the run method easier or clearer to use.
This PR would warn the user if run was called without retrieving CRDS defaults:
This PR would allow the run method to check CRDS for default values for any parameters not directly set by the user, to make it behave more call: |
Issue JP-3490 was created on JIRA by Bryan Hilbert:
Currently, when calling a pipeline (or step?) without specifying a method e.g.:
from jwst.pipeline.calwebb_image2 import Image2Pipeline
p = Image2Pipeline('my_file_asn.json')
the pipeline will run using the run() method. Given that the run() method is not the preferred way to run the pipeline, it might be better/safer to have the pipeline in this situation default to using the call() method instead.
I'm not sure if the situation is the same for individual steps, but the same logic applies there I think.
The text was updated successfully, but these errors were encountered: