Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calling a pipeline without specifying a method should use call() #8130

Open
stscijgbot-jp opened this issue Dec 14, 2023 · 7 comments
Open

Comments

@stscijgbot-jp
Copy link
Collaborator

Issue JP-3490 was created on JIRA by Bryan Hilbert:

Currently, when calling a pipeline (or step?) without specifying a method e.g.:

from jwst.pipeline.calwebb_image2 import Image2Pipeline
p = Image2Pipeline('my_file_asn.json')

the pipeline will run using the run() method. Given that the run() method is not the preferred way to run the pipeline, it might be better/safer to have the pipeline in this situation default to using the call() method instead.

I'm not sure if the situation is the same for individual steps, but the same logic applies there I think.

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Melanie Clarke on JIRA:

We recently discussed this on the NIRSpec team as well.  Most of our internal users were surprised to hear that the .run method is not advised, since many demonstration notebooks use that syntax.  It is also the more natural syntax – most users preferred setting parameters in a Step or Pipeline object via attributes to setting them in a dictionary to provide to the .call method.  It was very surprising to users both that the run method works without warning, and that the call method quietly ignores any parameters previously set in Step or Pipeline attributes.

I think this is a design flaw in the pipeline infrastructure that can't be addressed with documentation.  It is already documented in several places that .call should be used in place of .run, but having both available continues to confuse users and cause subtle problems with data reductions.

I can think of several possible approaches to fix this:

  • add a user warning to the .run method when it is called directly
  • remove the .run method from the public API entirely
  • find a way for the .run method to check CRDS for parameter defaults, perhaps on object instantiation

I think the last approach is preferable, since it would be least disruptive and least surprising to users.

@bhilbert4
Copy link
Collaborator

There is a way to do the last approach. David Law's MIRI MRS notebook gets the CRDS information, and then instantiates the pipeline, sets the parameters as attributes, and then calls the run method. He has this wrapped up in a function called rundet1().

    crds_config = Detector1Pipeline.get_config_from_reference(filename)
    det1 = Detector1Pipeline.from_config_section(crds_config)
    
    det1.output_dir = outdir # Specify where the output should go
    
    det1.jump.maximum_cores='half'
    det1.ramp_fit.maximum_cores='half'
    # This next parameter helps with very bright objects and/or very short ramps
    det1.jump.three_group_rejection_threshold=100
    
    # Enable detection of large cosmic ray showers (currently only works for FASTR1 data)
    det1.jump.find_showers=True
        
    det1.save_results = True # Save the final resulting _rate.fits files
    det1(filename) # Run the pipeline on an input list of files

@stscijgbot-jp
Copy link
Collaborator Author

Comment by David Law on JIRA:

While the get_config_from_reference function works for our own notebooks, I'd be interested to see if there's a way to make Melanie Clarke 's third approach work more generally.  Is it possible and/or desirable for both .call() and .run() to check for parameter defaults automatically?

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Howard Bushouse on JIRA:

Theoretically pretty much anything is possible, e.g. updating the .run() method to look for pars ref files, but I think that would then be defeating the original purpose of having 2 different methods in the first place and would essentially make them redundant with one another. The .run method is preferable when you want to execute something multiple times using the same param settings, while .call starts over from scratch each time it's called. IMHO, it would be nice to simply update .call so that you can give it param/args as easily as you can with .run (i.e. not having to create the cumbersome dictionary structure).

Sounds like a good topic for discussion in a DMS WG meeting.

@stscijgbot-jp
Copy link
Collaborator Author

Comment by David Law on JIRA:

Adding a note from the July 10 JP meeting that the cleanest solution may be to deprecate usage that doesn't specify either .run() or .call(), adding a deprecation note now and actually removing that functionality some time next year.  Potentially add a warning note for .run() as well.

@stscijgbot-jp
Copy link
Collaborator Author

Comment by David Law on JIRA:

Followup discussion at the Sep 3 JP meeting; we should print a deprecation warning now (https://jira.stsci.edu/browse/JP-3728) that in a future TBD build usage without specifying either .call() or .run() will be deprecated.

@stscijgbot-jp
Copy link
Collaborator Author

stscijgbot-jp commented Dec 5, 2024

Comment by Melanie Clarke on JIRA:

Just to close the loop here...  I know that the current request is just to deprecate the call function, so that you have to choose either call or run, but in case this comes back around again, we did explore a couple other alternative solutions to make the run method easier or clearer to use.

 

This PR would warn the user if run was called without retrieving CRDS defaults:

spacetelescope/stpipe#198 

 

This PR would allow the run method to check CRDS for default values for any parameters not directly set by the user, to make it behave more call:

spacetelescope/stpipe#192

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants