
Step methods

Ed Slavich edited this page Jan 19, 2021 · 6 revisions

Inventory of methods for creating and running steps

Creating steps

Column key:

  • Selects subclass: Does the method attempt to select the correct Step subclass based on arguments or config file contents?
  • Runs step: Does the method run the step before returning?
  • CRDS pars: Does the method incorporate parameters from CRDS pars files?
  • Config input: Does the method accept a path to a user config?
  • Override parameters: Does the method support overriding parameters on an individual basis?
  • Override style: If parameter overrides are supported, are they passed as standard keyword arguments or CLI-style arguments?
Method | Selects subclass | Runs step | CRDS pars | Config input | Override parameters | Override style | Notes
__init__ | | | | | | keyword | Accepts a config_file argument but does not apply parameters from it.
call | | | | | | keyword | User config file passed as keyword argument. Config file's class field ignored.
from_cmdline | | | | | | CLI | Selects Step subclass based on config class field or class name argument.
from_config_file | | | | | | | Selects Step subclass based on config class field.
from_config_section | | | | | | | Probably not intended to be part of the public API.

Running steps

Column key:

  • Creates step: Does the method create the Step instance before running it?
Method | Creates step | Notes
__call__ | | Alias for run.
call | | Python API only (not used by CLI code).
from_cmdline | | The strun script is a thin wrapper around this method.
process | | Subclass implementation method. Not intended to be called directly by general users.
run | | Eventually called by any method that needs to run the step.
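
The relationship between these methods can be sketched with a toy class. This is only an illustration of the call chain described in the table, not the actual stpipe code; ToyStep and Doubler are hypothetical names, and the real methods do considerably more work.

```python
class ToyStep:
    """Toy stand-in for a step; not the real stpipe Step class."""

    def process(self, *args):
        # Subclass implementation hook; general users do not call this directly.
        raise NotImplementedError

    def run(self, *args):
        # Common entry point: every other run-style method ends up here.
        return self.process(*args)

    # __call__ is simply another name for run.
    __call__ = run

    @classmethod
    def call(cls, *args, **kwargs):
        # Creates the instance, then runs it, so creation arguments (kwargs)
        # and run arguments (args) share one signature.
        return cls(**kwargs).run(*args)


class Doubler(ToyStep):
    """Hypothetical subclass that multiplies its input by a parameter."""

    def __init__(self, factor=2):
        self.factor = factor

    def process(self, value):
        return value * self.factor


step = Doubler()
result_run = step.run(21)                        # explicit run
result_call = step(21)                           # same thing via __call__
result_classmethod = Doubler.call(21, factor=3)  # create and run in one go
```

Note how Doubler.call mixes the creation argument (factor) with the run argument (21) in a single call, which is the blending the suggestions below object to.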

Suggestions for improvement

  • Eliminate run-and-call methods. They add little value and present a confusing interface in which step creation arguments and step run arguments are blended together into one method signature.
  • Move methods that can return an instance of a class other than the one they were invoked on out of the class. This behavior is confusing and is better handled by module-level functions.
  • Remove from_cmdline and instead call the corresponding cmdline module method directly.
  • Rename process to make clear that it shouldn't be invoked by users. Maybe a name with a leading underscore, or something like run_impl.
  • Remove one of run or __call__ so that usage is uniform.
  • Replace the config_file argument to __init__ with something like working_dir.
  • Change CLI code to be a relatively thin wrapper around the Python interface (instead of parallel implementations like the current from_cmdline vs call). This will ensure consistency between the two interfaces. There is already some divergence between call and from_cmdline, e.g. the _pars_model attribute is not set by call, and call doesn't know how to select the Step subclass based on a config.
  • Pass around step parameters as a separate dict argument instead of **kwargs. This provides a clear separation between the parameters and other method arguments.
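
The last suggestion can be illustrated with two plain functions. This is a minimal sketch, assuming a single method argument (config_file) alongside the step parameters; call_blended and call_separated are hypothetical names used only for comparison and are not part of stpipe.

```python
def call_blended(dataset, **kwargs):
    # Blended style: config_file is a method argument and everything else is
    # a step parameter, so the two have to be untangled by hand.
    config_file = kwargs.pop("config_file", None)
    return {"dataset": dataset, "config_file": config_file, "params": kwargs}


def call_separated(dataset, *, config_file=None, params=None):
    # Separated style: step parameters travel in their own dict and cannot
    # collide with the names of method arguments.
    return {
        "dataset": dataset,
        "config_file": config_file,
        "params": dict(params or {}),
    }


blended = call_blended("dataset.asdf", config_file="config.cfg", foo=42)
separated = call_separated(
    "dataset.asdf", config_file="config.cfg", params={"foo": 42}
)
```

Both calls carry the same information, but in the blended form a step parameter that happened to be named config_file could never be passed, because the method would consume it first.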

Possible new interface

  • Step.__init__(self, params=None, working_dir=None, ...): Parameters are passed to initializer in a dict.
  • Step.call_impl(self, *args): Step subclass implementation.
  • Step.__call__(self, *args): Wrapper around call_impl that handles common setup and teardown.
  • stpipe.create_step(*, step_class=None, config_path=None, crds_params_enabled=True, dataset=None, params=None, working_dir=None, ...): Convenience function for creating steps. At least one of step_class or config_path is required to determine the step class. dataset is required if crds_params_enabled is True.
  • stpipe.cmdline.from_cmdline(args): Function that parses CLI arguments and ends in a call to stpipe.create_step.
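
One way the pieces above might fit together is sketched below. This is not an implementation of the proposal: SketchStep stands in for Step, the events list is a hypothetical placeholder for whatever the common setup and teardown would be, and create_step is reduced to the step_class case, with config_path handling and CRDS parameter lookup left out.

```python
class SketchStep:
    """Hypothetical base class following the proposed interface."""

    def __init__(self, params=None, working_dir=None):
        self.working_dir = working_dir
        self.events = []  # hypothetical: records setup/teardown for illustration
        # Expose each parameter as an attribute, e.g. self.foo.
        for name, value in (params or {}).items():
            setattr(self, name, value)

    def call_impl(self, *args):
        # Step subclasses provide the implementation.
        raise NotImplementedError

    def __call__(self, *args):
        # Wrapper that brackets call_impl with common setup and teardown.
        self.events.append("setup")
        try:
            return self.call_impl(*args)
        finally:
            self.events.append("teardown")


def create_step(*, step_class=None, params=None, working_dir=None):
    # Simplified: only step_class is supported in this sketch.
    if step_class is None:
        raise ValueError("step_class is required in this sketch")
    return step_class(params=params, working_dir=working_dir)


class MyStep(SketchStep):
    def call_impl(self, dataset):
        return f"{dataset}: foo={self.foo}"


step = create_step(step_class=MyStep, params={"foo": 42})
message = step("dataset.asdf")
```

Because create_step is a module-level function, it is free to return an instance of whichever class it selects, while MyStep(params=...) always returns a MyStep.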

Run step from CLI

import sys

from stpipe.cmdline import from_cmdline

# stpipe step run config.cfg dataset.asdf --foo=42
args = sys.argv[1:]  # assumed source of the CLI arguments
step, inputs = from_cmdline(args)
step(*inputs)

Run step from Python

from stpipe import create_step

step = create_step(config_path="config.cfg", dataset="dataset.asdf", params={"foo": 42})
step("dataset.asdf")

Developing a step

from stpipe import Step

class MyStep(Step):
    def call_impl(self, dataset):
        print(f"Value of foo: {self.foo}")

step = MyStep(params={"foo": 42})
step("dataset.asdf")