
Add RunInference.with_errors() API #14

Draft · wants to merge 21 commits into master
Conversation

hgarrereyn
with_errors() API

This PR introduces the RunInference(...).with_errors() API, which allows users to catch runtime errors as a separate PCollection stream.

By default, runtime errors (for example, invalid model specs or invalid examples) are raised, which can crash the pipeline. If this is not desirable, users can call .with_errors() to catch runtime errors:

inference_result = examples | RunInference(inference_spec).with_errors()
inference_result['errors'] | LogToFile(...)
inference_result['predictions'] | ...

The error output stream has type Tuple[Exception, Any]: each element pairs the original error with whatever object is relevant to that error.

Note: when runtime errors are allowed to be raised, they are raised from their original location (e.g. inside a nested PTransform) which makes debugging easier.
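As a sketch of how a consumer might work with the error stream: each element is an (Exception, Any) tuple, so a downstream step could format it for logging. The helper name below is hypothetical, not part of this PR:

```python
from typing import Any, Tuple

def format_error(err: Tuple[Exception, Any]) -> str:
    """Render an (exception, offending object) pair as a log line.

    Hypothetical formatter for elements of the 'errors' PCollection.
    """
    exc, obj = err
    return "%s: %r (input: %r)" % (type(exc).__name__, str(exc), obj)
```

A DoFn (or a simple beam.Map(format_error)) applied to inference_result['errors'] could then write these lines to a file or metrics sink.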

Internal details

  • RunInferenceImpl is now a class (rather than a function with @beam.ptransform_fn). This enables us to add a with_errors method that can take effect in the expand method.

  • RunInferenceCore returns a dict containing {'predictions': ..., 'errors': ...} and takes an additional catch_errors: bool = False parameter which indicates whether to catch or allow runtime errors.

  • Added the _ParDoExceptionWrapper utility which runs beam.ParDo on a provided beam.DoFn and optionally catches exceptions raised in the process() method.

  • Operation wrapper transforms (e.g. _Classify, _Regress, ...) accept an additional catch_errors: bool = False parameter and return a dict containing {'predictions': ..., 'errors': ...}.
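The core pattern behind _ParDoExceptionWrapper can be illustrated in plain Python (no Beam runtime): run the processing function per element, and either re-raise or divert failures to an errors collection depending on catch_errors. This is a simplified sketch with hypothetical names, not the PR's actual implementation:

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

def run_with_caught_errors(
    process: Callable[[Any], Iterable[Any]],
    elements: Iterable[Any],
    catch_errors: bool = False,
) -> Dict[str, List[Any]]:
    """Plain-Python sketch of the exception-catching wrapper pattern.

    In the real transform, beam.ParDo runs the DoFn and exceptions from
    process() are routed to a tagged 'errors' output; here we emulate
    the two output streams with two lists.
    """
    predictions: List[Any] = []
    errors: List[Tuple[Exception, Any]] = []
    for element in elements:
        try:
            predictions.extend(process(element))
        except Exception as e:  # pylint: disable=broad-except
            if not catch_errors:
                raise  # default behavior: let the error propagate
            errors.append((e, element))
    return {'predictions': predictions, 'errors': errors}
```

With catch_errors=False the first failure propagates from its original location, matching the default behavior described above; with catch_errors=True failing inputs end up in the 'errors' list instead.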

Dependencies

This PR depends on #13

@hgarrereyn
Author

@rose-rong-liu @SherylLuo

      return beam.pvalue.TaggedOutput(_get_operation_type(batch[0]), batch)
    else:
      try:
        return beam.pvalue.TaggedOutput(_get_operation_type(batch[0]), batch)


Will this throw an error? It seems _get_operation_type will return str or unicode.

Author

_get_operation_type has return value Text (str/unicode) and beam.pvalue.TaggedOutput accepts either str or unicode for a tag. See: https://github.com/apache/beam/blob/master/sdks/python/apache_beam/pvalue.py#L328-L343

hgarrereyn added 13 commits August 6, 2020 01:01
Benchmarks showed that TagByOperation was a performance bottleneck, as it
requires disk access per query batch. To mitigate this I implemented
operation caching inside the DoFn. For readability, I also renamed this
operation to "SplitByOperation", as that more accurately describes its
purpose.

On a dataset with 1M examples, TagByOperation took ~25% of the total wall
time. After implementing caching, this was reduced to ~2%.
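The caching idea in the commit above can be sketched in plain Python: resolve the operation type once per inference spec via a (disk-backed) lookup and memoize the result, so repeated batches for the same spec skip the expensive load. Names below are hypothetical, not the PR's actual classes:

```python
from typing import Callable, Dict

class OperationSplitter:
    """Sketch of per-spec operation caching inside a DoFn.

    The expensive lookup (loading a saved model's signature from disk)
    runs once per distinct spec; subsequent batches hit the cache.
    """

    def __init__(self, load_operation_type: Callable[[str], str]):
        self._load = load_operation_type  # expensive, disk-backed lookup
        self._cache: Dict[str, str] = {}

    def operation_for(self, spec: str) -> str:
        if spec not in self._cache:
            self._cache[spec] = self._load(spec)
        return self._cache[spec]
```

This mirrors why the wall-time share dropped from ~25% to ~2%: the disk access now happens once per spec rather than once per query batch.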