
Streaming Model API #13

Draft

hgarrereyn wants to merge 19 commits into master
Conversation

hgarrereyn

Streaming Model API

This PR introduces a streaming model API for RunInference.

Overview

In addition to the current API:

predictions = examples | RunInference(inference_spec_type)

It is now possible to pass a PCollection of InferenceSpecType as a side input, or to run inference on (spec, example) queries directly:

# Model specs provided as a side input:
models: PCollection[InferenceSpecType]
predictions = examples | RunInference(models)

# Model spec paired with each query:
queries: PCollection[Tuple[InferenceSpecType, Example]]
predictions = queries | RunInference()
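
For reference, a minimal end-to-end sketch of the side-input mode is shown below. The module paths, the model_spec_pb2 message fields, and the placeholder inputs are assumptions based on the existing tfx_bsl RunInference API, not part of this PR:

import apache_beam as beam
import tensorflow as tf
from tfx_bsl.public.beam.run_inference import RunInference
from tfx_bsl.public.proto import model_spec_pb2

# Hypothetical inputs for illustration.
example_protos = [tf.train.Example()]
spec = model_spec_pb2.InferenceSpecType(
    saved_model_spec=model_spec_pb2.SavedModelSpec(
        model_path='/path/to/saved_model'))

with beam.Pipeline() as p:
  # Model specs as a (streaming) side input; here a single static spec.
  models = p | 'ModelSpecs' >> beam.Create([spec])
  examples = p | 'Examples' >> beam.Create(example_protos)
  # Side-input mode introduced by this PR.
  predictions = examples | RunInference(models)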

Notes

This PR depends on #11 and #12.

This PR introduces beam.GroupIntoBatches and _TemporalJoin, which both require stateful DoFn support. Stateful DoFns are not supported in Dataflow v1, and RunInference is currently broken in Dataflow v2 due to BEAM-2717.
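
For context, beam.GroupIntoBatches is the standard Beam transform used for batching keyed elements, and it is what pulls in the stateful DoFn requirement. A standalone example (with made-up keys and batch size, unrelated to this PR's internals):

import apache_beam as beam

with beam.Pipeline() as p:
  batches = (
      p
      | beam.Create([('spec_a', 1), ('spec_a', 2), ('spec_b', 3)])
      # Buffers values per key in DoFn state until batch_size elements
      # arrive (or the window ends), then emits the batch.
      | beam.GroupIntoBatches(batch_size=2)
      | beam.Map(print))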

@hgarrereyn
Author

@rose-rong-liu @SherylLuo

hgarrereyn added 10 commits August 6, 2020 21:50
Benchmarks showed that TagByOperation was a performance bottleneck, as it
requires disk access per query batch. To mitigate this, I implemented
operation caching inside the DoFn. For readability, I also renamed this
operation to "SplitByOperation", as that more accurately describes its
purpose.

On a dataset with 1M examples, TagByOperation took ~25% of the total wall
time. After implementing caching, this was reduced to ~2%.
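
The caching pattern described above can be illustrated with a rough sketch; the class name, the element shape, and the _load_operation helper are hypothetical stand-ins rather than the PR's actual code:

import apache_beam as beam

class _SplitByOperationDoFn(beam.DoFn):
  """Illustrative sketch of caching the loaded operation per DoFn instance."""

  def __init__(self):
    # Shared across all batches processed by this DoFn instance, so the
    # disk read happens once per spec rather than once per query batch.
    self._operation_cache = {}

  def process(self, element):
    spec, batch = element  # hypothetical element shape
    key = spec.SerializeToString()
    operation = self._operation_cache.get(key)
    if operation is None:
      operation = _load_operation(spec)  # hypothetical disk-backed lookup
      self._operation_cache[key] = operation
    yield operation, batch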