This repository has been archived by the owner on Mar 1, 2024. It is now read-only.

SingleEntityPipeline (v0.0.18-beta5) #69

Merged
merged 8 commits into from
Sep 19, 2023


@wintonzheng wintonzheng commented Sep 17, 2023

To make single-entity inference pipelines easier to build, this PR introduces the following components:

  • SingleEntityBusinessLogicComponent

  • SingleEntityBusinessLogicPipeline

  • SingleEntityModelComponent

  • SingleEntityPipelineComponent

  • Does this PR affect the local development experience? If so, make sure you have a plan and add documentation to address the issues that come with the change

  • bump version

  • make a release

  • publish to pypi service

@wintonzheng wintonzheng changed the base branch from main to v0.0.18-beta September 17, 2023 05:11
@wintonzheng wintonzheng force-pushed the shu/singular_pipeline branch 2 times, most recently from e87252c to a371f22 Compare September 17, 2023 07:08
@@ -204,3 +217,8 @@ async def inference(
data=output_data,
model_name=self.name,
)


class SingularModelComponent(BaseModelComponent[SINGULAR_MODEL_INPUT, MODEL_OUTPUT]):
Contributor Author

naming: InferenceModelComponent?

@wintonzheng
Contributor Author

First round of feedback:

  • naming -> SingleEntityXXX
  • no need to define SingularPipelineRequest, as it makes the API request schema cumbersome
  • get rid of SingularModelInput
  • the single-entity business logic request takes (1) request and (2) model_output; no more entity or scored entity

return input.request.request_id


ModelComponent = MultiEntityModelComponent
Contributor Author

Kept as an alias to preserve backward compatibility for now.
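
A minimal sketch of the aliasing pattern that line relies on. The classes here are stand-ins, not the real wyvern ones, and `LegacyRankingModel` is a hypothetical piece of user code written against the old name:

```python
class MultiEntityModelComponent:
    """Stand-in for the renamed component (not the real wyvern class)."""

    def describe(self) -> str:
        return type(self).__name__


# The old name is bound to the new class, so existing imports and
# subclasses written against `ModelComponent` keep working unchanged.
ModelComponent = MultiEntityModelComponent


class LegacyRankingModel(ModelComponent):
    """Hypothetical user code that still subclasses the old name."""


legacy = LegacyRankingModel()
```

Because the alias is the same class object, `isinstance` checks against either name give the same result.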

@@ -35,8 +37,8 @@ class BusinessLogicEventData(EntityEventData):

 business_logic_pipeline_order: int
 business_logic_name: str
-old_score: float
-new_score: float
+old_score: str
Contributor Author

Model scores/outputs should be logged as strings.
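
One way this could look, as a sketch only: `score_to_str` is a hypothetical helper and `BusinessLogicEvent` is a simplified stand-in for the event data in this diff. The idea is that a string field can carry a float, a list, or a dict without changing the event schema:

```python
import json
from dataclasses import dataclass
from typing import Any


def score_to_str(output: Any) -> str:
    """Serialize a model output for logging (hypothetical helper)."""
    if isinstance(output, str):
        return output
    # sort_keys keeps dict outputs stable across runs.
    return json.dumps(output, sort_keys=True)


@dataclass
class BusinessLogicEvent:
    """Simplified stand-in for the event data touched by this diff."""

    business_logic_name: str
    old_score: str
    new_score: str


event = BusinessLogicEvent(
    business_logic_name="cap_limit",
    old_score=score_to_str(0.75),
    new_score=score_to_str({"limit": 1000, "approved": True}),
)
```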

@wintonzheng wintonzheng changed the title shu/singular pipeline SingleEntityPipeline Sep 19, 2023
)


fraud_pipeline = FraudPipeline(model=fraud_model, business_logic=fraud_biz_pipeline)
Contributor Author

This is the "new experience": the model and business_logic are passed to the pipeline to define it.

Does it look simpler than the current RankingPipeline.get_model pattern?

def get_model(self) -> ModelComponent:
    return SomeModelComponent()

Contributor Author

@ykeremy I think the model=xxx pattern definitely makes it easier to try out different model versions for a pipeline in the future. I prefer the model=xxx pattern. wdyt?
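
To make the comparison concrete, here is a rough sketch of the two styles with plain callables standing in for model and business logic components. None of these classes are the real wyvern ones; `GetModelPipeline`, `InjectedPipeline`, and the `model_v1`/`model_v2` functions are hypothetical:

```python
from typing import Callable, Optional

Model = Callable[[dict], float]  # stand-in for a model component
BusinessLogic = Callable[[float], float]  # stand-in for a business logic pipeline


class GetModelPipeline:
    """Override style: each pipeline subclass hard-codes its model."""

    def get_model(self) -> Model:
        raise NotImplementedError

    def run(self, request: dict) -> float:
        return self.get_model()(request)


class InjectedPipeline:
    """Constructor style: model and business logic are passed in."""

    def __init__(self, model: Model, business_logic: Optional[BusinessLogic] = None):
        self.model = model
        self.business_logic = business_logic

    def run(self, request: dict) -> float:
        output = self.model(request)
        if self.business_logic is not None:
            output = self.business_logic(output)
        return output


def model_v1(request: dict) -> float:
    return 0.5


def model_v2(request: dict) -> float:
    return 0.75


def cap_score(score: float) -> float:
    return min(score, 0.7)


# Swapping model versions needs no new subclass, only a different argument.
pipeline_v1 = InjectedPipeline(model=model_v1)
pipeline_v2 = InjectedPipeline(model=model_v2, business_logic=cap_score)
```

With the override style, trying `model_v2` would mean writing another subclass just to change what `get_model` returns.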

@wintonzheng wintonzheng force-pushed the shu/singular_pipeline branch 2 times, most recently from 1f6d5f0 to a45781a Compare September 19, 2023 03:24
@wintonzheng
Contributor Author

I also tested against Joao's credit limit PR: https://github.com/inventa-shop/wyvern/pull/219

@wintonzheng wintonzheng changed the title SingleEntityPipeline SingleEntityPipeline (v0.0.18-beta5) Sep 19, 2023
"""

request: REQUEST_ENTITY
adjusted_model_output: MODEL_OUTPUT_DATA_TYPE
Contributor

I don't know if we should call this adjusted model output. If it's boosting/deboosting etc., then yes, it's an adjusted model output, but what if they don't have any models? adjusted_score feels more accurate to me, since we also use old_score and new_score within the event data. WDYT?

Contributor Author

My problem with "score" is that it implies a number (int, float), but this "model output" could actually be a list of floats or a dict[str, AnySerializable].

What about adjusted_output?
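
A small sketch of why a neutral name fits a generic field. `BusinessLogicResponse` and `OUTPUT` here are hypothetical stand-ins, not the class or type variable in this PR; the point is only that the same field can hold a float, a list, or a dict:

```python
from dataclasses import dataclass
from typing import Any, Dict, Generic, List, TypeVar

OUTPUT = TypeVar("OUTPUT")  # hypothetical stand-in for the model output type


@dataclass
class BusinessLogicResponse(Generic[OUTPUT]):
    """Hypothetical response whose payload is not necessarily a number."""

    request_id: str
    adjusted_output: OUTPUT


scalar = BusinessLogicResponse[float](request_id="r1", adjusted_output=0.42)
vector = BusinessLogicResponse[List[float]](request_id="r2", adjusted_output=[0.1, 0.9])
mapping = BusinessLogicResponse[Dict[str, Any]](
    request_id="r3", adjusted_output={"limit": 1000}
)
```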

candidate: The candidate that the business logic layer is being asked to perform business logic on
"""

identifier: Identifier
Contributor

We only need this for logging purposes. Let's try to think of a logging component / request design so that we know the main entity for the request and they won't need to pass in the identifier.

Not urgent though, just to keep in mind.

Contributor Author

Yeah, it's pretty challenging to get rid of this identifier right now.

The problem is how we know what THE entity is for this pipeline, model, and business logic.

The logging has to either happen within the business logic pipeline, or the business logic pipeline has to output all the information needed so it can be logged outside of the biz pipeline.

The way we do it in this PR is by passing the identifier / entity through the SingleEntityBusinessLogicRequest.
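
Roughly, the first option looks like the sketch below. Everything here is a simplified stand-in: `Identifier`, `BusinessLogicRequest`, and `run_business_logic` are illustrative names and shapes, not the actual wyvern definitions, and the capping rule is made up for the example:

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Identifier:
    """Stand-in identifier: which entity the request is about."""

    identifier_type: str
    identifier: str


@dataclass
class BusinessLogicRequest:
    """Stand-in request carrying the identifier alongside the model output."""

    identifier: Identifier
    request: Dict[str, Any]
    model_output: float


def run_business_logic(
    req: BusinessLogicRequest, events: List[Dict[str, str]]
) -> float:
    """Apply a made-up cap and log the change inside the business logic step."""
    new_output = min(req.model_output, 0.9)
    events.append(
        {
            # The identifier is only needed here, to say which entity was scored.
            "entity": f"{req.identifier.identifier_type}:{req.identifier.identifier}",
            "old_score": str(req.model_output),
            "new_score": str(new_output),
        }
    )
    return new_output


events: List[Dict[str, str]] = []
result = run_business_logic(
    BusinessLogicRequest(Identifier("user", "42"), {"request_id": "r1"}, 0.95),
    events,
)
```

If the logging component could discover the main entity from the request itself, the `identifier` field could be dropped from the request, which is the follow-up idea raised above.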

wyvern/exceptions.py (resolved)
wyvern/components/single_entity_pipeline.py (resolved)
@wintonzheng wintonzheng merged commit 5d885da into v0.0.18-beta Sep 19, 2023
2 checks passed
@wintonzheng wintonzheng deleted the shu/singular_pipeline branch September 19, 2023 20:03
wintonzheng added a commit that referenced this pull request Sep 26, 2023
* chained model evaluation (#58)

* multi model evaluation

* fix test

* chained model evaluation

* cache model output

* v bump

* support dict model output

* rename

* add get_model_output support

* fix

* 0.0.18-beta1

* shu/more 0.0.18 fix (#62)

* ChainedModelInput should be in wyvern.__init__

* doc for target in ModelEventData

* docstring for get_model_output

* beta2

* by default we should not cache model output

* cache output for modelbit component (#64)

* 0.0.18-beta3

* shu/do not enforce features in model component (#67)

* return empty set for model component manifest_feature_names

* version beta4

* target -> model_key (#70)

* SingleEntityPipeline (v0.0.18-beta5) (#69)

* introduce singular model component and singular pipeline

* SingularBusinessLogicPipeline

* integrate with business logic in SingularPipelineComponent

* update to SingleEntity style

* update to SingleEntity style

* single entity pipeline

* update generic type order for business logic components

* SingleEntityModelbitComponent

* address feedbacks for single entity pipeline (#72)

* address feedbacks for single entity pipeline

* feature store feature available in realtime feature

* v0.0.18-beta6

* Fix how we filter real time features before feature retrieval (#66)

* Fix how we filter real time features before feature retrieval

* Bump version to 0.0.18-beta7

* v0.0.18b7 - support any dictionary type

* v0.0.18

---------

Co-authored-by: Kerem Yilmaz <[email protected]>