Skip to content
This repository has been archived by the owner on Mar 1, 2024. It is now read-only.

Latest commit

 

History

History
73 lines (49 loc) · 3.76 KB

data_hydration.mdx

File metadata and controls

73 lines (49 loc) · 3.76 KB
title description icon
Data Hydration
droplet

Data hydration refers to the process of filling an object with data. When Wyvern gets a request, it automatically detects the entities and primary keys within the request, grabs data for all the entities that Wyvern has indexed and hydrates the entities.

Wyvern Entity

A Wyvern entity (represented by the WyvernEntity in code) is the basic unit of data structure that represents a critical object in your pipline. It's like the database table concept. For example in the case of product ranking, "product", "user", "query" or even the "request" are all important entities.

Wyvern Entity is built with pydantic model so it's really easy to work with. Let's look at the ranking product entity:

from wyvern import ProductEntity


class Product(ProductEntity):
    brand: str | None = None
    category: str | None = None
    categories: list[str] = []
    title: str | None = None
    description: str | None = None
    price: float | None = None
    number_in_stock: int | None = None
    search_score: float | None = None

Wyvern provides a predefined ProductEntity, which is a child class of WyvernEntity and has product_id as it's main identifier.

Wyvern Index

Wyvern Index (current only integrates with Redis) is where you store Near Realtime or Streaming Features of entity data. The concept is quite similar to the elasticsearch index.

Wyvern Application has the following indexation APIs:

  • Upload Entities (POST /api/v1/entities/upload): API to create or update entities. No partial update of an entity.In another word, if an entity is already indexed (already exists) in Wyvern Index, it will be completely replaced by this new entity data with the same entity id.
  • Get Entities (POST /api/v1/entities/get): API to read the entities from Wyvern Index.
  • Delete Entities (POST /api/entity/v1/entities/delete): API to delete entities in Wyvern Index.

Data Hydration

At the beginning of a wyvern inference request, Wyvern grabs all the entities contained in the request schema, fetches all entity data from Wyvern Index if the entity exists and "hydrate" all the data into the entity object. Therefore, even if your API request is only sending the primary key of the entity, the pipeline code will still have full access to all the indexed data attributes of the entity. This is the what the "Entity Hydration" phase.

Let's look at an example with the ranking request:

from wyvern import ProductEntity


class Product(ProductEntity):
    brand: str | None = None
    category: str | None = None
    categories: list[str] = []
    title: str | None = None
    description: str | None = None
    price: float | None = None
    number_in_stock: int | None = None
    search_score: float | None = None


class MyRankingRequest(RankingRequest[Product]):
    candidates: list[Product]

Let's assume Wyvern receives a request object with a list of product candidates that only has product_id, Wyvern will fetch all the product entities that are found in Wyvern Index and hydrate data into the Product entity. As a result, request.candidates[0].brand will be available within your pipeline code.

Batch Feature Retrieval

Wyvern automates the retrieval of all the features required by your pipeline from feature store and store the features in memory per request. Inside every wyvern component, you can do self.get_feature(identifier, "feature_name") to get the feature value.

Next, let's see how batch features work in Wyvern.