This repository has been archived by the owner on Mar 1, 2024. It is now read-only.

Commit

what and why
wintonzheng committed Sep 13, 2023
1 parent 224541f commit 395889f
Showing 2 changed files with 101 additions and 1 deletion.
102 changes: 101 additions & 1 deletion README.md
@@ -14,7 +14,107 @@
<a href="https://github.com/Wyvern-AI/wyvern/blob/main/LICENSE"><img src="https://badgen.net/badge/License/Elv2/green?icon=github"/></a>
</div>

## Check Out [Wyvern's Official Doc](https://docs.wyvern.ai)
## What is Wyvern?

Wyvern is a real-time machine learning platform for marketplaces:

- **Search and Discovery**: Wyvern specializes in bringing use cases like **recommendations and rankings** in-house.
- **Empower the Data Team**: Wyvern is tailored for your data team to independently build and deploy production-grade machine learning pipelines for the e-commerce and marketplace industry, reducing the engineering involvement in the entire process.
- **Orchestration for ML Pipelines**: Wyvern is agnostic to the solutions you pick for your feature store, model serving, or data warehouse. It automates retrieving data from the feature store, passing that data to the model service, and logging all the events. It abstracts this engineering work away from users so that they can focus on defining the request/response schemas of the API, the model, the features the model depends on, the business logic after the model, and, more importantly, training the models with the feedback data generated by the ML pipeline.

### Wyvern Architecture

![Wyvern Architecture](/docs/wyvern_architecture.png)

Overall, Wyvern gives you a framework to quickly define your real-time ML pipeline. The architecture has several important components:

1. **Retrieval**: Wyvern can connect to and retrieve objects from your search index.
2. **Feature Module**: Wyvern has built-in support for [feast](https://feast.dev/), an open source feature store. It also supports connecting to the feature store that you would like to use. Moreover, Wyvern provides interfaces for you to define your [batch features](/batch_feature) and [real-time features](/realtime_feature) easily, with support for feature grouping, feature sharing, features for composite entities, and request-based features.
3. **Model Module**: Wyvern provides [the model interface](/model_service#define-the-whole-model) that allows you to define your own model in place or call your model service. It provides an interface to define features that your model depends on easily.
4. **Business Logic**: Wyvern makes it easy to define the business logic that runs after model inference. For example, you might want to promote a specific brand of t-shirt and move it to the top of the ranking results for the "tshirt" query.
5. **Event Logging**: Wyvern automatically logs all the events in your ML application, including real-time feature logging, model logging, business logic logging, impression logging, and your own custom events, and pipes the data to your data warehouse. Refer to [Logging Events](/logging_events).
6. **Training Dataset**: Wyvern provides a feature store serving solution (currently integrated only with feast) that serves both the historical batch features and the real-time features logged in your data warehouse.
7. **Observability**: Wyvern connects your engineering logs to your favorite telemetry tool.

As Wyvern is open sourced, we will bring in more integrations with different feature stores, model serving solutions, search indexes for retrieval, observability tools, and data warehouses.
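
To make the flow through these components concrete, here is a minimal sketch in plain Python. It does not use Wyvern's actual API: `Product`, `Pipeline`, and every function name below are hypothetical, and the "model" is just a lambda over one made-up feature. It only illustrates the order of the steps: retrieval, feature lookup, model scoring, business logic, and event logging.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Product:
    product_id: str
    brand: str
    score: float = 0.0


@dataclass
class Pipeline:
    # All three callables are hypothetical stand-ins for real integrations
    # (search index, feature store, model service).
    retrieve: Callable[[str], List[Product]]
    featurize: Callable[[str, Product], Dict[str, float]]
    predict: Callable[[Dict[str, float]], float]
    events: List[dict] = field(default_factory=list)

    def rank(self, query: str, promoted_brand: str = "") -> List[Product]:
        # 1. Retrieval: fetch candidates for the query.
        candidates = self.retrieve(query)
        for product in candidates:
            # 2. Feature module: look up features for this candidate.
            features = self.featurize(query, product)
            # 3. Model module: score the candidate.
            product.score = self.predict(features)
            # 5. Event logging: record features and model output.
            self.events.append(
                {"type": "model", "product_id": product.product_id,
                 "features": features, "score": product.score}
            )
        ranked = sorted(candidates, key=lambda p: p.score, reverse=True)
        # 4. Business logic: a stable sort moves the promoted brand to the
        # top while keeping the model's order within each group.
        ranked.sort(key=lambda p: p.brand != promoted_brand)
        self.events.append(
            {"type": "impression", "query": query,
             "product_ids": [p.product_id for p in ranked]}
        )
        return ranked


# Toy data, invented for this example.
catalog = [Product("p1", "acme"), Product("p2", "zenith"), Product("p3", "acme")]
popularity = {"p1": 0.2, "p2": 0.9, "p3": 0.5}

pipeline = Pipeline(
    retrieve=lambda query: [Product(p.product_id, p.brand) for p in catalog],
    featurize=lambda query, product: {"popularity": popularity[product.product_id]},
    predict=lambda features: features["popularity"],
)
ranked = pipeline.rank("tshirt", promoted_brand="acme")
print([p.product_id for p in ranked])  # ['p3', 'p1', 'p2']
```

In a real deployment the list in `events` would be replaced by writes to a data warehouse, which is what makes the logged features available later as a training dataset.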

## Why Wyvern?

### Revenue Growth and Cost Saving

Every marketplace encounters common challenges that can greatly benefit from the application of Machine Learning (ML) solutions. Solving these challenges brings huge upsides in revenue growth and cost saving. They encompass the following areas:

#### Search & Discovery

In marketplaces, especially those that sell products, the core product is built to help the buyer find the right products that they're looking for, whether it's through a recommendation email, the recommended products or carousels on the home page or through search results. The creator of Wyvern, [Suchintan](https://www.linkedin.com/in/suchintansingh/), built the ML Platform at [Faire](https://www.faire.com/) and [Gopuff](https://www.gopuff.com/) to improve their Search and Discovery experience. At both places, the platform became an engine that empowered the data team to independently deliver new models to production, generating over \$100M of impact.

#### Fraud Detection

There are many types of marketplace fraud:

- fake profile fraud
- fake product fraud
- false advertising fraud
- fake buyer fraud
- fake seller fraud
- payment fraud
- account takeover fraud
- and many more

The total cost of ecommerce fraud was expected to be \$48 billion globally in 2023, according to [ekata](https://ekata.com/blog/ecommerce-fraud-trends-and-statistics-merchants-need-to-know-in-2023).

Almost every growing marketplace will face fraudulent events. Being able to detect and prevent fraudulent actions can save a marketplace from losing millions of dollars every year. This is exactly what the other creator of Wyvern, [Shu](https://www.linkedin.com/in/shuchang-zheng-76784958/), experienced when he worked at [Lyft](https://www.lyft.com/) and [Patreon](https://www.patreon.com/).

#### Shipping Cost Estimation

Shipping is key to the success of a marketplace and of any brand selling on it. [48% of cart abandonments are due to extra costs like shipping and taxes](https://www.printful.com/blog/ecommerce-shipping-pricing).

Marketplaces have a strong incentive to optimize their shipping cost strategy as a way to improve revenue. Each marketplace has unique shipping cost insights that would almost certainly benefit from custom-trained machine learning models.

#### Credit Scoring

Many marketplaces provide net 30/60/90 terms and grant credit limits to their users. For example, [Faire has net 60 terms](https://www.faire.com/net-terms), meaning that purchases made with net 60 terms won't be charged until 60 days after the order.

To offer these terms, a marketplace usually has to run background checks and collect information before deciding whether to grant the terms to a user, just as banks do when approving or denying a credit card application. It also has to decide how large a credit limit the user should get based on that information. A higher credit limit might empower users to purchase more, but it also exposes the business to greater risk. This process usually starts with manual review, which stops scaling once the marketplace grows quickly.

Machine learning models come into play to derisk and automate the process, which saves a lot of operational cost.

### Sharing The Cross Learnings

Many machine learning models for these use cases rely on the same data or the same learnings. Some examples:

1. Product ranking uses session tracking to personalize results. Session tracking is also one of the factors in detecting transaction fraud.
2. Credit scoring is calculated based on credit score, credit card credit limits, and a price propensity model. These signals are also very useful in product ranking, specifically if you’re deciding whether to show cheap or expensive handbags.

With Wyvern, you can leverage shared data and share learnings across multiple machine learning use cases. These shared learnings in turn generate feedback for the models that depend on them, which keeps improving model performance.
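
As a rough illustration of what sharing looks like in practice, the sketch below defines session features once and lets two models declare which ones they depend on. This is not Wyvern's feature API: the registry, the feature names, and the session format are all assumptions made up for this example.

```python
from typing import Callable, Dict, List

FeatureFn = Callable[[dict], float]

# Hypothetical shared feature definitions, computed from one session record.
FEATURES: Dict[str, FeatureFn] = {
    "session_event_count": lambda session: float(len(session["events"])),
    "session_cart_value": lambda session: float(
        sum(e.get("price", 0.0) for e in session["events"]
            if e["type"] == "add_to_cart")
    ),
}

# Each model lists the shared features it depends on (names are invented).
MODEL_FEATURES: Dict[str, List[str]] = {
    "product_ranking": ["session_event_count", "session_cart_value"],
    "transaction_fraud": ["session_event_count"],
}


def features_for(model_name: str, session: dict) -> Dict[str, float]:
    """Compute only the features that the given model declares."""
    return {name: FEATURES[name](session) for name in MODEL_FEATURES[model_name]}


session = {"events": [{"type": "view"}, {"type": "add_to_cart", "price": 20.0}]}
print(features_for("product_ranking", session))
print(features_for("transaction_fraud", session))
```

Because both models read `session_event_count` from the same definition, a fix or improvement to that feature reaches ranking and fraud detection at once.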

### ML Platform Is Hard To Build

From 2019 to 2022, the creators of Wyvern built two ML platforms from scratch, one at Faire and one at Gopuff. We've also observed wide usage of such platforms at other companies (Lyft, Patreon). We intimately understand the pain, substantial costs, and extensive effort required to build a top-tier ML platform:

1. Feature store to store features (e.g. [feast](https://feast.dev/))
2. Model service to host ML models (e.g. [tensorflow serving](https://www.tensorflow.org/tfx/guide/serving), [bentoml](https://github.com/bentoml/BentoML))
3. Feature + Model observability
4. Orchestration of ML pipelines for specific use cases

While being mindful of the following constraints:

1. Pipeline evaluation in < 200ms (Reference: https://iarapakis.github.io/papers/TOIS17.pdf)
2. Minimizing train / test skew
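
To give a feel for the first constraint, here is a minimal sketch of checking one pipeline call against a 200 ms budget using only the Python standard library. The helper names (`timed`, `within_budget`) and the stand-in workload are hypothetical; a production system would more likely track latency percentiles through its telemetry tool.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")

# Budget taken from the constraint above; everything else is illustrative.
LATENCY_BUDGET_MS = 200.0


def timed(fn: Callable[[], T]) -> Tuple[T, float]:
    """Run fn and return its result plus the elapsed wall time in milliseconds."""
    start = time.perf_counter()
    result = fn()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


def within_budget(elapsed_ms: float, budget_ms: float = LATENCY_BUDGET_MS) -> bool:
    """Return True when a measured latency is strictly under the budget."""
    return elapsed_ms < budget_ms


# Stand-in for a full pipeline evaluation (retrieval, features, model, logic).
result, elapsed_ms = timed(lambda: sum(range(1000)))
print(f"pipeline took {elapsed_ms:.3f} ms, within budget: {within_budget(elapsed_ms)}")
```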

Back then, we explored multiple vendor solutions on the market for product ranking but couldn't find one that would empower the team to rapidly iterate on ML models tailored to our own business. We had so many unique business insights that having control of the models was a no-brainer. Therefore, building an ML platform in-house became the only solution.

Early in 2023, we talked to 40+ marketplaces, including top marketplaces in the world like Amazon, Etsy, and eBay, and learned that they all share the challenges mentioned above.

<Frame>
<img
class="w-full h-full"
src="https://media.giphy.com/media/zyqgoGEalDVu0/giphy.gif"
/>
</Frame>

When you start building an ML platform in-house, it's like exploring a maze. We want to build Wyvern to help marketplaces get through it.

## Install Wyvern

Binary file added docs/wyvern_architecture.png
