Object Associations
Author: Jeff Mataya
Date: May 19, 2016
The PIM and Merchandising systems of Phoenix are built on top of a context-aware, versioned model called ObjectForm and ObjectShadow. This gives us a lot of power to craft very specific views on objects such as products, SKUs, and categories.
While this system works well for individual entities, it breaks down when we need to link entities across objects, such as when a SKU needs to be associated with a product. This document describes how multiple objects are associated with each other and how those associations are maintained across versions and contexts.
At the highest level, the following problems exist in the current data model:
- Updates to linked objects are computationally expensive.
- Storing updates is expensive because we manage links with shadows.
- The table that links objects is generic, so we sacrifice some referential integrity. We can validate in application code, but it would be nice to leverage the database for this.
On the other hand, because we strictly maintain versioned associations, rolling an object graph back to a previous version is very efficient.
However, we want to optimize performance for the update scenario, as it is the operation that will most frequently occur (by multiple orders of magnitude), while still retaining our ability to correctly version the whole graph.
The rest of the document will dive into the problem and present the solution.
If you haven't read them, go check out the Product Design Doc and Product Versioning Design Doc as a refresher about the model. One of the pieces not covered is how objects are associated with each other. Currently, it's through the ObjectLink model, which roughly has the following signature:
case class ObjectLink(leftId: Int, rightId: Int, linkType: String)
This model associates the shadows of two specific objects. For example, to make an association between a Product and Sku, a link with leftId pointing to the product's shadowId and rightId pointing to the SKU's shadowId would be created. Since shadows are specific to a context and unique per version, we have a link that's context-aware and versioned. Woo!
Alas, we have trouble in paradise: while this architecture handles different versions and contexts excellently, the update scenario is much rockier. In short, the problem is that every time an object gets updated a new shadow is created, and then the ObjectLinks in the entire object graph must be refreshed.
Consider the following example of a product, its SKUs, and their images.
                  ┌───────────────────────────┐
                  │ Product (FoxComm T-Shirt) │
                  └─────────────┬─────────────┘
             ┌──────────────────┴────────┬─────────────────────────┐
┌────────────┴────────────┐ ┌────────────┴─────────────┐ ┌─────────┴────────┐
│           SKU           │ │           SKU            │ │      Album       │
│ (Black FoxComm T-Shirt) │ │ (Orange FoxComm T-Shirt) │ │ (Desktop Images) │
└────────────┬────────────┘ └────────────┬─────────────┘ └──────────────────┘
             │                           │
    ┌────────┴───────┐          ┌────────┴────────┐
    │     Album      │          │      Album      │
    │ (Black Images) │          │ (Orange Images) │
    └────────────────┘          └─────────────────┘
In this case, we'll have the following ObjectLinks:
// Links between the Product and each SKU.
ObjectLink(leftId = product.shadow.id, rightId = blackSku.shadow.id, linkType = ProductSku)
ObjectLink(leftId = product.shadow.id, rightId = orangeSku.shadow.id, linkType = ProductSku)
// Link between the Product and its Album.
ObjectLink(leftId = product.shadow.id, rightId = album.shadow.id, linkType = ProductAlbum)
// Link between each SKU and its Album.
ObjectLink(leftId = blackSku.shadow.id, rightId = blackAlbum.shadow.id, linkType = SkuAlbum)
ObjectLink(leftId = orangeSku.shadow.id, rightId = orangeAlbum.shadow.id, linkType = SkuAlbum)
Now, let's say that the administrator decides to update the title of the orange SKU to be "Burnt Orange SKU". The following operations will need to happen:
- orangeSku will be updated and will have a new shadow ID.
- In order for the link to orangeAlbum to be valid, a new link needs to be generated between the new orangeSku shadow and the old orangeAlbum shadow. [1]
  ObjectLink(leftId = newOrangeSkuShadowId, rightId = orangeAlbum.shadow.id, linkType = SkuAlbum)
- Update the link to the Product. This will require creating a new shadow for Product and a link between each object's new shadow.
  ObjectLink(leftId = newProductShadowId, rightId = newOrangeSkuShadowId, linkType = ProductSku)
- Update the link between Product's new shadow and blackSku.
  ObjectLink(leftId = newProductShadowId, rightId = blackSku.shadow.id, linkType = ProductSku)
As you can see in this simple, very small example, a lot of tree traversal is needed for any update operation on an object that's nested deep within a tree. In a real-world scenario, the object graph is going to be a lot larger, as more object associations, such as categories, tags, and variants/variators, will be part of the graph.
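To make the cost concrete, the cascade above can be sketched as a small recursive function. This is only an illustration under stated assumptions: the `cascade` helper, the integer shadow IDs, and the string link types are hypothetical and not part of Phoenix, and the object graph is assumed to be a tree. Note that the sketch also re-creates the Product-to-Album link, which the walkthrough does not list but which would presumably go stale in the same way once the Product gets a new shadow.

```scala
// Hypothetical sketch of the shadow-based update cascade. Only the ObjectLink
// signature comes from the design doc; everything else is an assumption.
object ShadowLinkCascade {
  case class ObjectLink(leftId: Int, rightId: Int, linkType: String)

  /** Returns the links that must be created when shadow `oldId` is replaced by
    * `newId`, plus the next unused shadow ID. Each parent of the changed
    * shadow gets a fresh shadow too, so the work recurses up to the root.
    */
  def cascade(links: List[ObjectLink], oldId: Int, newId: Int, nextId: Int,
              replacedChild: Option[(Int, Int)] = None): (List[ObjectLink], Int) = {
    // Re-create every link from this object to its children from the new shadow.
    val down = links.filter(_.leftId == oldId).map { link =>
      replacedChild match {
        case Some((oldChild, newChild)) if link.rightId == oldChild =>
          link.copy(leftId = newId, rightId = newChild)
        case _ =>
          link.copy(leftId = newId)
      }
    }
    // Every parent needs a new shadow of its own, and the process repeats there.
    val parentIds = links.filter(_.rightId == oldId).map(_.leftId).distinct
    parentIds.foldLeft((down, nextId)) { case ((acc, next), parentId) =>
      val (up, after) = cascade(links, parentId, next, next + 1, Some((oldId, newId)))
      (acc ++ up, after)
    }
  }

  // Shadow IDs (assumed): product = 1, blackSku = 2, orangeSku = 3,
  // album = 4, blackAlbum = 5, orangeAlbum = 6.
  val initialLinks: List[ObjectLink] = List(
    ObjectLink(1, 2, "ProductSku"),
    ObjectLink(1, 3, "ProductSku"),
    ObjectLink(1, 4, "ProductAlbum"),
    ObjectLink(2, 5, "SkuAlbum"),
    ObjectLink(3, 6, "SkuAlbum")
  )

  // Renaming the orange SKU: shadow 3 is replaced by shadow 7.
  val example: (List[ObjectLink], Int) =
    cascade(initialLinks, oldId = 3, newId = 7, nextId = 8)
}
```

In this sketch, a single title change on one SKU allocates two new shadows (one for the SKU, one for the Product) and four new links, and the count grows with both the depth of the changed object and the fan-out of each ancestor.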
What we need instead is an algorithm that is more efficient on update operations even if the potential tradeoff is a less efficient rollback scenario.
The solution to our problem is to leverage ObjectHeads [2] as the primary point of connection between objects, and to override the link only when we want to associate specific commits in the past.
Consider this alternate model of an ObjectLink:
case class ObjectLink(leftHeadId: Int, leftCommitId: Option[Int] = None,
rightHeadId: Int, rightCommitId: Option[Int] = None)
The core concepts of this new model are:
- leftHeadId and rightHeadId point to the heads of the objects.
- leftCommitId and rightCommitId will have a value of None by default, meaning that objects will be linked with whatever commit is referenced by the head.
- In the less likely case that objects need to persist against specific commits, leftCommitId and rightCommitId can reference those, bypassing the commits associated with the heads.
- Each type of association will have its own table, so that we can have more referential integrity in the database.
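The resolution rule described in the list above might look something like the following. This is a minimal sketch, not the Phoenix implementation: the `resolve` helper and the in-memory `Map` of heads are hypothetical, and `ObjectHead` is reduced to an ID and a current commit, whereas the real head also tracks the context, form, and shadow.

```scala
// Hypothetical sketch of resolving a head-based link to concrete commits.
object HeadLinks {
  // Assumed, simplified shape of a head.
  case class ObjectHead(id: Int, commitId: Int)

  // Signature as proposed in the design doc.
  case class ObjectLink(leftHeadId: Int, leftCommitId: Option[Int] = None,
                        rightHeadId: Int, rightCommitId: Option[Int] = None)

  /** Returns the (left, right) commit IDs the link points at: a pinned commit
    * wins when present, otherwise the head's current commit is used.
    * Yields None if either head is unknown.
    */
  def resolve(link: ObjectLink, heads: Map[Int, ObjectHead]): Option[(Int, Int)] =
    for {
      left  <- heads.get(link.leftHeadId)
      right <- heads.get(link.rightHeadId)
    } yield (link.leftCommitId.getOrElse(left.commitId),
             link.rightCommitId.getOrElse(right.commitId))
}
```

Because the commit IDs are optional, the common case stores nothing version-specific on the link; only links that deliberately pin a historical commit carry extra data.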
In the 90% case, when we're updating objects that are at their most recent versions, this model makes updates far simpler. Consider the example that was shown in the previous section.
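As a rough illustration of why the update gets cheaper, the orange SKU rename could be modeled as moving one head to a new commit while leaving every link untouched. The `Graph` container, the `commitUpdate` helper, and the concrete head and commit IDs are all assumptions made for this sketch.

```scala
// Hypothetical sketch: updating an object under the head-based link model.
object HeadUpdate {
  // Assumed, simplified shape of a head.
  case class ObjectHead(id: Int, commitId: Int)

  // Signature as proposed in the design doc.
  case class ObjectLink(leftHeadId: Int, leftCommitId: Option[Int] = None,
                        rightHeadId: Int, rightCommitId: Option[Int] = None)

  case class Graph(heads: Map[Int, ObjectHead], links: List[ObjectLink])

  /** Points one head at a new commit. Links are keyed by head ID, so none of
    * them need to be rewritten. Unknown head IDs leave the graph unchanged.
    */
  def commitUpdate(graph: Graph, headId: Int, newCommitId: Int): Graph =
    graph.heads.get(headId) match {
      case Some(head) =>
        graph.copy(heads = graph.heads.updated(headId, head.copy(commitId = newCommitId)))
      case None =>
        graph
    }

  // Head IDs (assumed): product = 1, orangeSku = 3, orangeAlbum = 6.
  val before: Graph = Graph(
    heads = Map(1 -> ObjectHead(1, 100), 3 -> ObjectHead(3, 300), 6 -> ObjectHead(6, 600)),
    links = List(
      ObjectLink(leftHeadId = 1, rightHeadId = 3),
      ObjectLink(leftHeadId = 3, rightHeadId = 6)
    )
  )

  // Renaming the orange SKU creates commit 301 and moves only that head.
  val after: Graph = commitUpdate(before, headId = 3, newCommitId = 301)
}
```

Under these assumptions the update touches a single head record, regardless of how deep the SKU sits in the graph; the cost is shifted to rollbacks, which would need to pin or restore commits explicitly.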
Footnotes
1: We might actually be better off updating orangeAlbum to have a new shadow ID as well, so that you can find its place in the object graph with the Album as a root node. For now, though, let's not worry about it.
2: ObjectHead is analogous to the head object in a git branch. It stores the context and points to the ObjectForm and most recent ObjectShadow and ObjectCommit references for an object in a context.