Search Architecture
We use ElasticSearch as our main source of caching and as a search engine across the FoxCommerce platform. In order to use it correctly, we must create and update indices, manage mappings per scope, and push information to ES. This document outlines how we approach those problems.
This section covers how we create and manage mappings and indices across ElasticSearch. At the most basic level, we want to create views of data and import them into ElasticSearch so that they can be queried in fast, structured ways. Then, we need to manage the evolution of that search over time by changing the structure of mappings and reindexing.
search_view
One or more Postgres tables that contain the raw data that can be searched. This data is consumed by Green River and pushed to ElasticSearch.
mapping
A mapping is a fundamental part of ElasticSearch. It's comprised of two parts:
- a definition of every field in the mapping and how ElasticSearch will analyze the contents of those fields
- the indexed data that we import from the search_view
We consider mapping definitions to be immutable, though that's not a strict requirement of ElasticSearch, and each search will consist of multiple versioned mappings.
There are a few restrictions that exist for updating ElasticSearch mappings:
- Adding a field to a mapping does not require data to be reindexed
- Changing the type or analyzer used on a field requires a complete reindex of the mapping
- Renaming a field requires a complete reindex of the mapping
- Removing a field requires a complete reindex of the mapping
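The rules above can be sketched as a small check that compares two versions of a mapping's field definitions. This is only an illustration: the `FieldDef` shape, the `requires_reindex` helper, and the sample field and analyzer names are assumptions made for the example, not part of the platform or of ElasticSearch.

```python
from typing import Dict

# Hypothetical, simplified field definition: {"type": ..., "analyzer": ...}.
FieldDef = Dict[str, str]


def requires_reindex(old: Dict[str, FieldDef], new: Dict[str, FieldDef]) -> bool:
    """Return True if moving from `old` to `new` needs a complete reindex."""
    for name, old_def in old.items():
        if name not in new:
            # Field removed, or renamed (a rename looks like remove + add).
            return True
        if new[name] != old_def:
            # Type or analyzer changed on an existing field.
            return True
    # Only additions (or no change at all) remain.
    return False


# Sample definitions; the field and analyzer names are made up.
v1 = {"title": {"type": "string", "analyzer": "standard"}}
v2_added = {**v1, "sku": {"type": "string", "analyzer": "keyword"}}
v2_changed = {"title": {"type": "string", "analyzer": "autocomplete"}}
v2_renamed = {"name": {"type": "string", "analyzer": "standard"}}

print(requires_reindex(v1, v2_added))    # False
print(requires_reindex(v1, v2_changed))  # True
print(requires_reindex(v1, v2_renamed))  # True
```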
The limitations in the last three items are what led us to create the search.
alias
Think of an alias as a symlink in ElasticSearch. In short, an alias is a named shortcut for a mapping.
search
Search is a FoxCommerce-defined concept that allows us to aggregate data, analyze that data, and query against it. It rolls up a number of ElasticSearch primitives, such as aliases and mappings, to get around some of the inherent limitations of ES and give us zero-downtime deployments and updates.
scope
Scope is a data primitive that we use for permissioning across the platform. While many of the details of scopes are outside the scope (ha!) of this document, it's of note here because for searches that contain private data, we create an index per scope.
To illustrate this concept, let's look at the following example. In this example, we have three scopes: 1, 2, and 3. 1 is the parent of 2 and 3, and they form the following tree:
1
/ \
2 3
Since all scopes are stored using the ltree format, we write each scope as a '.'-delimited path. Here are the three scopes written in the format that we will use going forward:
- 1 = 1
- 2 = 1.2
- 3 = 1.3
Because of the parent-child nature of the relationship, scope 1 has access to all data under 1.2 and 1.3, while 1.2 and 1.3 only have access to their own data.
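As a minimal sketch of that rule, the check below treats a scope as having access to another scope when its '.'-delimited labels are a prefix of the other's. The `can_access` helper is hypothetical and written only for illustration; it is not how the platform or Postgres ltree evaluates the relationship internally.

```python
def can_access(user_scope: str, data_scope: str) -> bool:
    """True if `user_scope` equals `data_scope` or is one of its ancestors."""
    user_labels = user_scope.split(".")
    data_labels = data_scope.split(".")
    # Compare whole labels so that "1.2" is not treated as an ancestor of "1.20".
    return data_labels[: len(user_labels)] == user_labels


print(can_access("1", "1.2"))    # True: 1 is the parent of 1.2
print(can_access("1.2", "1.2"))  # True: a scope can see its own data
print(can_access("1.2", "1.3"))  # False: siblings are isolated
print(can_access("1.2", "1"))    # False: children cannot see the parent's data
```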
As mentioned earlier, a search is a structure that allows us to create and
manage the lifecycle of a mapping. Here's a brief overview of the architecture of
the search and how it fits into the overall context of the system.
┌─────────────────────┐
│ API Gateway [nginx] │
└──────────┬──────────┘
|
┌──────────┴─────────┐
┌────────┴────────┐ ┌───────┴───────┐
│ Application API │ │ ElasticSearch │
└────────┬────────┘ │ [index] │
│ └───────┬───────┘
│ │ ┌──────────┐ ┌──────────────────────────┐
│ ├──→│ products ├─→│ products_search_view__v2 │
│ │ ┌→│ [alias] │ │ [mapping] │
│ │ │ └──────────┘ └──────────────────────────┘
│ │ │ ┌──────────────────────────┐
│ │ │ │ products_search_view__v1 │
│ │ │ │ [mapping] │
│ │ │ └──────────────────────────┘
│ │ │
│ │ │
│ │ │ ┌──────────┐ ┌──────────────────────────┐
│ └─+→│ orders ├─→│ orders_search_view__v2 │
│ ├→│ [alias] │ │ [mapping] │
│ │ └──────────┘ └──────────────────────────┘
│ │ ┌──────────────────────────┐
│ │ │ orders_search_view__v1 │
│ │ │ [mapping] │
│ │ └──────────────────────────┘
╭──┴─╮ │
│ │ ┌───────┐ ┌────┴────────┐
│ DB ├──→│ Kafka │──→│ Green River │
│ │ └───────┘ └─────────────┘
╰────╯
As the simplified diagram above shows, an ElasticSearch index is populated by
numerous search instances (each represented as an alias). There may be multiple
mappings for each alias, but that alias will only pull results from the most
recent version of its mapping.
This means that when a client makes a request to /api/v1/search/products,
there are a number of things happening in the background.
- Nginx is routing the request to the appropriate index (see Scopes for more details)
- Once inside the index, the request is routed to an alias
- The alias retrieves results from the specific mapping to which it's linked
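The read path above can be modelled with a small in-memory sketch. The `Index` class and `route` function are hypothetical stand-ins written for this example; they only mirror the index, alias, and mapping names used in this document and do not call ElasticSearch or Nginx. The document shapes are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Index:
    """Hypothetical in-memory stand-in for one scoped ElasticSearch index."""

    name: str
    # alias name -> name of the mapping it currently points to
    aliases: Dict[str, str] = field(default_factory=dict)
    # mapping name -> documents stored under that mapping
    mappings: Dict[str, List[dict]] = field(default_factory=dict)

    def search(self, alias: str) -> List[dict]:
        # Resolve the alias, then read from the mapping it is linked to.
        return self.mappings[self.aliases[alias]]


def route(indices: Dict[str, Index], scope: str, alias: str) -> List[dict]:
    # Pick the index that matches the caller's scope, then query the alias.
    return indices[f"admin_{scope}"].search(alias)


index = Index(
    name="admin_1.2",
    aliases={"products": "products_search_view__v2"},
    mappings={
        "products_search_view__v1": [{"id": 1, "name": "old shape"}],
        "products_search_view__v2": [{"id": 1, "title": "new shape"}],
    },
)
indices = {index.name: index}

print(route(indices, "1.2", "products"))  # [{'id': 1, 'title': 'new shape'}]
```

Because clients only ever name the alias, repointing `products` from one versioned mapping to another changes what they read without changing the request they make.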
A process similar to the one described above is used to insert, update, and delete data in ElasticSearch.
- Postgres instances contain tables that match the schemas of the ElasticSearch mappings
- These tables are updated via Postgres triggers as changes occur in the system
- Those changes are streamed through Kafka and picked up by Green River
- Based on scope and an internal mapping, Green River decides what alias should be updated
- Green River makes a PUT against the appropriate alias
- The mapping that is linked to the alias gets updated
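To make the last three steps concrete, here is a rough sketch of how a consumer might turn a changed row into a PUT. Everything in it is an assumption made for illustration: the `VIEW_TO_ALIAS` table, the `build_put` helper, the table names, the use of the row's `id` as the document id, and the path layout. It is not Green River's actual code, and the target index is simply passed in by the caller.

```python
from typing import Any, Dict, NamedTuple

# Hypothetical internal mapping from a search view table to its alias.
VIEW_TO_ALIAS = {
    "products_search_view": "products",
    "orders_search_view": "orders",
}


class PutRequest(NamedTuple):
    method: str
    path: str
    body: Dict[str, Any]


def build_put(index: str, table: str, row: Dict[str, Any]) -> PutRequest:
    """Describe the PUT that would write `row` through the table's alias."""
    alias = VIEW_TO_ALIAS[table]
    # Assumption: each row carries an "id" that doubles as the document id.
    return PutRequest("PUT", f"/{index}/{alias}/{row['id']}", row)


row = {"id": 42, "scope": "1.2", "title": "Fox socks"}
request = build_put("admin_1.2", "products_search_view", row)

print(request.method, request.path)  # PUT /admin_1.2/products/42
```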
As noted above, scopes are the primitives that we use to manage roles and permissions throughout the platform. We manage permissions in ElasticSearch by creating an index per scope, then letting Green River insert, update, and delete rows in the correct index or indexes.
All searches that can be scoped have a scope field in the Postgres table that
backs the search. When Green River processes an update on one of those tables, it
analyzes the scope field in the search view and appends the row to the
correct indices.
Example
Consider that we have the scopes listed above (1, 1.2, and 1.3) with the
search products. Let's go through what happens when a new product is created
in scope 1.2.
- Since we have three scopes, we have three admin indices: admin_1, admin_1.2, and admin_1.3.
- Green River picks up an event in Kafka when the product is created that has a scope of 1.2.
- Based on the format of 1.2, Green River creates a record in admin_1/products and admin_1.2/products.
Note that one of the side effects of this implementation is that we duplicate data
in both the admin_1 and admin_1.2 indices. This allows us to have a really
simple permissions model where a user can only access the index that identically
matches their scope.
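The fan-out in this example can be sketched as follows. The two helpers are hypothetical and exist only to illustrate the rule; the `admin` prefix and the index names come from the example above.

```python
from typing import List


def indices_for_scope(scope: str, prefix: str = "admin") -> List[str]:
    """Indices that receive a row: one for the scope and one per ancestor."""
    labels = scope.split(".")
    result = []
    for depth in range(1, len(labels) + 1):
        ancestor = ".".join(labels[:depth])
        result.append(f"{prefix}_{ancestor}")
    return result


def index_for_user(user_scope: str, prefix: str = "admin") -> str:
    """The single index a user may read: the one matching their scope exactly."""
    return f"{prefix}_{user_scope}"


print(indices_for_scope("1.2"))  # ['admin_1', 'admin_1.2']
print(indices_for_scope("1"))    # ['admin_1']
print(index_for_user("1.3"))     # admin_1.3
```

Writing each row to every ancestor index costs extra storage, but it keeps the read side trivial: no per-query permission filter is needed, only an exact match between the user's scope and the index name.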