Commit 405c288: updated the architecture doc
yaronp68 committed Aug 23, 2023
headerRange: "[2]"
aliases:
---

Redis Data Integration (RDI) is a product that helps Redis Enterprise users ingest and export data in near real time:

- End-to-end solution - no need for additional tools and integrations
- Change Data Capture (CDC) included
- Covers the most popular databases as sources and targets
- Declarative data mapping and transformations - no custom code needed
- Guaranteed data delivery (at least once)
- Zero-downtime day 2 operations (reconfigure and upgrade)

Redis Data Integration currently supports two use cases:

- Ingest (Cache prefetch).
- Write Behind.

## Ingest functionality and architecture

You can think of RDI Ingest as a streaming ELT process:

- Data is **E**xtracted from the source database using [Debezium Server](https://debezium.io/) connectors.
- Data is then **L**oaded into Redis Data Integration, a Redis database instance that keeps the data in [Redis streams](https://redis.io/docs/manual/data-types/streams/) alongside required metadata.
- Data is then **T**ransformed using provided [RedisGears recipes](https://developer.redis.com/howtos/redisgears/) and written to the target Redis DB.
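The three stages above can be sketched with plain Python data structures. This is a minimal illustration only: the event shape loosely follows a Debezium change event ("after" image), and the key convention and field names are assumptions, not RDI's actual schema.

```python
def extract(change_events):
    """E: change events captured from the source database (e.g. by Debezium)."""
    return list(change_events)

def load(events, streams):
    """L: append each event to a per-table stream, as RDI does with Redis streams."""
    for ev in events:
        streams.setdefault(ev["table"], []).append(ev)
    return streams

def transform(streams):
    """T: flatten each event's 'after' image into a field -> string mapping."""
    target = {}
    for table, events in streams.items():
        for ev in events:
            row = ev["after"]
            key = f"{table}:{row['id']}"      # illustrative key convention
            target[key] = {k: str(v) for k, v in row.items()}
    return target

events = [{"table": "invoice", "after": {"id": 7, "total": 99.5}}]
print(transform(load(extract(events), {})))
# {'invoice:7': {'id': '7', 'total': '99.5'}}
```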

Redis Data Integration using Debezium Server works in two modes:

1. Initial sync - Where a snapshot of the entire database or a subset of selected tables is used as a baseline. The entire dataset is streamed to Redis Data Integration and then transformed and written into the target Redis database.
2. Live updates - Where Debezium captures changes to the data that happen after the baseline snapshot and streams them to Redis Data Integration where they are transformed and written to the target.

![High-level Redis Data Integration architecture](/images/rdi/redis-di-simplified.png)

### Supported data transformations

#### Model mapping

RDI supports conversion of a database record into the following Redis types:

- Hash
- Set
- Sorted Set
- Stream
- JSON
- String

Each column in the record is automatically formatted as a number or string. See the [data types list]({{<relref "/rdi/reference/data-types-conversion">}}) for the exact mappings.
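As a rough illustration of that number-or-string formatting (the authoritative rules are in the data types reference linked above; the simplified rule below is an assumption for this sketch):

```python
def format_column(value):
    """Numbers pass through as numbers; everything else becomes a string."""
    if isinstance(value, bool):          # treat booleans as strings, not numbers
        return str(value)
    if isinstance(value, (int, float)):
        return value
    return str(value)

row = {"id": 42, "name": "Ada", "active": True, "balance": 10.5}
print({col: format_column(v) for col, v in row.items()})
# {'id': 42, 'name': 'Ada', 'active': 'True', 'balance': 10.5}
```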

#### Additional transformations

RDI supports declarative transformations to further manipulate the data, including denormalizing data from several records into a single JSON key in Redis.

Read the data transformation section of these docs to learn more.
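For a feel of what denormalization means here, a minimal Python sketch that folds a parent row and its child rows into one JSON document; the table and column names are invented for illustration:

```python
import json

# A parent record and its related child records, as they might arrive from
# two source tables (names are hypothetical).
customer = {"id": 1, "name": "Ada"}
orders = [
    {"customer_id": 1, "sku": "A-1", "qty": 2},
    {"customer_id": 1, "sku": "B-9", "qty": 1},
]

# Denormalize: embed the matching child rows inside the parent document.
doc = dict(customer)
doc["orders"] = [
    {"sku": o["sku"], "qty": o["qty"]}
    for o in orders
    if o["customer_id"] == customer["id"]
]
print(json.dumps(doc))
```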

### Architecture and components

![Detailed Redis Data Integration architecture](/images/rdi/redis-di.png)

#### Feeders

Debezium Server has a Redis sink that writes data and state to RDI streams.
In addition, RDI works with [Arcion](arcion.io), a commercial CDC product that provides an additional set of sources.

#### Redis data integration data plane

##### RDI data streams

Redis Data Integration receives data using [Redis streams](https://redis.io/docs/manual/data-types/streams/). Records with data from a specific database table are written to a stream with a key reflecting the table name. This provides a simple interface into Redis Data Integration and preserves the order of changes as served by Debezium.
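The per-table stream layout can be sketched as follows; the `data:` key prefix and the naming scheme are assumptions for illustration only:

```python
streams = {}

def stream_key(schema, table):
    # One stream per source table; the "data:" prefix is an assumption.
    return f"data:{schema}.{table}"

def append(schema, table, record):
    streams.setdefault(stream_key(schema, table), []).append(record)

append("public", "invoice", {"id": 1})
append("public", "invoice", {"id": 2})
print(list(streams))                       # ['data:public.invoice']
print(streams["data:public.invoice"])      # arrival order is preserved
```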

##### RDI engine

RDI Engine is built on top of RedisGears. It has two main logical functions:

- Transformation of data - In the current version, we only support mapping Debezium types to Redis types, but future versions will support [Apache Avro](https://avro.apache.org/docs/current/). In addition, we plan to provide data mapping between the normalized relational model and denormalized Redis JSON documents.
- Writing data - This is the part where Redis Data Integration writes the data to the target, ensuring order and guaranteeing at-least-once delivery to prevent data loss. In addition, the writer can be configured to ignore, abort, or keep rejected data entries.
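A toy sketch of the at-least-once contract and the three rejected-entry policies described above; the `write_pending` function and its policy names mirror the text but are otherwise invented:

```python
def write_pending(pending, target, validate, on_reject="keep"):
    """Ack (pop) an entry only after it was handled, so nothing is lost."""
    rejected = []
    while pending:
        entry = pending[0]                 # peek; do not ack yet
        if validate(entry):
            target[entry["key"]] = entry["value"]
        elif on_reject == "abort":
            raise ValueError(f"rejected entry: {entry}")
        elif on_reject == "keep":
            rejected.append(entry)
        # on_reject == "ignore" drops the entry silently
        pending.pop(0)                     # ack: handled (or deliberately dropped)
    return rejected

pending = [{"key": "a", "value": 1}, {"key": "b", "value": None}]
target = {}
rejected = write_pending(pending, target, validate=lambda e: e["value"] is not None)
print(target, rejected)
# {'a': 1} [{'key': 'b', 'value': None}]
```

A crash before the final `pop` leaves the entry pending, so it is retried on restart: duplicates are possible, loss is not.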

#### Redis data integration control plane

#### RDI CLI

Redis Data Integration comes with a Python-based CLI that is intuitive, self-explanatory, and self-validating, easing the entire lifecycle of Redis Data Integration setup and management.

#### RDI configuration

Redis Data Integration configuration is persisted at the cluster level. The configuration is written by the CLI `deploy` command, reflecting the configuration file. This mechanism allows for automatic configuration of new shards when needed and can survive shard and node failure.

#### Secrets handling

Redis Data Integration requires the following secrets:

Credentials and Certificate can be served in one of the following ways:
- Stored in environment variables.
- Injected to the temp file system by a secret store agent (for example, [HashiCorp Vault](https://www.vaultproject.io/)).
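The two delivery options can be sketched like this; the variable name, file path handling, and the `read_secret` helper are hypothetical:

```python
import os

def read_secret(env_var, file_path=None):
    """Env var first, then a file injected by a secret-store agent."""
    value = os.environ.get(env_var)
    if value is not None:
        return value
    if file_path and os.path.exists(file_path):
        with open(file_path) as f:
            return f.read().strip()
    raise RuntimeError(f"secret {env_var} not provided")

os.environ["TARGET_DB_PASSWORD"] = "s3cret"   # simulate the env-var option
print(read_secret("TARGET_DB_PASSWORD"))      # s3cret
```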

### Scalability and high availability

Redis Data Integration is highly available:

Redis Data Integration is horizontally scalable:

- Table level data streams can be split to multiple shards where they can be transformed and written to the target in parallel, while keeping table level order of updates.
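A sketch of the routing idea, assuming deterministic hash-based placement (CRC32 here is just to make the sketch reproducible; it is not necessarily what RDI uses):

```python
import zlib

def shard_for(table, num_shards):
    # Every change for a given table routes to the same shard, so different
    # tables run in parallel while per-table order is preserved.
    return zlib.crc32(table.encode()) % num_shards

shards = {n: [] for n in range(3)}
for table, change in [("invoice", 1), ("customer", 1), ("invoice", 2)]:
    shards[shard_for(table, 3)].append((table, change))

s = shard_for("invoice", 3)
print([c for t, c in shards[s] if t == "invoice"])   # [1, 2]: per-table order kept
```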

### Tested topologies

Redis Data Integration runs on Redis Enterprise. It works on any installation of Redis Enterprise regardless of the runtime (bare metal, VMs, or containers).
Redis Data Integration can be collocated with the target Redis DB or run on a different Redis Enterprise cluster.
Debezium Server should run in warm High Availability topology in one of the following:
- Running on 2 VMs in an active/passive setup. The failover orchestration can be done by tools such as [Pacemaker](https://clusterlabs.org/pacemaker/doc/).
- Running as a Kubernetes pod, where Kubernetes watches the pod's liveness and readiness and acts to recover a nonfunctional pod.
It is important to note that Redis Data Integration does not keep any state in Debezium Server. All state is kept in Redis Data Integration.

## Write behind functionality and architecture

RDI write behind allows integration between Redis Enterprise (as the source of changes to data) and downstream databases or datastores.
RDI captures changes to a selected set of key patterns in the Redis keyspace and asynchronously writes them in small batches to the downstream database, so the application doesn't need to handle remodeling the data or managing the connection with the downstream database.
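A toy model of that batching behavior; the buffer class, batch size, and flush hook are all invented for illustration:

```python
class WriteBehindBuffer:
    """Buffer keyspace changes and flush them downstream in small batches."""

    def __init__(self, flush_fn, batch_size=2):
        self.flush_fn = flush_fn           # e.g. one multi-row INSERT/UPSERT
        self.batch_size = batch_size
        self.buf = []

    def on_change(self, key, value):
        self.buf.append((key, value))
        if len(self.buf) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buf:
            self.flush_fn(self.buf)
            self.buf = []

batches = []
buf = WriteBehindBuffer(batches.append)
for i in range(3):
    buf.on_change(f"emp:{i}", {"id": i})
buf.flush()                                # drain the remainder
print([len(b) for b in batches])           # [2, 1]
```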

RDI write behind can normalize a key in Redis to several records in one or more tables at the target.
To learn more about write behind declarative jobs and renormalization, read the [write behind quick start guide]({{<relref "/rdi/quickstart/write-behind-guide.md">}}).

### Write behind topology

The RDI CLI and engine come in one edition that can run both ingest and write behind. However, the topology for write behind is different.
The RDI engine is installed on the Redis database containing the application data, not on a separate staging Redis database. RDI streams and the control plane add a small footprint of a few hundred MB to the Redis database. In the write behind topology, RDI processes data in parallel on each shard and establishes a single connection from each shard to the downstream database.

### Model translation

RDI write behind can track changes to the following Redis types:

- Hash
- JSON
- Set
- Sorted Set

Unlike the ingest scenario, write behind has no default behavior for model translation. It is up to the user to create a declarative job specifying the mapping between a Redis key and target database records.
The job's `keys` and `mapping` sections help make this an easy task.
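To make the idea concrete, a Python sketch of what a keys/mapping job accomplishes conceptually; the real jobs are declarative configuration, and this mapping shape, the hash fields, and the table names are all invented:

```python
# Hypothetical mapping: one Redis hash feeds rows in two target tables,
# expressed as table -> {target column: hash field}.
mapping = {
    "employees": {"id": "id", "full_name": "name"},
    "addresses": {"employee_id": "id", "city": "city"},
}

def to_rows(redis_hash, mapping):
    """Apply a table -> {column: hash-field} mapping to one Redis hash."""
    return {
        table: {col: redis_hash[field] for col, field in columns.items()}
        for table, columns in mapping.items()
    }

h = {"id": "7", "name": "Ada Lovelace", "city": "London"}
print(to_rows(h, mapping))
# {'employees': {'id': '7', 'full_name': 'Ada Lovelace'},
#  'addresses': {'employee_id': '7', 'city': 'London'}}
```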
