DOC-3834 initial edit of data-pipelines pages
andy-stark-redis committed May 24, 2024
1 parent 9ba0ec9 commit 422aae6
Showing 4 changed files with 61 additions and 24 deletions.
@@ -16,18 +16,25 @@ type: integration
weight: 30
---

The data in the source database is often
[*normalized*](https://en.wikipedia.org/wiki/Database_normalization).
This means that columns can't have composite values (such as arrays) and relationships between entities
are expressed as mappings of primary keys to foreign keys between different tables.
Normalized data models reduce redundancy and improve data integrity for write queries, but this comes
at the expense of read speed.
A Redis cache, on the other hand, is focused on making *read* queries fast, so RDI provides data
*denormalization* to help with this.

## Nest strategy

*Nesting* is the strategy RDI uses to denormalize many-to-one relationships in the source database.
It does this by representing the
parent object (the "one") as a JSON document with the children (the "many") nested inside
a JSON map attribute.

{{< image filename="/images/rdi/nest-flow.png" >}}

Configure denormalization with a `nest` block in the child entities' RDI job, as shown in this example:

```yaml
source:
  # ...
```
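
As a rough sketch of a complete child-entity job (hypothetical `Invoice` and `InvoiceLine` tables; the option names follow the nest settings in the RDI reference, so verify them there), such a job might look like this:

```yaml
# Sketch of a nest job for a hypothetical InvoiceLine child table.
source:
  server_name: chinook            # hypothetical source name
  schema: public
  table: InvoiceLine              # the child ("many") table
output:
  - uses: redis.write
    with:
      nest:
        parent:
          # The parent ("one") entity that becomes the root of the JSON document.
          server_name: chinook
          schema: public
          table: Invoice
        nesting_key: InvoiceLineId   # unique key of each child row
        parent_key: InvoiceId        # foreign key linking child to parent
        path: $.InvoiceLineItems     # attribute that holds the nested map
        structure: map
      data_type: json
```

With a job like this, each captured `InvoiceLine` row would be nested under `$.InvoiceLineItems` in its parent `Invoice` document, keyed by its `InvoiceLineId`.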
@@ -1,8 +1,8 @@
---
Title: Configure data pipelines
linkTitle: Configure
description: Learn how to configure ingest pipelines for data transformation
weight: 1
alwaysopen: false
categories: ["redis-di"]
aliases:
@@ -12,7 +12,7 @@ RDI implements
[change data capture](https://en.wikipedia.org/wiki/Change_data_capture) (CDC)
with *pipelines*. (See the
[architecture overview]({{< relref "/integrate/redis-data-integration/ingest/architecture#overview" >}})
for an introduction to pipelines.) There are two basic types of pipeline:

- *Ingest* pipelines capture data from an external source database
and add it to a Redis target database.
@@ -28,12 +28,13 @@ structure of the configuration:
{{< image filename="images/rdi/ingest/ingest-config-folders.svg" >}}

The main configuration for the pipeline is in the `config.yaml` file.
This specifies the connection details for the source database (such
as host, username, and password) and also the queries that RDI will use
to extract the required data. You can also specify one or more optional *job* configurations in the `Jobs` folder. Use these to specify custom
*data transformations*
to apply to the source data before writing it to the target.

The sections below describe these two types of configuration files in more detail.

## The `config.yaml` file

@@ -73,8 +74,8 @@ The main sections of the file configure [`sources`](#sources) and [`targets`](#targets).

### Sources

The `sources` section has a subsection for the source that
you need to configure. The source section starts with a unique name
to identify the source (in the example we have a source
called `mysql` but you can choose any name you like). The example
configuration contains the following data:
@@ -110,7 +111,9 @@ and TLS/mTLS secrets here if you use them.
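
As a minimal sketch of the overall shape (placeholder hosts and names, not a working configuration; see the RDI reference for the full set of options), `config.yaml` pairs a named source with a Redis target:

```yaml
# Illustrative config.yaml skeleton with placeholder values.
sources:
  mysql:                          # unique name you choose for the source
    type: cdc
    connection:
      type: mysql
      host: mysql.example.com     # placeholder host
      port: 3306
targets:
  target:
    connection:
      type: redis
      host: redis.example.com     # placeholder Redis target host
      port: 12000
```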

## Job files

You can optionally supply one or more job files that specify how you want to
transform the captured data before writing it to the target.
Each job file contains a YAML
configuration that controls the transformation for a particular table from the source
database. For ingest pipelines, you can also add a `default-job.yaml` file to provide
a default transformation for tables that don't have a specific job file of their own.
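
For illustration, a job file for a hypothetical `Invoice` table might look something like the sketch below. The `add_field` transform and the `redis.write` output follow the patterns in the RDI reference; treat the exact option names as assumptions to verify there.

```yaml
# Sketch of a job file for a hypothetical Invoice table.
source:
  server_name: chinook            # hypothetical source name from config.yaml
  schema: public
  table: Invoice
transform:
  # Add a field with the upper-cased billing country to each record.
  - uses: add_field
    with:
      fields:
        - field: country
          expression: UPPER(BillingCountry)
          language: sql
output:
  - uses: redis.write
    with:
      data_type: json
      key:
        expression: concat(['invoice:', InvoiceId])
        language: jmespath
```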
@@ -165,13 +168,15 @@ available source, transform, and target configuration options and also a set
of example job configurations.

## Source preparation

Before using the pipeline, you must first prepare your source database to use
the Debezium connector for *change data capture (CDC)*. See the
[architecture overview]({{< relref "/integrate/redis-data-integration/ingest/architecture#overview" >}})
for more information about CDC.
Each database type has a different set of preparation steps. You can
find the preparation guides for the databases that RDI supports in the
[Prepare source databases]({{< relref "/integrate/redis-data-integration/ingest/data-pipelines/prepare-dbs" >}})
section.
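
For example (a sketch of the kind of change involved, not a substitute for the guides), preparing a MySQL source typically means making sure the binary log is enabled in row format:

```ini
# Illustrative my.cnf excerpt: Debezium's MySQL connector reads the binary log.
server-id     = 223344        # any unique, non-zero server ID
log_bin       = mysql-bin
binlog_format = ROW
```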

## Ingest pipeline lifecycle

@@ -16,15 +16,14 @@ type: integration
weight: 20
---

RDI automatically converts data that has a Debezium JSON schema into Redis types.
Some Debezium types require special conversion. For example:

- Date and Time types are converted to epoch time.
- Decimal numeric types are converted to strings so your app can use them
without losing precision.

The following Debezium logical types are supported:

- double
- float
@@ -42,10 +41,10 @@ The following Debezium logical types are currently handled:
- org.apache.kafka.connect.data.Decimal
- org.apache.kafka.connect.data.Time

These types are **not** supported and will return "Unsupported Error":

- io.debezium.time.interval

All other values are treated as plain strings.
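
As a purely illustrative example (hypothetical `Invoice` table; the exact representations depend on the source connector), a captured row might arrive with its values converted like this:

```yaml
# Hypothetical captured row after conversion, shown as YAML.
InvoiceId: 1
Total: "13.86"                # DECIMAL arrives as a string, preserving precision
InvoiceDate: 1609459200000    # timestamp converted to epoch time (assumed ms)
BillingCity: Oslo             # other values arrive as plain strings
```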

For more information, see the [full list of source database value conversions]({{<relref "/integrate/redis-data-integration/reference/data-types-conversion">}}).
@@ -0,0 +1,26 @@
---
Title: Prepare source databases
aliases: null
alwaysopen: false
categories:
- docs
- integrate
- rs
- rdi
description: Enable CDC features in your source databases
group: di
hideListLinks: false
linkTitle: Prepare source databases
summary: Redis Data Integration keeps Redis in sync with the primary database in near real time.
type: integration
weight: 30
---

Each database uses a different mechanism to track changes to its data, and these
mechanisms are generally not switched on by default.
RDI's Debezium collector uses these mechanisms for change data capture (CDC),
so you must prepare your source database before you can use it with RDI.

The pages in this section give detailed instructions to get your source
database ready for Debezium to use:
