update: shorten and link content
ArthurFlag committed Aug 1, 2024
1 parent 28eff99 commit 3d32292
Showing 3 changed files with 60 additions and 46 deletions.
1 change: 1 addition & 0 deletions .github/vale/styles/config/vocabularies/Aiven/accept.txt
@@ -306,6 +306,7 @@ UDFs
unaggregated
unassign
uncomment
uncompacted
unencrypted
unfollow
United States
40 changes: 23 additions & 17 deletions docs/products/kafka/concepts/log-compaction.md
@@ -2,15 +2,15 @@
title: Compacted topics
---

One way to reduce the disk space requirements in Apache Kafka® is to use **compacted topics**.
This method retains only the newest record
for each key on a topic, regardless of whether the retention period of
the message has expired or not. Depending on the application, this can
significantly reduce the amount of storage required for the topic.
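
The retention rule above can be sketched in a few lines. This is a hypothetical illustration, not the Apache Kafka implementation: compaction keeps only the most recent value for each key, regardless of message age.

```python
def compact(records):
    """records: list of (key, value) pairs in append order."""
    latest = {}
    for key, value in records:
        latest[key] = value  # a newer value replaces the older one
    return list(latest.items())

log = [
    (1001, "4 Privet Dr"),
    (1002, "221B Baker Street"),
    (1001, "Paper Road 21"),  # user 1001 moved: only this value survives
]
print(compact(log))  # → [(1001, 'Paper Road 21'), (1002, '221B Baker Street')]
```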

To use log compaction, all messages sent to the topic must have
an explicit key. To enable log compaction, see
[Configure log cleaner for topic compaction][url].

## How compacted topics work

@@ -23,7 +23,7 @@
For example, if there is a topic containing a user's home address, on
every update, a message is sent using `user_id` as the primary key and
home address as the value:

```text
1001 -> "4 Privet Dr"
1002 -> "221B Baker Street"
1003 -> "Milkman Road"
@@ -51,6 +51,14 @@
available in the partition. This retention policy can be set per-topic,
so a single cluster can have some topics where retention is enforced by
size or time and other topics where retention is enforced by compaction.

:::warning
The compaction occurs **per partition**: if two records with the same
key land in different partitions, they are not compacted.

This usually doesn't happen, because the record key is used to select the
partition. However, it can be an issue with custom message routing.
:::
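
Why this rarely happens in practice can be sketched as follows. Kafka's default partitioner hashes the serialized key with murmur2; the `crc32` used here is a stand-in with the same relevant property: records sharing a key always land in the same partition.

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    # Same key -> same hash -> same partition, so compaction sees
    # all records for a key together.
    return zlib.crc32(key.encode()) % num_partitions

assert partition_for("1001", 6) == partition_for("1001", 6)

# A custom router that ignores the key (for example, round-robin) can
# scatter the same key across partitions, and those duplicates are
# never compacted.
```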

## Compacted topic example

To better understand how compaction works, let's look at a partition
@@ -82,14 +90,14 @@
end result is the following:

## Compacted topic details

A compacted topic consists of a **head** and a **tail**:

- The **head** is a traditional Apache Kafka topic where new records
are appended. The head can contain duplicated keys.
- The **tail** contains one record per key. Apache Kafka compaction
ensures that keys are unique in the tail.

Expanding the example above, let's assume that the **tail** contains the
following entries:

| Offset | Key | Value |
Expand Down Expand Up @@ -133,10 +141,8 @@ Lastly, the records in the offset map are added in the tail.
| 4 | 1002 | 21 Jump St |
| 6 | 1001 | Paper Road 21 |
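
A compaction pass like the one above can be sketched as follows (assumed data mirroring the example, not broker code): the cleaner scans the head into an "offset map" holding the newest offset per key, then rewrites the tail, keeping only keys absent from that map.

```python
def compact_pass(tail, head):
    """tail, head: lists of (offset, key, value), oldest first."""
    offset_map = {}
    for offset, key, value in head:
        offset_map[key] = (offset, value)  # newer head entries win
    # Tail entries whose key reappears in the head are superseded.
    kept_tail = [(o, k, v) for o, k, v in tail if k not in offset_map]
    new_tail = kept_tail + [(o, k, v) for k, (o, v) in offset_map.items()]
    return sorted(new_tail)  # the tail stays in offset order

tail = [(1, 1001, "4 Privet Dr"), (2, 1002, "221B Baker Street")]
head = [(4, 1002, "21 Jump St"), (6, 1001, "Paper Road 21")]
print(compact_pass(tail, head))
# → [(4, 1002, '21 Jump St'), (6, 1001, 'Paper Road 21')]
```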

## Related pages

- [Configure log cleaner for topic compaction][url]

[url]:/docs/products/kafka/howto/configure-log-cleaner
65 changes: 36 additions & 29 deletions docs/products/kafka/howto/configure-log-cleaner.md
@@ -1,8 +1,8 @@
---
title: Configure the log cleaner for topic compaction
---

The log cleaner preserves only the latest value associated with a specific message key in a partition for [compacted topics][logcompaction].
In Aiven for Apache Kafka®, the log cleaner is enabled by
default, while log compaction remains disabled.

@@ -35,51 +35,58 @@
configuration override in place.
value `compact`.
1. Select **Update**.

## Configure log cleaning frequency and delay

Before the cleaning begins, the cleaner thread inspects the logs to
find those with the highest **dirty ratio**, calculated as the number of
bytes in the head vs the total number of bytes in the log (tail + head).
Read more about head and tail definition in the
[compacted topic documentation][logcompaction].

The ratio estimates how many duplicated
keys are present in a topic, and therefore need to be compacted.
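
The calculation described above can be sketched as a one-liner: the share of log bytes sitting in the (possibly duplicated) head. The byte counts below are made-up illustration values.

```python
def dirty_ratio(head_bytes: int, tail_bytes: int) -> float:
    # Fraction of the log that has not been compacted yet.
    return head_bytes / (head_bytes + tail_bytes)

# With the default minimum cleanable ratio of 0.5 (50%), this log is
# eligible for cleaning:
ratio = dirty_ratio(head_bytes=600, tail_bytes=400)
print(ratio)  # → 0.6
assert ratio > 0.5
```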

:::tip
For the log cleaner to start compacting a topic, the dirty ratio needs
to be bigger than a threshold set to **50% by default**.

You can change this value:
- Globally for the cluster: In the **Advanced configuration**
section of the service overview, modify the value of the
`kafka.log_cleaner_min_cleanable_ratio` property.
- For a specific topic: Modify the value of the `min_cleanable_ratio` property.
:::

The log cleaner can be configured to leave some amount of uncompacted data in the
head of the log by setting **compaction time lag**. To do so,

1. Open the **Advanced configuration** of your service or an individual topic.
1. Set the following properties:
- `log.cleaner.min.compaction.lag.ms`: Setting to a value greater
than 0 will prevent the log cleaner from compacting messages with an age
newer than a minimum message age. This delays compacting records.
- `log.cleaner.max.compaction.lag.ms`: The maximum amount of time a
message will remain uncompacted.
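
How the two lag settings bound a record's eligibility can be sketched as follows (an assumed model for illustration, not broker code):

```python
def may_compact(record_age_ms: int, min_lag_ms: int) -> bool:
    # log.cleaner.min.compaction.lag.ms: younger records must stay
    # uncompacted in the head.
    return record_age_ms >= min_lag_ms

def must_compact(record_age_ms: int, max_lag_ms: int) -> bool:
    # log.cleaner.max.compaction.lag.ms: older records should be
    # compacted as soon as a cleaner thread is available.
    return record_age_ms >= max_lag_ms

assert may_compact(5_000, min_lag_ms=10_000) is False   # too young
assert must_compact(70_000, max_lag_ms=60_000) is True  # overdue
```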

:::tip
The compaction lag can be bigger than the
`log.cleaner.max.compaction.lag.ms` setting, since it directly depends on
the time to complete the actual compaction process and can be delayed by
the availability of log cleaner threads.
:::

## Tombstone records

During the cleanup process, the log cleaner also removes records
that have a null value, also known as **tombstone** records. To delay tombstone records
from being deleted, set the `delete.retention.ms` property for the compacted topic.
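
The tombstone lifecycle can be sketched as follows (an assumed model: per-log rather than per-record ages, for brevity). A tombstone is a record with a null value; it deletes earlier values for its key, and is itself dropped once it has sat in the cleaned log longer than `delete.retention.ms`.

```python
def clean(records, tombstone_age_ms, delete_retention_ms):
    """records: list of (key, value); value=None marks a tombstone."""
    latest = {}
    for key, value in records:
        latest[key] = value  # tombstone supersedes earlier values
    expired = tombstone_age_ms > delete_retention_ms
    # Tombstones are kept until the retention period passes, so
    # consumers have a chance to observe the delete.
    return {k: v for k, v in latest.items()
            if v is not None or not expired}

log = [(1001, "4 Privet Dr"), (1001, None)]  # second record: tombstone
print(clean(log, tombstone_age_ms=1, delete_retention_ms=86_400_000))
# → {1001: None}
print(clean(log, tombstone_age_ms=90_000_000, delete_retention_ms=86_400_000))
# → {}
```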

Consumers can read all tombstone messages as long as they reach the head
of the topic before the period defined in `delete.retention.ms` has passed.

## Related pages

- [Compacted topics][logcompaction]
- [Kafka advanced parameters](/docs/products/kafka/reference/advanced-params)

[logcompaction]: /docs/products/kafka/concepts/log-compaction
