Showing 15 changed files with 103 additions and 32 deletions.
@@ -0,0 +1,76 @@
created: 20241007132831726
creator: Gezi-lzq
modified: 20241007150326466
modifier: Gezi-lzq
tags: [[Walk through the MM2 offset translation flow]]
title: MM2's offset translation scheme

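Given the premises and results described in [[Expected effect of offset translation]], when messages are replicated one-to-one between the source cluster and the target cluster, a source-cluster offset can be mapped to the target cluster by a fixed offset difference, and the result is guaranteed to be exact.

However, [[What challenges does offset translation face]] shows that this premise cannot be guaranteed during replication. What happens if we keep translating this way anyway?

Answer: the translated offset may land before the truly corresponding offset, which causes duplicate consumption, or after it, which causes message loss.
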
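The effect can be sketched in a few lines. This is a minimal illustration, not MM2 code: `translate_by_delta`, `classify`, and the three offset mappings are hypothetical names and made-up data, chosen only to show a translated offset landing on, before, or after the true one.

```python
def translate_by_delta(source_offset: int, sync_source: int, sync_target: int) -> int:
    """Map a source offset to the target cluster using one known (source, target) pair."""
    return sync_target + (source_offset - sync_source)


def classify(translated: int, true_target: int) -> str:
    """Describe what a consumer resuming at `translated` would experience."""
    if translated < true_target:
        return "duplicate"  # resumes too early and re-reads records
    if translated > true_target:
        return "loss"       # resumes too late and skips records
    return "exact"


# Hypothetical true mappings (source offset -> target offset), made up for illustration.
one_to_one = {0: 100, 1: 101, 2: 102, 3: 103, 4: 104}
extra_on_target = {0: 100, 1: 101, 2: 103, 3: 104, 4: 105}  # target holds one extra record
gap_on_source = {0: 100, 1: 101, 3: 102, 4: 103}            # source offset 2 does not exist

for name, mapping in [
    ("one_to_one", one_to_one),
    ("extra_on_target", extra_on_target),
    ("gap_on_source", gap_on_source),
]:
    translated = translate_by_delta(4, sync_source=0, sync_target=100)
    print(name, translated, classify(translated, mapping[4]))
```

Only the one-to-one mapping yields an exact result; the other two differ by a single record and already flip the outcome to duplicate consumption or loss.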
So how can this be avoided?

!! Brute-force approach (Dense Cache)

Start with the brute-force approach: if a map recorded the target-cluster offset for every source-cluster offset, translation would be a single lookup in that map.
[img[offset_syncs_dense_cache.png]]

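A dense cache of this kind could look roughly like the sketch below. `DenseOffsetMap` and its methods are hypothetical names for illustration, and the sample offsets are invented.

```python
from typing import Dict, Optional


class DenseOffsetMap:
    """Keeps one entry per replicated record: source offset -> target offset."""

    def __init__(self) -> None:
        self._map: Dict[int, int] = {}

    def record(self, source_offset: int, target_offset: int) -> None:
        self._map[source_offset] = target_offset

    def translate(self, source_offset: int) -> Optional[int]:
        # Exact lookup; None means this source offset was never replicated.
        return self._map.get(source_offset)


dense = DenseOffsetMap()
# Invented sample data: the target skips offset 102, so the delta is not constant.
for source, target in [(0, 100), (1, 101), (2, 103), (3, 104)]:
    dense.record(source, target)

print(dense.translate(2))
print(dense.translate(9))
```

Because every offset has its own entry, the lookup stays exact even when the difference between the two clusters changes along the way.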
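!!! Practical feasibility analysis

Suppose 10,000 messages are migrated and a map (for example, a Java HashMap) stores the offset mapping for each of them. Each offset is 64 bits (8 bytes), and each entry holds two offsets.
Raw size = 10000 × 8 × 2 = 160000 bytes = 156.25 KiB. This counts only the offsets themselves; a real HashMap adds per-entry object overhead on top.
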
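The same arithmetic as a small script, so it can be repeated for other message counts. `raw_map_bytes` is a hypothetical helper, and the figure is a lower bound that ignores any per-entry overhead of the map itself.

```python
OFFSET_BYTES = 8  # one 64-bit offset


def raw_map_bytes(message_count: int) -> int:
    """Bytes needed for the offsets alone: one source and one target offset per message."""
    return message_count * OFFSET_BYTES * 2


for count in (10_000, 1_000_000, 1_000_000_000):
    size = raw_map_bytes(count)
    print(f"{count:>13,} messages -> {size:>14,} bytes ({size / 1024:,.2f} KiB)")
```

For 10,000 messages this reproduces the 160000 bytes above; the cost is a constant 16 bytes per message, so it scales linearly with the number of replicated records.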
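In a real migration, far more than 10,000 messages are transferred. The map grows linearly with the number of messages, which makes memory hard to manage, and at a large enough message count it may not fit in the memory of any machine.

[img[dense_cache_of_all_offset_syncs.png]]
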
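!! Optimization (Re-read offset-syncs from topic)
Keeping the full map in memory does not work, so the next idea is to read from the offset-syncs topic on every translation: offsets are translated and synchronized by continuously reading the offset sync records from the Kafka topic.

[img[Re-read_offset-syncs_from_topic.png]]
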
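As a rough sketch of this idea, the topic can be modelled as a plain list that is scanned from the beginning on every call. No Kafka client is involved; `OffsetSync`, `translate_by_rereading`, and the assumption that the topic holds one sync record per replicated message are all illustrative.

```python
from typing import Iterable, NamedTuple, Optional, Tuple


class OffsetSync(NamedTuple):
    upstream: int    # offset in the source cluster
    downstream: int  # offset in the target cluster


def translate_by_rereading(
    topic: Iterable[OffsetSync], source_offset: int
) -> Tuple[Optional[int], int]:
    """Scan the whole topic for one translation; return (target offset, records read)."""
    result: Optional[int] = None
    records_read = 0
    for sync in topic:
        records_read += 1
        if sync.upstream == source_offset:
            result = sync.downstream  # a later record for the same offset wins
    return result, records_read


# Stand-in for the offset-syncs topic: 1000 invented sync records.
topic = [OffsetSync(s, 100 + s) for s in range(1000)]
print(translate_by_rereading(topic, 7))
```

Nothing is cached between calls, so memory stays constant, but every translation reads every record in the topic.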
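Advantages:

1. The offset mappings do not all have to be held in memory; they are read from the topic only when needed. This avoids memory usage growing linearly with the data volume.

2. Each translation uses the latest mapping, so it can be accurate.
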
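Disadvantages:

1. Reading the offset sync records from beginning to end every time adds latency, especially when the offset-syncs topic holds a lot of data.

2. Memory usage drops, but the offset-syncs topic still needs enough storage to retain every offset sync record.

Because re-reading all the data from beginning to end causes significant latency and consumes a lot of network bandwidth on each read, this approach is still not feasible.
So the thinking has to change: stop insisting on exact translation and accept some inaccuracy (duplicate consumption) in exchange for better performance.
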
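!! Further optimization (Dense cache of most recent Offset-Syncs)
The core idea of a dense cache of the most recent offset syncs is to cache only the most recent offset sync records rather than all of them. This significantly reduces memory usage while preserving a certain degree of accuracy.

[img[dense_cache_of_most_recent_offset-syncs.png]]
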
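Cache recent offset syncs: keep only the offset sync records from a recent window instead of everything from the beginning, for example only those for the most recent 1000 messages. Offsets older than the window then cannot be translated directly, and it is hard to say how recent a window should be kept.
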
High-precision translation:

# Consumer groups with low or medium lag get high-precision offset translation.
# Consumer groups with high lag cannot get an accurate translation.

Fixed cache size:

# A single fixed cache size has to be chosen that works in all cases.
# The choice trades memory usage against translation precision.

Memory efficiency:

# When the cached offset sync records are too dense, memory is used inefficiently.
# The cached offset sync records need to be spread out enough to use memory efficiently.

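The trade-off in this section can be sketched with a bounded cache that evicts its oldest entry. `RecentDenseCache` is a hypothetical class for illustration, the capacity of 1000 is taken from the example above, and upstream offsets are assumed to arrive in increasing order.

```python
from collections import OrderedDict
from typing import Optional


class RecentDenseCache:
    """Dense cache limited to the most recent `capacity` offset syncs."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._syncs: "OrderedDict[int, int]" = OrderedDict()

    def record(self, upstream: int, downstream: int) -> None:
        self._syncs[upstream] = downstream
        if len(self._syncs) > self._capacity:
            self._syncs.popitem(last=False)  # evict the oldest sync

    def translate(self, upstream: int) -> Optional[int]:
        # None means the offset has already fallen out of the window.
        return self._syncs.get(upstream)

    def __len__(self) -> int:
        return len(self._syncs)


cache = RecentDenseCache(capacity=1000)
for offset in range(5000):  # invented data: the target runs 100 ahead of the source
    cache.record(offset, offset + 100)

print(len(cache))             # memory stays bounded by the capacity
print(cache.translate(4990))  # low-lag group: still inside the window
print(cache.translate(10))    # high-lag group: evicted, cannot be translated
```

Memory is capped by the chosen capacity, but a group whose committed offset lies more than 1000 records behind gets no translation at all, which is the weakness listed above.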
!! Current approach: Sparse Cache

@@ -0,0 +1,2 @@
title: Re-read_offset-syncs_from_topic.png
type: image/png
@@ -0,0 +1,2 @@
title: dense_cache_of_all_offset_syncs.png
type: image/png
@@ -0,0 +1,2 @@
title: dense_cache_of_most_recent_offset-syncs.png
type: image/png
@@ -0,0 +1,2 @@
title: offset_syncs_dense_cache.png
type: image/png
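@@ -1,20 +1,20 @@
 created: 20241007074530319
 creator: Gezi-lzq
 due: 20241008155959999
-modified: 20241007132848698
+modified: 20241007150306451
 modifier: Gezi-lzq
 priority:
 tags: todo [[Enhance Kafka cluster migration features and refine the migration plan]] kafka [[kafka migration]]
 title: Walk through the MM2 offset translation flow

 Main goal: walk through MM2's offset translation flow and understand its strengths and weaknesses.

-1. [[Why is offset translation needed?]]
+1. [[Why is offset translation needed]]

-2. [[Expected effect of offset translation?]]
+2. [[Expected effect of offset translation]]

-3. [[What challenges does offset translation face?]]
+3. [[What challenges does offset translation face]]

-4. [[MM2's current offset translation scheme?]]
+4. [[MM2's offset translation scheme]]

 5. How well does this offset translation scheme work, and what are its strengths and weaknesses?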