Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] Introduce data-file.thin-mode in primary key table write #4666

Merged
merged 33 commits into from
Dec 13, 2024

Conversation

leaves12138
Copy link
Contributor

@leaves12138 leaves12138 commented Dec 9, 2024

Purpose

Writing primary key table, now, we generates spare column.
For example:
One table with column a,b,c,d,e,f,g primary-key: a,b

Write actually:    _KEY_a, _KEY_b, _FIELD_SEQUENCE, _ROW_KIND, a, b, c, d, e, f, g

In which, _KEY_a and _KEY_b are totally equal to a, b.
Therefore, we introduce data-file.thin-mode here:

thin mode off:    _KEY_a, _KEY_b, _FIELD_SEQUENCE, _ROW_KIND, a, b, c, d, e, f, g

thin mode on:    _FIELD_SEQUENCE, _ROW_KIND, a, b, c, d, e, f, g

Notice that: If thin-mode is on, the files it write could not be read by former paimon sdk version. Linked issue: #4651

@leaves12138 leaves12138 changed the title [WIP] [core] Introduce thin-mode in primary key table writing. [WIP] [core] Introduce storage.thin-mode in primary key table writing. Dec 10, 2024
@leaves12138 leaves12138 changed the title [WIP] [core] Introduce storage.thin-mode in primary key table writing. [WIP] [core] Introduce storage.thin-mode in primary key table write. Dec 10, 2024
@leaves12138 leaves12138 changed the title [WIP] [core] Introduce storage.thin-mode in primary key table write. [core] Introduce storage.thin-mode in primary key table write. Dec 10, 2024
@JingsongLi JingsongLi changed the title [core] Introduce storage.thin-mode in primary key table write. [core] Introduce data-file.thin-mode in primary key table write Dec 10, 2024
return toRow(record.sequenceNumber(), record.valueKind(), record.value());
}

public InternalRow toRow(long sequenceNumber, RowKind valueKind, InternalRow value) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest add a comment here.
record.key() is not write into row

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already comment before class definition.

Copy link
Contributor

@JingsongLi JingsongLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 cc @tsreaper to take a look

Comment on lines 57 to 58
|| (options.dataFileThinMode()
&& keyNames.contains(SpecialFields.KEY_FIELD_PREFIX + field))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comments about why do we need this condition.

Comment on lines 195 to 197
Options options = new Options();
options.set(CoreOptions.DATA_FILE_THIN_MODE, false);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default value of CoreOptions.DATA_FILE_THIN_MODE is false, why do we need this change? Even if the default value changes to true, will it break the test?

} else if (SpecialFields.isSystemField(field)) {
} else if (SpecialFields.isSystemField(field)
||
// If we config METADATA_STATS_MODE to true, we need to maintain the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

METADATA_STATS_MODE?

@@ -192,6 +192,8 @@ public void testRewriteSuccess(boolean rewriteChangelog) throws Exception {

private KeyValueFileWriterFactory createWriterFactory(
Path path, RowType keyType, RowType valueType) {
Options options = new Options();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this change now.

@@ -143,6 +143,7 @@ private void recreateMergeTree(long targetFileSize) {
options.set(
CoreOptions.NUM_SORTED_RUNS_STOP_TRIGGER,
options.get(CoreOptions.NUM_SORTED_RUNS_COMPACTION_TRIGGER) + 1);
options.set(CoreOptions.DATA_FILE_THIN_MODE, false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this change?

@tsreaper tsreaper merged commit 3a7d868 into apache:master Dec 13, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants