Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] Introduce binlog system table to pack the UB and UA #4520

Merged
merged 2 commits into from
Nov 14, 2024

Conversation

Aitozi
Copy link
Contributor

@Aitozi Aitozi commented Nov 13, 2024

Purpose

Linked issue: close #4505

Tests

API and Format

Documentation

Copy link
Contributor

@JingsongLi JingsongLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we can create a new system table instead of an option.

@Aitozi
Copy link
Contributor Author

Aitozi commented Nov 13, 2024

I feel like we can create a new system table instead of an option.

Yes, it's better. Updated.

@JingsongLi JingsongLi changed the title [Feature] Allow to pack the UPDATE_BEFORE and UPDATE_AFTER message when streaming reading the audit table [core] Introduce binlog system table to pack the UPDATE_BEFORE and UPDATE_AFTER Nov 14, 2024
@JingsongLi JingsongLi changed the title [core] Introduce binlog system table to pack the UPDATE_BEFORE and UPDATE_AFTER [core] Introduce binlog system table to pack the UB and UA Nov 14, 2024
+------------------+----------------------+-----------------------+
| rowkind | column_0 | column_1 |
+------------------+----------------------+-----------------------+
| +I | [col_0, null] | [col_1, null] |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just one element?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The schema should keep same, so this is an array only with one element.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean one element, not one element and a null, two elements.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get it, updated.

*
* <p>DELETE: [-D, [co1, null], [col2, null]]
*/
public class BinlogTable implements DataTable, ReadonlyTable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reuse code with AuditLogTable? Like extending it to reuse something.

You can streaming query the binlog through binlog table. You can get the update before and update after in one row. Note:

{{< hint info >}}
1. Only support streaming query the binlog
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only support streaming query the binlog.

import java.io.IOException;
import java.util.function.BiFunction;

/** The reader which will pack the update before and updater after message together. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"updater after" to "update after"

}
InternalRow row2 = null;
if (row1.getRowKind() == RowKind.UPDATE_BEFORE) {
row1 = serializer.copy(row1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why copy instead of using row1 directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The underlay's row fetched from the nextRow could be reused.


public static final String BINLOG = "binlog";

public static final PredicateReplaceVisitor PREDICATE_CONVERTER =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private


@Override
public RowType rowType() {
List<DataField> fields = new ArrayList<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

init map with capacity

*
* <p>DELETE: [-D, [co1, null], [col2, null]]
*/
public class BinlogTable implements DataTable, ReadonlyTable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be can extract some common code with the AuditLogTable system table.

@@ -162,6 +162,12 @@ under the License.
<artifactId>iceberg-data</artifactId>
<version>${iceberg.version}</version>
<scope>test</scope>
<exclusions>
<exclusion>
<artifactId>parquet-avro</artifactId>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do this here?
It seems has no relation with this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the first commit message

@@ -132,6 +132,40 @@ public void testSnapshotsTable() throws Exception {
Row.of(3L, 0L, "APPEND"));
}

@Test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think add a class of BinlogTableTest in package "org.apache.paimon.table.system" may better?

@Aitozi
Copy link
Contributor Author

Aitozi commented Nov 14, 2024

Addressed the comments, pls take a look again @JingsongLi @wwj6591812

Copy link
Contributor

@JingsongLi JingsongLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@JingsongLi JingsongLi merged commit 9e4b28a into apache:master Nov 14, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature] Allow to pack the UPDATE_BEFORE and UPDATE_AFTER message when streaming reading the audit table
3 participants