Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[connector] Fluss sink supports ignore_delete. #272

Merged
merged 2 commits into from
Jan 10, 2025

Conversation

loserwang1024
Copy link
Collaborator

Purpose

Linked issue: close #251

Copy link
Collaborator

@luoyuxia luoyuxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! May @wuchong check the table option 'sink.delete-strategy'='ignore_delete' used for ignore delete is fine to you?

Copy link
Member

@wuchong wuchong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @luoyuxia and @loserwang1024 , I left some comments about the config.

flussConfig,
tableRowType,
targetColumnIndexes,
deleteStrategy)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a test for primary key table as well.

.withDescription(
"This field is used to decide what to do when data of type -D/-U is received. "
+ "`IGNORE_DELETE` means ignoring the `-D` and `-U` type message. "
+ "`CHANGELOG_STANDARD` means neither `-U` nor `-D` is ignored, they both cause the corresponding row in fluss to be deleted");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I prefer sink.ignore-delete=true/false. If there are only 2 values and CHANGELOG_STANDARD is the default, I think sink.ignore-delete=true/false is much simpler for users to understand and use.

It seems sink.delete-strategy comes from here that contains additional values NON_PK_FIELD_TO_NULL and DELETE_ROW_ON_PK. However, Fluss already handles partial-updates and partial-delete with leveraging the INSERT INTO (columns) grammar and I don't see this config will extend additional values in the near future.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems sink.delete-strategy comes from here that contains additional values NON_PK_FIELD_TO_NULL and DELETE_ROW_ON_PK.

Yes, I don't know whether user want to partial update(-D) not delete the whole row but also not ignore , just set the non pk columns as null.

@loserwang1024
Copy link
Collaborator Author

rebase into main branch

@wuchong
Copy link
Member

wuchong commented Jan 8, 2025

@loserwang1024 rebase but not squash commits in the future, otherwise, reviewer is hard to track what is changed since last time and have to review all the code again.

Copy link
Member

@wuchong wuchong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.


/** The strategy when fluss sink receives delete data. */
@PublicEvolving
public enum DeleteStrategy implements Serializable {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be removed.

@luoyuxia
Copy link
Collaborator

Let me merge it to reduce the gap between my test branch and main bach...

@luoyuxia luoyuxia merged commit c25c3b0 into alibaba:main Jan 10, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature][connector] Support ignore delete for Log Table
3 participants