Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[common][flink] Add support for complex types in kafka debezium avro cdc action #4246

Conversation

AshishKhatkar
Copy link
Contributor

Purpose

@umeshdangat has an outstanding PR which was updated after PR by @zhuangchong was merged. That PR: #3323

The above patch allows consuming data from avro data from kafka into paimon but it doesnt support complex avro types. This PR achieves that. The original PR zhuangchong#1 was pointing to @zhuangchong branch to clearly show the changes only relavant to supporting complex avro types.

This PR fixes the failing E2E tests for PR: #3931

Copying the note from PR: #3931

One issue is CdcSourceRecord contains Map<String, String> thus the current somewhat tedious approach is to deserialize avro complex types into json strings and then read them back from json strings rather than changing CdcSourceRecord Map<String, Object> to support value as Object. It would be a much larger change looking at the code changes needed.

I had to update the DataField to add a method for dataFieldEqualsIgnoreId which already existing in RichEventParser. For nested RowType fields this becomes necessary (coming from nested avro records) as when a DataField.type= RowType we cannot simply do equals on all data fields as they contain Id as well and it fails the equality, although there is no schema change.

@JingsongLi @zhuangchong let us know what you think.

cc: @umeshdangat

Linked issue: close xxx

Tests

API and Format

Documentation

@JingsongLi
Copy link
Contributor

CC @yuzelin to take a reivew~

@JingsongLi
Copy link
Contributor

Thanks @AshishKhatkar

Comment on lines 332 to 337
if (precision < 9) {
nanos =
(int)
(Math.floor(nanos / Math.pow(10, 9 - precision))
* Math.pow(10, 9 - precision));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is unnecessary. We can store the data and let the readers truncate the data based on precision.
So we can just use Timestamp#fromEpochMillis(long milliseconds, int nanosOfMillisecond)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read the documentation for Timestamp#fromEpochMillis(long milliseconds, int nanosOfMillisecond) and it expects nanosOfMs to be in range [0, 999_999]. Are you saying we need to default to precision 6 here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, if we want to store 1970-01-01 00:00:00.123456789, the first argument milliseconds is 123 and the second argument nanosOfMillisecond is 456789.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yuzelin done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks~

Copy link
Contributor

@yuzelin yuzelin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks,@AshishKhatkar! I left a comment, please take a look~

Copy link
Contributor

@yuzelin yuzelin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@yuzelin yuzelin merged commit b2641ad into apache:master Oct 9, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants