[Bug] Query result duplicate primary key #3841
Comments
It may happen when you change the bucket number but have not overwritten the table first.
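For context, rescaling the bucket number of an existing Paimon table is generally done by changing the option and then rewriting the data with an overwrite. A minimal Flink SQL sketch of that procedure (the table name my_pk_table is hypothetical) could look like this:

```sql
-- Hypothetical table name; run after stopping the streaming write job.
-- Change the bucket number in the table options.
ALTER TABLE my_pk_table SET ('bucket' = '16');

-- Rewrite the existing data so all files follow the new bucket layout.
-- Skipping this step leaves old files in the old layout, which can
-- surface as duplicate primary keys in later queries.
INSERT OVERWRITE my_pk_table SELECT * FROM my_pk_table;
```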
The bucket number has not been modified since the table was created.
What is your table schema, and did you delete any data before? deduplicate.ignore-delete is true. @herefree
{ ...... {
I set deduplicate.ignore-delete = false and found there were no duplicate primary keys, but I am not sure whether later versions fix this problem.
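For anyone trying to reproduce this comparison, the option can also be flipped for a single read with Flink's dynamic table options hint instead of altering the table, assuming dynamic table options are enabled (they are by default on recent Flink versions; the table name is hypothetical):

```sql
-- Read the same table with deduplicate.ignore-delete disabled for this
-- query only, via a dynamic table options hint.
SELECT *
FROM my_pk_table /*+ OPTIONS('deduplicate.ignore-delete' = 'false') */;
```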
@herefree Could you give detailed minimal reproduce steps so that we can reproduce this bug?
Search before asking
Paimon version
0.7.0-incubating
Compute Engine
Flink 1.18.0
Minimal reproduce step
We have a Flink job writing data to a Paimon table. The table options are listed below; a sketch of an equivalent table definition follows the list:
+---------------------------+---------+
| key | value |
+---------------------------+---------+
| bucket | 8 |
| scan.remove-normalize | true |
| deduplicate.ignore-delete | true |
| changelog-producer | none |
| file.format | parquet |
+---------------------------+---------+
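For completeness, a table with the options above would be created roughly as follows in Flink SQL. The column list and table name are placeholders, since the actual schema was not shared in this issue; only the options mirror the ones reported:

```sql
-- Hypothetical schema; only the WITH options match the reported table.
CREATE TABLE my_pk_table (
    id   BIGINT,
    name STRING,
    ts   TIMESTAMP(3),
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    'bucket' = '8',
    'scan.remove-normalize' = 'true',
    'deduplicate.ignore-delete' = 'true',
    'changelog-producer' = 'none',
    'file.format' = 'parquet'
);
```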
What doesn't meet your expectations?
After the job has been running for some time, we query the table in batch mode, and some of the query results contain duplicate primary keys.
Even after upgrading the Paimon version and querying this table again, it still returns duplicate primary keys.
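One simple way to demonstrate the duplicates from SQL is a batch query that groups on the primary key; this assumes the hypothetical schema sketched above with id as the primary key:

```sql
-- Read the table in batch mode and list primary keys that occur more
-- than once, which should never happen for a primary-key table.
SET 'execution.runtime-mode' = 'batch';

SELECT id, COUNT(*) AS cnt
FROM my_pk_table
GROUP BY id
HAVING COUNT(*) > 1;
```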
Anything else?
I want to know what causes this problem. Is it caused by the writer operator? Does a later version fix this issue?
Are you willing to submit a PR?