
[Bug] Query result duplicate primary key #3841

Open
herefree opened this issue Jul 30, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@herefree
Contributor

herefree commented Jul 30, 2024

Search before asking

  • I searched in the issues and found nothing similar.

Paimon version

0.7.0-incubating

Compute Engine

Flink 1.18.0

Minimal reproduce step

We have a Flink job writing data to a Paimon table. The table options are:
+---------------------------+---------+
| key                       | value   |
+---------------------------+---------+
| bucket                    | 8       |
| scan.remove-normalize     | true    |
| deduplicate.ignore-delete | true    |
| changelog-producer        | none    |
| file.format               | parquet |
+---------------------------+---------+
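
For reference, a table with these options would typically be declared in Flink SQL roughly as below. This is only a sketch: the table and column names are placeholders, not the actual schema; only the WITH options mirror the configuration above.

-- Sketch only: names are placeholders, options mirror the reported configuration.
CREATE TABLE pk_table (
    id   STRING NOT NULL,
    col1 STRING,
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    'bucket' = '8',
    'scan.remove-normalize' = 'true',
    'deduplicate.ignore-delete' = 'true',
    'changelog-producer' = 'none',
    'file.format' = 'parquet'
);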

What doesn't meet your expectations?

After the job had been running for some time, we queried the table in batch mode, and some of the query results contained duplicate primary keys.
[screenshot: query results showing rows with duplicate primary keys]
Even after upgrading the Paimon version used for the query, the table still returns duplicate primary keys.
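
One way to check this in batch mode is a grouped count over the primary key. The query below is only illustrative; the table name is a placeholder.

-- Illustrative duplicate check; table name is a placeholder.
SET 'execution.runtime-mode' = 'batch';

SELECT id, COUNT(*) AS cnt
FROM pk_table
GROUP BY id
HAVING COUNT(*) > 1;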

Anything else?

I want to know what causes this problem. Is it caused by the writer operator? Does a later version fix this issue?

Are you willing to submit a PR?

  • I'm willing to submit a PR!
herefree added the bug label on Jul 30, 2024
@eric666666
Contributor

eric666666 commented Jul 30, 2024

This may happen if you changed the bucket number without overwriting the table first.

@herefree
Contributor Author

This may happen if you changed the bucket number without overwriting the table first.

The bucket number has not been modified since the table was created.

@xuzifu666
Member

xuzifu666 commented Jul 30, 2024

What does your table schema look like, and did you delete any data before? deduplicate.ignore-delete is set to true. @herefree

@herefree
Contributor Author

What does your table schema look like, and did you delete any data before? deduplicate.ignore-delete is set to true. @herefree

{
  "id" : 2,
  "fields" : [ {
    "id" : 0,
    "name" : "",
    "type" : "STRING NOT NULL",
    "description" : ""
  }, {
    "id" : 1,
    "name" : "",
    "type" : "STRING",
    "description" : ""
  },

  ......

  {
    "id" : 54,
    "name" : "",
    "type" : "STRING",
    "description" : ""
  }, {
    "id" : 55,
    "name" : "",
    "type" : "STRING",
    "description" : ""
  }, {
    "id" : 56,
    "name" : "",
    "type" : "STRING",
    "description" : ""
  }, {
    "id" : 57,
    "name" : "",
    "type" : "STRING",
    "description" : ""
  }, {
    "id" : 58,
    "name" : "",
    "type" : "STRING",
    "description" : ""
  }, {
    "id" : 59,
    "name" : "",
    "type" : "STRING",
    "description" : ""
  }, {
    "id" : 60,
    "name" : "",
    "type" : "STRING",
    "description" : ""
  }, {
    "id" : 61,
    "name" : "",
    "type" : "STRING",
    "description" : ""
  }, {
    "id" : 62,
    "name" : "",
    "type" : "STRING",
    "description" : ""
  }, {
    "id" : 63,
    "name" : "",
    "type" : "STRING",
    "description" : ""
  } ],
  "highestFieldId" : 63,
  "partitionKeys" : [ ],
  "primaryKeys" : [ "id" ],
  "options" : {
    "bucket" : "8",
    "scan.remove-normalize" : "true",
    "deduplicate.ignore-delete" : "true",
    "changelog-producer" : "none",
    "file.format" : "parquet"
  },
  "comment" : "",
  "timeMillis" : 1722325416673
}
I didn't delete data before, but the changelog of the upstream table may contain -D (delete) records. I set deduplicate.ignore-delete to true only because I don't want -D records written into this table, and so that some Flink jobs do not receive -D records when consuming it.

@herefree
Contributor Author

[screenshot: the duplicate rows fall within the same bucket]
I also found the duplicate data within the same bucket.
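
If it helps with diagnosis, the file layout per bucket can be inspected through Paimon's $files system table. The column names here are assumed from the Paimon documentation, and the table name is a placeholder.

-- Sketch: list data files per bucket and level ($files columns assumed from the Paimon docs).
SELECT bucket, level, file_path, record_count
FROM `pk_table$files`
ORDER BY bucket, level;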

@herefree
Contributor Author

herefree commented Aug 1, 2024

What does your table schema look like, and did you delete any data before? deduplicate.ignore-delete is set to true. @herefree

After setting deduplicate.ignore-delete = false, I no longer see duplicate primary keys, but I am not sure whether later versions fix this problem.
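
For reference, the option can be changed on an existing table with an ALTER TABLE statement; the table name below is a placeholder.

-- Sketch: disable ignoring of delete records on an existing table (placeholder table name).
ALTER TABLE pk_table SET ('deduplicate.ignore-delete' = 'false');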

@discivigour
Contributor

@herefree Could you give detailed minimal reproduction steps so that we can reproduce this bug?
