[Bug] Compact/Overwrite action may lose new added data #4802

Open
2 tasks done
Xiangakun opened this issue Dec 30, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@Xiangakun
Contributor

Search before asking

  • I searched in the issues and found nothing similar.

Paimon version

0.9

Compute Engine

spark-3.2

Minimal reproduce step

Spark SQL:

  1. Create a bucket-unaware table without primary keys:
    create table paimon.paimon_test.test_compact(id int, data string) TBLPROPERTIES('bucket' = '-1');
  2. Insert a few records into paimon.paimon_test.test_compact:
    insert into paimon.paimon_test.test_compact values(1, 'data01'), (2, 'data02'),(3, 'data03'),(4, 'data04');
  3. Call the compact procedure:
    CALL paimon.sys.compact(table => 'paimon_test.test_compact', order_strategy => 'order', order_by => 'id');
    While the compaction is still running, insert a new record from another spark-sql CLI session:
    insert into paimon.paimon_test.test_compact values(666, 'data666');
  4. The insert succeeds, but after the compaction completes the row (666, 'data666') is missing from the table.

What doesn't meet your expectations?

The inserted row (666, 'data666') should not be lost. It would be better for the compaction to fail with an error than to silently lose data.

Anything else?

I checked the code: the compact action is implemented the same way as the overwrite action, and at commit time the system always uses the latest snapshot to mark all files as deleted, rather than the snapshot at which the compact action was invoked. As a result, the compact/overwrite action currently provides only SNAPSHOT isolation, not SERIALIZABLE isolation. I am not sure whether this is expected, but the current compaction implementation is very dangerous for our use case.
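
To make the race easier to discuss, here is a toy model of it in Python. It is only a sketch of the behaviour described above, not Paimon's commit code: `ToyTable`, `start_compaction`, and the three `commit_*` functions are hypothetical names made up for illustration, and a "file" is just a tuple of rows. `commit_overwrite_latest` mimics what I believe happens today, `commit_checked` fails when the table has moved on, and `commit_rebased` deletes only the files the compaction actually read.

```python
from dataclasses import dataclass, field


class CompactConflictError(Exception):
    """Raised when the table changed after the compaction read its input."""


@dataclass
class ToyTable:
    # Hypothetical model: each snapshot is the full list of live files,
    # and each file is a tuple of rows. Snapshot 0 is the empty table.
    snapshots: list = field(default_factory=lambda: [[]])

    def latest_id(self):
        return len(self.snapshots) - 1

    def files(self, snapshot_id=None):
        sid = self.latest_id() if snapshot_id is None else snapshot_id
        return list(self.snapshots[sid])

    def rows(self):
        return sorted(row for f in self.files() for row in f)

    def append(self, rows):
        # An insert commits a new snapshot with one extra file.
        self.snapshots.append(self.files() + [tuple(rows)])


def start_compaction(table):
    # The compaction plans against the snapshot that is latest when it starts.
    base_id = table.latest_id()
    read_files = table.files(base_id)
    merged = tuple(sorted(row for f in read_files for row in f))
    return base_id, read_files, merged


def commit_overwrite_latest(table, base_id, read_files, merged):
    # Reported behaviour: every file in the LATEST snapshot is marked deleted,
    # including files written after the compaction started.
    table.snapshots.append([merged])


def commit_checked(table, base_id, read_files, merged):
    # Option A: refuse to commit if any snapshot was added in the meantime.
    if table.latest_id() != base_id:
        raise CompactConflictError("table changed since snapshot %d" % base_id)
    table.snapshots.append([merged])


def commit_rebased(table, base_id, read_files, merged):
    # Option B: delete only the files this compaction read, keep newer ones.
    kept = [f for f in table.files() if f not in read_files]
    table.snapshots.append([merged] + kept)


def demo(commit):
    table = ToyTable()
    table.append([(1, "data01"), (2, "data02"), (3, "data03"), (4, "data04")])
    base_id, read_files, merged = start_compaction(table)
    table.append([(666, "data666")])  # concurrent insert from another session
    commit(table, base_id, read_files, merged)
    return table.rows()
```

In this model `demo(commit_overwrite_latest)` returns only the four original rows, `demo(commit_rebased)` also keeps (666, 'data666'), and `demo(commit_checked)` raises `CompactConflictError`. Either of the last two would be acceptable for us; only the first one loses data.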

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@Xiangakun Xiangakun added the bug Something isn't working label Dec 30, 2024
@Xiangakun
Contributor Author

@JingsongLi @Zouxxyy Hello gentlemen, could you please help take a look at this issue?
