
[Bug] Spark deletes dynamic bucket table, the result is incorrect #2471

Closed · 1 of 2 tasks
hekaifei opened this issue Dec 8, 2023 · 5 comments · Fixed by #2945
Labels
bug Something isn't working

Comments

hekaifei (Contributor) commented Dec 8, 2023

Search before asking

  • I searched in the issues and found nothing similar.

Paimon version

0.7

Compute Engine

Spark

Minimal reproduce step

[screenshot: reproduce steps; detailed SQL statements are given in the comments below]

What doesn't meet your expectations?

For dynamic bucket tables, Flink does not support DELETE, but Spark allows it, and the result is incorrect.
[screenshot: incorrect query result]
I think either an error should be reported up front, or the behavior should be fixed so the result is correct.

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
hekaifei added the bug (Something isn't working) label on Dec 8, 2023
Zouxxyy (Contributor) commented Dec 8, 2023

@hekaifei Thank you for pointing this out. Can you provide more detailed steps to reproduce it, such as the table creation and data insert statements?

hekaifei (Contributor, Author) commented Dec 8, 2023

@Zouxxyy

  1. Create the table (I use the Hive client):

CREATE TABLE dynamic_bucket_tests (
  id BIGINT,
  job_group BIGINT,
  job_id BIGINT,
  ums_ts_ BIGINT)
COMMENT ''
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler'
TBLPROPERTIES (
  'bucket' = '-1',
  'consumer.expiration-time' = '1h',
  'file.format' = 'parquet',
  'metastore.partitioned-table' = 'true',
  'primary-key' = 'id',
  'sequence.field' = 'ums_ts_');
  2. Insert the data (I use Spark SQL):
spark.read.json("dynamic_bucket_tests.json").repartition(10).createOrReplaceTempView("t1") 
spark.sql("""insert into dynamic_bucket_tests select * from t1""")

(attachment: dynamic_bucket_tests.json)

  3. Try to delete:

spark-sql> delete from dynamic_bucket_tests where id = 55233981;
Time taken: 11.076 seconds
spark-sql> select * from dynamic_bucket_tests where id = 55233981;
Time taken: 3.362 seconds
spark-sql> delete from dynamic_bucket_tests where id = 55235391;
Time taken: 7.803 seconds
spark-sql> select * from dynamic_bucket_tests where id = 55235391;
55235391        65      59662   16995456000000
Time taken: 2.226 seconds, Fetched 1 row(s)
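
One way to sanity-check what the second DELETE actually did is to look at Paimon's snapshots system table and see whether a new snapshot was committed and how the record counts changed. A minimal sketch, assuming Paimon's table$snapshots system-table syntax in Spark SQL (this query is illustrative and not part of the original report):

-- Sketch (assumption): list recent snapshots; a row-level DELETE should
-- appear as a new snapshot whose counts reflect the removed row.
SELECT snapshot_id, commit_kind, total_record_count, delta_record_count
FROM `dynamic_bucket_tests$snapshots`
ORDER BY snapshot_id DESC;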

hekaifei (Contributor, Author) commented Dec 8, 2023

@Zouxxyy Maybe it's a Paimon version issue. I used paimon-0.6 with Spark 3.2 to write the data and paimon-0.7 with Spark 3.2 for the deletion.
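
If a writer-version mismatch is suspected, the options actually persisted in the table's schema can be checked directly. A small sketch, assuming Paimon's table$options system table (added for illustration, not from the original thread):

-- Sketch (assumption): confirm the options stored by the paimon-0.6 writer,
-- e.g. 'bucket' = '-1' and 'sequence.field' = 'ums_ts_'.
SELECT * FROM `dynamic_bucket_tests$options`;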

Zouxxyy (Contributor) commented Dec 11, 2023

@hekaifei Thanks, there is a bug in the interaction between 'sequence.field' and row-level operations; I'll take it.
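
For context: with 'sequence.field', Paimon resolves records sharing a primary key by keeping the one with the largest sequence field value. A plausible reading of the symptom (an assumption, not confirmed in this thread) is that the retraction row produced by DELETE carries a smaller ums_ts_ than the stored row, so the stored row wins the merge and the delete is silently lost. A hypothetical illustration on a toy table t with primary key k and 'sequence.field' = 'seq':

-- Hypothetical illustration (not from the issue): the row with the
-- larger seq wins the merge, regardless of arrival order.
INSERT INTO t VALUES (1, 'new', 200);
INSERT INTO t VALUES (1, 'old', 100);  -- loses the merge: 100 < 200
-- A DELETE whose retraction row has seq < 200 would likewise lose
-- the merge, leaving the original row visible.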

hekaifei (Contributor, Author) commented:

@Zouxxyy This problem also occurs with UPDATE.
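
To make the UPDATE variant concrete, a hypothetical sketch extrapolated from the reporter's one-line note, reusing the id from the DELETE repro (not a verbatim log):

-- Sketch (assumption): the same symptom with UPDATE instead of DELETE.
UPDATE dynamic_bucket_tests SET job_group = 0 WHERE id = 55235391;
SELECT * FROM dynamic_bucket_tests WHERE id = 55235391;
-- the row can still come back with the old job_group value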
