
[Bug] HashBucketAssigner load index lead to Too large error #3776

Closed

izhangzhihao opened this issue Jul 18, 2024 · 1 comment
Labels
bug Something isn't working

Comments

izhangzhihao commented Jul 18, 2024

Search before asking

  • I searched in the issues and found nothing similar.

Paimon version

0.9.0

Compute Engine

flink 1.17.2

Minimal reproduce step

CREATE TABLE if not exists a_one_billion_table (
    id STRING,
    ... ...
    PRIMARY KEY (id) NOT ENFORCED
) WITH ('bucket' = '-1');

-- Make sure you have already inserted more than 1 billion rows into table `a_one_billion_table`.
-- Then run the new Flink job below; `source_table` may contain only 10,000 records, yet the checkpoint will fail.

insert into a_one_billion_table
select * from source_table;

What doesn't meet your expectations?

checkpoint failed with error:

2024-07-18 16:09:32,895 WARN  org.apache.flink.runtime.taskmanager.Task [] - dynamic-bucket-assigner (1/1)#0 switched from RUNNING to FAILED with failure cause:
java.lang.IllegalArgumentException: Too large (1466616922 expected elements with load factor 0.75)
	at org.apache.paimon.shade.it.unimi.dsi.fastutil.HashCommon.arraySize(HashCommon.java:208)
	at org.apache.paimon.shade.it.unimi.dsi.fastutil.ints.Int2ShortOpenHashMap.<init>(Int2ShortOpenHashMap.java:103)
	at org.apache.paimon.shade.it.unimi.dsi.fastutil.ints.Int2ShortOpenHashMap.<init>(Int2ShortOpenHashMap.java:116)
	at org.apache.paimon.utils.Int2ShortHashMap.<init>(Int2ShortHashMap.java:35)
	at org.apache.paimon.utils.Int2ShortHashMap$Builder.build(Int2ShortHashMap.java:70)
	at org.apache.paimon.index.PartitionIndex.loadIndex(PartitionIndex.java:138)
	at org.apache.paimon.index.HashBucketAssigner.loadIndex(HashBucketAssigner.java:166)
	at org.apache.paimon.index.HashBucketAssigner.assign(HashBucketAssigner.java:83)
	at org.apache.paimon.flink.sink.HashBucketAssignerOperator.processElement(HashBucketAssignerOperator.java:98)
	at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:246)
	at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:217)
	at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:169)
	at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:68)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:616)
	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:231)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:1080)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:1029)
	at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:959)
	at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:938)
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:751)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:567)
	at java.lang.Thread.run(Thread.java:879) [?:1.8.0_372]
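
For context on the failure: fastutil's `HashCommon.arraySize` rejects any requested capacity whose backing array would exceed 2^30 entries, so an `Int2ShortOpenHashMap` cannot be sized for roughly 1.47 billion expected elements at load factor 0.75. Below is a minimal sketch reproducing the exception, assuming the unshaded `fastutil` dependency is on the classpath (Paimon bundles a shaded copy, but the behavior is the same); the class name is made up for illustration.

import it.unimi.dsi.fastutil.ints.Int2ShortOpenHashMap;

public class TooLargeRepro {
    public static void main(String[] args) {
        // fastutil sizes the backing array to the next power of two of
        // ceil(expected / loadFactor). Here 1,466,616,922 / 0.75 ≈ 1.96 billion,
        // which exceeds the 2^30 array-size cap, so the constructor throws
        // IllegalArgumentException: "Too large (1466616922 expected elements
        // with load factor 0.75)" -- the same error as in the stack trace above.
        new Int2ShortOpenHashMap(1_466_616_922, 0.75f);
    }
}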

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
izhangzhihao (Author) commented

Please reopen this ticket; the issue is still not resolved. Adding parallelism is only a workaround. Ideally, the resource allocation for the task should match the incoming data flow rather than the total amount of data at rest; the current behavior does not meet this ideal. Is there any plan for optimization?
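
As a sketch of the parallelism workaround mentioned above: raising the bucket assigner parallelism spreads the partition index across more tasks, so each task builds a smaller hash map. This assumes the Paimon table option `dynamic-bucket.assigner-parallelism` and a hypothetical warehouse path; verify the option name against the Paimon version in use.

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class RaiseAssignerParallelism {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Hypothetical warehouse path; point this at the existing Paimon warehouse.
        tEnv.executeSql(
                "CREATE CATALOG paimon WITH ("
                        + " 'type' = 'paimon',"
                        + " 'warehouse' = 'file:///tmp/paimon' )");
        tEnv.executeSql("USE CATALOG paimon");

        // Option name taken from the Paimon dynamic-bucket documentation; verify it
        // against the version in use. A higher assigner parallelism lets each
        // assigner task load a smaller slice of the partition index.
        tEnv.executeSql(
                "ALTER TABLE a_one_billion_table SET ("
                        + " 'dynamic-bucket.assigner-parallelism' = '8' )");
    }
}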
