[Bug] Job cannot recover from checkpoint/savepoint if parallelism is changed from 1 to 2 #4543
Comments
(Referenced code: lines 69 to 83 in 220789d)

If the job parallelism is 1, the Writer operator and the Compact Coordinator operator are chained together. However, since the Compact Coordinator operator always runs with parallelism 1, adjusting the job parallelism separates the Writer operator from the Compact Coordinator operator, and the chained state can no longer be recovered. We need to disable chaining between the Writer operator and the Compact Coordinator operator (see the sketch after this comment).
(Referenced code: paimon/paimon-flink/paimon-flink-common/src/main/java/org/apache/paimon/flink/sink/FlinkSink.java, lines 243 to 256 in 220789d)

Same as org.apache.paimon.flink.sink.FlinkSink#doWrite.
@JingsongLi WDYT? Should we add an option to control this? Like #3232.
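For context, a minimal, self-contained sketch of the idea (this is not Paimon's actual FlinkSink code; everything except the "Writer" and "Compact Coordinator" operator names is made up for illustration). It shows how Flink's DataStream API can break chaining so that a parallelism-1 coordinator-style operator never shares a chain, and therefore a chained operator ID, with the upstream writer-style operator:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class DisableChainingSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<String> source = env.fromElements("a", "b", "c");

        // "Writer"-like operator: runs at the job parallelism.
        DataStream<String> written = source
                .map(value -> "written:" + value)
                .name("Writer");

        // "Compact Coordinator"-like operator: always parallelism 1.
        // disableChaining() ensures it is never chained with the writer,
        // so changing the job parallelism from 1 to N no longer changes
        // which operators share a chain when restoring state.
        written
                .map(value -> "coordinated:" + value)
                .name("Compact Coordinator")
                .setParallelism(1)
                .disableChaining()
                .print();

        env.execute("disable-chaining-sketch");
    }
}
```

With chaining disabled, each operator keeps its own entry in the checkpoint regardless of the job parallelism.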
I think #4424 can solve this problem. But do we still need to disable chaining? Or do we just need to recommend that users add the 'sink.operator-uid.suffix' and 'source.operator-uid.suffix' options to their Flink job (for example, as sketched below)?
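A hedged sketch of what that recommendation could look like in practice, using Flink's dynamic table option hints. The catalog, database, table names, and suffix values are invented for illustration; only the two option keys come from the discussion above:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class UidSuffixSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Pin stable operator UIDs via the suffix options, so the job can be
        // restored from a checkpoint/savepoint even after the topology or
        // parallelism changes.
        tEnv.executeSql(
                "INSERT INTO paimon_catalog.db.target_table "
                        + "/*+ OPTIONS('sink.operator-uid.suffix' = 'my-sink-v1') */ "
                        + "SELECT * FROM paimon_catalog.db.source_table "
                        + "/*+ OPTIONS('source.operator-uid.suffix' = 'my-source-v1') */");
    }
}
```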
@Tan-JiaLiang |
Search before asking
Paimon version
0.9.0
Compute Engine
Flink
Minimal reproduce step
What doesn't meet your expectations?
The job should be able to restore from a checkpoint/savepoint even if I change the parallelism.
Anything else?
No response
Are you willing to submit a PR?