[flink] PartitionMarkDone enables the use of various partition trigger strategies. #4386

Closed
Aitozi wants to merge 3 commits into from

Conversation

Aitozi (Contributor) commented Oct 28, 2024

Purpose

Linked issue: close #xxx

In our company, we have found that the partition statistics in HMS (Hive Metastore) are incorrect. The cause is that, during writing, we only update the metastore partition the first time it is written to.
Therefore, we would like to use the PartitionMarkDone strategy to update the statistics in HMS after each partition has been idle for a short period.
We need a separate configuration for PartitionMarkDone because our requirements differ:

  • We do not want to rely on extracting accurate timestamps from partition names.
  • We aim to store each partition's last access time in state so that it can be reported back to HMS.

To achieve this, we plan to extend PartitionMarkDone to support different trigger strategies for partitions. With this extension, we will be able to introduce custom triggers such as PartitionHmsReporterTrigger.
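
A minimal sketch of the idle-time idea described above, not the code in this PR. The class name `IdlePartitionTracker` and its methods are hypothetical, and a plain `HashMap` stands in for the Flink state that would hold each partition's last access time. The timestamp is supplied by the caller on every write, so nothing is parsed from the partition name.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: reports partitions that have been idle for a configured duration. */
class IdlePartitionTracker {
    // Last access time per partition, in epoch millis. In a real job this map
    // would be kept in Flink state so that it survives restarts (assumption).
    private final Map<String, Long> lastAccess = new HashMap<>();
    private final long idleMillis;

    IdlePartitionTracker(Duration idleTime) {
        this.idleMillis = idleTime.toMillis();
    }

    /** Records a write to the partition; the time comes from the caller, not the partition name. */
    void notifyPartition(String partition, long nowMillis) {
        lastAccess.put(partition, nowMillis);
    }

    /** Returns and forgets the partitions that have had no writes for at least idleMillis. */
    List<String> idlePartitions(long nowMillis) {
        List<String> idle = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastAccess.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() >= idleMillis) {
                idle.add(entry.getKey());
                it.remove();
            }
        }
        return idle;
    }
}
```

Each partition returned by `idlePartitions` would then be the point at which a trigger such as PartitionHmsReporterTrigger reports statistics to HMS; that reporting step is left out here because its API is not shown in this conversation.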

Tests

API and Format

Documentation

Contributor Author

Aitozi commented Oct 28, 2024

WDYT? @JingsongLi

@JingsongLi
Contributor

Hi @Aitozi, can you explain this PR at the API level? Or are you modifying the current behavior?

Contributor Author

Aitozi commented Oct 29, 2024

@JingsongLi The behavior is not changed. In this PR, I extract a PartitionTrigger interface; the default implementation is PartitionMarkDoneTrigger.

In StoreCommitter, the PartitionCollector takes over this role. It can trigger based on different configurations, such as partition-mark-done or hms-report.
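
Roughly, the shape I have in mind looks like the sketch below. It is illustrative only: the `*Sketch` names, the `notifyPartition` method, and the `RecordingTrigger` helper are hypothetical and are not the signatures in the diff. The collector just forwards each written partition to every configured trigger.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical trigger contract; the method name is an assumption. */
interface PartitionTriggerSketch extends Closeable {
    void notifyPartition(String partition);

    // Redeclared without a checked exception to keep the sketch short.
    @Override
    void close();
}

/** Hypothetical collector that forwards each written partition to all configured triggers. */
class PartitionCollectorSketch implements Closeable {
    private final List<PartitionTriggerSketch> triggers;

    PartitionCollectorSketch(List<? extends PartitionTriggerSketch> triggers) {
        this.triggers = new ArrayList<>(triggers);
    }

    void notifyPartition(String partition) {
        for (PartitionTriggerSketch trigger : triggers) {
            trigger.notifyPartition(partition);
        }
    }

    @Override
    public void close() {
        for (PartitionTriggerSketch trigger : triggers) {
            trigger.close();
        }
    }
}

/** Stand-in trigger that only records what it was told (for illustration). */
class RecordingTrigger implements PartitionTriggerSketch {
    final List<String> seen = new ArrayList<>();
    boolean closed = false;

    @Override
    public void notifyPartition(String partition) {
        seen.add(partition);
    }

    @Override
    public void close() {
        closed = true;
    }
}
```

With this shape, the committer only talks to the collector, and which triggers exist is decided by configuration when the collector is built.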


  public PartitionMarkDoneTrigger(
          State state,
          PartitionTimeExtractor timeExtractor,
          @Nullable Duration timeInterval,
          @Nullable Duration idleTime,
-         boolean markDoneWhenEndInput)
+         boolean markDoneWhenEndInput,
Contributor

Can you keep PartitionMarkDone as it is?

Contributor Author

I reset the commit and made the update in #4398.

import java.util.List;

/** Partition collector. */
public class PartitionCollector implements Closeable {
Contributor

PartitionListeners?

import java.util.List;

/** The partition trigger. */
public interface PartitionTrigger extends Closeable {
Contributor

PartitionListener?
