[core] Support delete stats in result of scan plan. #4506

wwj6591812
Contributor

Purpose

In my company's production environment, we use a Flink session cluster for OLAP scans over Paimon, and we found that the JobManager's memory usage is consistently high.
We plan to optimize this in two ways:
(1) Drop stats from DataSplit.
(2) During data skipping, trim unused stats from ManifestEntry.

This PR covers (1).

Linked issue: close #xxx

Tests

API and Format

Documentation

@wwj6591812 wwj6591812 force-pushed the support_delete_stats_in_result_of_plan_1112 branch from a0d5f66 to 1c5dbff Compare November 12, 2024 15:41
@wwj6591812 wwj6591812 changed the title [core] support delete stats in result of scan plan [WIP][core] support delete stats in result of scan plan Nov 13, 2024
@wwj6591812 wwj6591812 force-pushed the support_delete_stats_in_result_of_plan_1112 branch 2 times, most recently from 3bc538e to 4e6a4f0 Compare November 13, 2024 09:57
@wwj6591812 wwj6591812 changed the title [WIP][core] support delete stats in result of scan plan [core] support delete stats in result of scan plan Nov 13, 2024
@wwj6591812 wwj6591812 changed the title [core] support delete stats in result of scan plan [core] Support delete stats in result of scan plan. Nov 15, 2024
@wwj6591812
Contributor Author

@JingsongLi Hi, please take a look. Thanks.

@@ -409,6 +409,27 @@ public DataFileMeta rename(String newFileName) {
valueStatsCols);
}

public DataFileMeta withoutStats() {

copyWithoutStats

rowCount,
minKey,
maxKey,
EMPTY_STATS,

Would it be better to keep the key stats?

deleteRowCount,
embeddedIndex,
fileSource,
valueStatsCols);

empty list
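
The three suggestions in this thread (rename to `copyWithoutStats`, keep the key stats, pass an empty list for the stats columns) could be combined roughly as follows. This is only a sketch: `SketchStats` and `SketchFileMeta` are hypothetical, heavily simplified stand-ins and not Paimon's `DataFileMeta`, which carries many more fields.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified stand-in for a stats payload; not Paimon's real type.
final class SketchStats {
    static final SketchStats EMPTY = new SketchStats(new byte[0], new byte[0]);

    final byte[] minValues;
    final byte[] maxValues;

    SketchStats(byte[] minValues, byte[] maxValues) {
        this.minValues = minValues;
        this.maxValues = maxValues;
    }
}

// Hypothetical, simplified stand-in for file metadata (assumption: immutable fields).
final class SketchFileMeta {
    final String fileName;
    final long rowCount;
    final SketchStats keyStats;
    final SketchStats valueStats;
    final List<String> valueStatsCols;

    SketchFileMeta(
            String fileName,
            long rowCount,
            SketchStats keyStats,
            SketchStats valueStats,
            List<String> valueStatsCols) {
        this.fileName = fileName;
        this.rowCount = rowCount;
        this.keyStats = keyStats;
        this.valueStats = valueStats;
        this.valueStatsCols = valueStatsCols;
    }

    // Returns a new instance; the receiver is left unchanged.
    SketchFileMeta copyWithoutStats() {
        return new SketchFileMeta(
                fileName,
                rowCount,
                keyStats,                  // kept, as suggested in the review
                SketchStats.EMPTY,         // value stats are dropped
                Collections.emptyList());  // no stats columns remain
    }
}
```

Returning a copy rather than mutating in place means callers that still hold the original metadata keep seeing its stats.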

@@ -121,6 +121,10 @@ public Identifier identifier() {
file.embeddedIndex());
}

public ManifestEntry withoutStats() {

copyWithoutStats

@@ -215,6 +217,12 @@ public FileStoreScan withMetrics(ScanMetrics metrics) {
return this;
}

@Override
public FileStoreScan withoutStatsInPlan() {

dropStats

@@ -81,6 +81,8 @@ public interface FileStoreScan {

FileStoreScan withMetrics(ScanMetrics metrics);

FileStoreScan withoutStatsInPlan();

dropStats

@@ -102,6 +102,12 @@ public AbstractDataTableScan withMetricsRegistry(MetricRegistry metricsRegistry)
return this;
}

@Override
public AbstractDataTableScan withoutStatsInPlan() {

dropStats

@@ -55,4 +55,9 @@ default InnerTableScan withMetricsRegistry(MetricRegistry metricRegistry) {
// do nothing, should implement this if need
return this;
}

default InnerTableScan withoutStatsInPlan() {

dropStats

@@ -150,6 +150,9 @@ default ReadBuilder withProjection(int[][] projection) {
*/
ReadBuilder withShard(int indexOfThisSubtask, int numberOfParallelSubtasks);

/** Delete stats in scan plan result. */
ReadBuilder withoutStatsInPlan();

dropStats
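
To illustrate the shape of the opt-in switch discussed across these hunks, here is a minimal sketch of a fluent `dropStats()` flag that is applied when the plan result is built. `SketchEntry` and `SketchScan` are hypothetical names invented for this example and do not reflect the real `FileStoreScan`, `InnerTableScan`, or `ReadBuilder` implementations. Any use of stats for filtering before they are dropped is omitted here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical file entry: a file name plus an opaque stats payload.
final class SketchEntry {
    final String fileName;
    final byte[] stats;

    SketchEntry(String fileName, byte[] stats) {
        this.fileName = fileName;
        this.stats = stats;
    }

    // Returns a copy that carries an empty stats payload.
    SketchEntry copyWithoutStats() {
        return new SketchEntry(fileName, new byte[0]);
    }
}

// Hypothetical scan with an opt-in switch, in the fluent style of the withXxx methods.
final class SketchScan {
    private final List<SketchEntry> entries;
    private boolean dropStats = false; // assumption: stats are kept by default

    SketchScan(List<SketchEntry> entries) {
        this.entries = entries;
    }

    SketchScan dropStats() {
        this.dropStats = true;
        return this;
    }

    // Builds the plan result, stripping stats only when the caller asked for it.
    List<SketchEntry> plan() {
        List<SketchEntry> result = new ArrayList<>(entries.size());
        for (SketchEntry entry : entries) {
            result.add(dropStats ? entry.copyWithoutStats() : entry);
        }
        return result;
    }
}
```

Keeping the flag off by default means existing callers that read stats from the plan result see no change unless they opt in.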

@wwj6591812 wwj6591812 force-pushed the support_delete_stats_in_result_of_plan_1112 branch from 4e6a4f0 to 1783216 Compare November 18, 2024 06:18
@wwj6591812 wwj6591812 force-pushed the support_delete_stats_in_result_of_plan_1112 branch from 1783216 to 2f53687 Compare November 18, 2024 06:24
@wwj6591812
Contributor Author

@JingsongLi
Addressed. Thanks for the review.

@JingsongLi JingsongLi left a comment

+1

@JingsongLi JingsongLi merged commit 220789d into apache:master Nov 18, 2024
12 checks passed