[Flink]add range strategy for sort compaction #2749

Merged (5 commits) on Mar 18, 2024

@wg1026688210 wg1026688210 commented Jan 20, 2024

Purpose

Improve the efficiency of sort compaction when data sizes are skewed in the range phase.

  1. Add a size-based range strategy config.
  2. Add configs for the per-task sample ratio and the number of ranges.
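
To illustrate what a size-based range strategy means, here is a minimal sketch. It is not the code from this PR: the class and method names are hypothetical, and it assumes the samples are already sorted by key and each carries an estimated size in bytes. It picks boundaries so that each range holds roughly the same number of bytes rather than the same number of records.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: choose range boundaries by accumulated size, not by record count. */
public class SizeBasedRangeSketch {

    /**
     * @param sortedKeys sample keys, already sorted
     * @param sizes estimated size in bytes of each sample, in the same order as sortedKeys
     * @param rangeNum number of ranges wanted
     * @return at most rangeNum - 1 boundary keys (each is the last key of its range)
     */
    public static List<String> boundaries(List<String> sortedKeys, long[] sizes, int rangeNum) {
        long totalSize = 0;
        for (long s : sizes) {
            totalSize += s;
        }
        double targetSize = totalSize / (double) rangeNum;

        List<String> result = new ArrayList<>();
        double accumulated = 0;
        for (int i = 0; i < sortedKeys.size() && result.size() < rangeNum - 1; i++) {
            accumulated += sizes[i];
            // Close the current range once the running total reaches its share of the bytes.
            if (accumulated >= targetSize * (result.size() + 1)) {
                result.add(sortedKeys.get(i));
            }
        }
        return result;
    }
}
```

With keys a..f, sizes {10, 10, 10, 10, 80, 80} and rangeNum = 2, this sketch returns the single boundary "e", whereas a split by record count would cut after the third key and leave most of the bytes in the second range.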

Tests

SortCompactActionForUnawareBucketITCase
RangeShuffleTest
InternalRowToSizeVisitorTest

API and Format

Add config:
--table_conf sort-compaction.range-strategy=SIZE

Documentation

@wg1026688210 wg1026688210 changed the title [compaction]add size base range [Flink]add size base range strategy for sort compaction Jan 20, 2024
@wg1026688210 wg1026688210 force-pushed the compaction/size_base_range branch 5 times, most recently from 9eb0f3c to 63e3b69 Compare January 22, 2024 09:16
@wg1026688210 wg1026688210 changed the title [Flink]add size base range strategy for sort compaction [Flink]add sort compaction config Jan 22, 2024
@wg1026688210 wg1026688210 force-pushed the compaction/size_base_range branch from 2644b4a to 7960d3f Compare January 24, 2024 03:39

@leaves12138 leaves12138 left a comment

Thanks a lot, @wg1026688210. I have left a few comments. @JingsongLi, please take a look at this pull request.

  } else {
- callProcedure("zorder", columns);
+ callProcedure("zorder", rangeStrategy, columns);

I don't think we need to add rangeStrategy to the parameters. We can look it up anywhere in the sort process.

For example, in SortUtils.sortStreamByKey we can read it directly: Strategy strategy = table.coreOptions().sortRangeStrategy

@@ -287,14 +293,21 @@ private SortCompactAction createAction(String orderStrategy, List<String> column
"--order_strategy",
orderStrategy,
"--order_by",
- String.join(",", columns));
+ String.join(",", columns),
+ "--sort_conf range_strategy=" + rangeStrategy,

We can add this as a table config: when we run compaction, we just pass --table_conf sort.range.strategy=xxx. This way we simplify the code and stay compatible with the old code.
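
The suggestion can be sketched as follows. This is only an illustration under assumptions: the helper class and method are hypothetical, and the option key is the one from the PR description (sort-compaction.range-strategy), which may differ from the key that was finally merged. The point is that the range strategy travels as an ordinary --table_conf entry, so callers that never set it keep passing the same arguments as before.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: pass the range strategy as a table config, not a dedicated argument. */
public class SortCompactArgsSketch {

    public static List<String> buildArgs(
            String orderStrategy, List<String> columns, String rangeStrategy) {
        List<String> args =
                new ArrayList<>(
                        Arrays.asList(
                                "--order_strategy", orderStrategy,
                                "--order_by", String.join(",", columns)));
        // Only callers that want a non-default strategy add the extra table config.
        if (rangeStrategy != null) {
            args.add("--table_conf");
            args.add("sort-compaction.range-strategy=" + rangeStrategy);
        }
        return args;
    }
}
```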

double targetSize = (totalSize) / (double) (rangeNum);

@SuppressWarnings("unchecked")
T[] range = (T[]) new Object[rangeNum];

Why not rangeNum - 1? If we split an array into 1000 parts, shouldn't we need only 999 boundary points?
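
The arithmetic behind this question can be shown with a small sketch. It is an assumption-laden illustration, not the PR's code: the class and method names are hypothetical, keys are plain ints, and the boundaries are assumed to be sorted and distinct. With N - 1 boundaries, a binary search maps every key to one of N ranges.

```java
import java.util.Arrays;

/** Hypothetical sketch: N ranges are delimited by N - 1 sorted, distinct boundary keys. */
public class RangeAssignSketch {

    /** Returns the index of the range (0 .. boundaries.length) that the key falls into. */
    public static int rangeOf(int key, int[] boundaries) {
        int pos = Arrays.binarySearch(boundaries, key);
        // Found: the key equals boundaries[pos] and is placed in the range that boundary closes.
        // Not found: binarySearch returns -(insertionPoint) - 1, so recover the insertion point.
        return pos >= 0 ? pos : -pos - 1;
    }
}
```

With the three boundaries {10, 20, 30}, keys are spread over four ranges with indexes 0 to 3, so an array of rangeNum boundaries holds one more entry than the split needs.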

@wg1026688210 wg1026688210 force-pushed the compaction/size_base_range branch 9 times, most recently from ed5de85 to e2a96b6 Compare January 29, 2024 14:40
@wg1026688210 wg1026688210 force-pushed the compaction/size_base_range branch from e2a96b6 to 89b1867 Compare January 29, 2024 14:56
@wg1026688210

@leaves12138 @JingsongLi PTAL

@wg1026688210 wg1026688210 force-pushed the compaction/size_base_range branch 2 times, most recently from 1a0dfa4 to dda4c63 Compare February 1, 2024 01:55
@wg1026688210 wg1026688210 force-pushed the compaction/size_base_range branch 2 times, most recently from 5398a22 to d2cbfb9 Compare February 7, 2024 02:09
@wg1026688210 wg1026688210 changed the title [Flink]add sort compaction config [Flink]add range strategy for sort compaction Feb 7, 2024

@leaves12138 leaves12138 left a comment

Thanks, @wg1026688210. See the comment above.

long x =
row.getTimestamp(index, localZonedTimestampType.getPrecision())
.getMillisecond();
return 8;

Why define x? It doesn't seem to be used.

Contributor Author

done
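
The point of this exchange is that for a fixed-width type the size estimate does not depend on the value, so there is nothing to read from the row. A minimal sketch of that idea follows. It is not the PR's InternalRowToSizeVisitor: the FieldKind enum and the method are hypothetical, and only the 8 bytes for a local-zoned timestamp comes from the snippet under review; the other sizes are assumptions.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: estimate a field's size, reading the value only when the width varies. */
public class FieldSizeSketch {

    public enum FieldKind {
        INT,
        BIGINT,
        TIMESTAMP_LTZ,
        STRING
    }

    public static int sizeOf(FieldKind kind, Object value) {
        switch (kind) {
            case INT:
                return 4; // assumed fixed width
            case BIGINT:
            case TIMESTAMP_LTZ:
                return 8; // fixed width: the value itself is never inspected
            case STRING:
                // Variable width: the value must be read to measure it.
                return value == null
                        ? 0
                        : ((String) value).getBytes(StandardCharsets.UTF_8).length;
            default:
                throw new IllegalArgumentException("Unsupported kind: " + kind);
        }
    }
}
```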


@leaves12138 leaves12138 left a comment

+1

@leaves12138 leaves12138 force-pushed the compaction/size_base_range branch from 21dc581 to ad2e117 Compare March 18, 2024 05:53
@leaves12138 leaves12138 merged commit f4d0521 into apache:master Mar 18, 2024
9 checks passed
zhu3pang pushed a commit to zhu3pang/incubator-paimon that referenced this pull request Mar 29, 2024