[Optimize](Variant) optimize schema update performance
When schemas are updated under high concurrency, the update cost is high.
1. Update the schema only when the rowset's row count is not 0.
2. copy_from is expensive; use the copy constructor instead.
eldenmoon committed Dec 16, 2024
1 parent cf6dba7 commit 8930824
Showing 2 changed files with 11 additions and 11 deletions.
19 changes: 10 additions & 9 deletions be/src/olap/rowset_builder.cpp
@@ -346,21 +346,22 @@ Status RowsetBuilder::commit_txn() {
     SCOPED_TIMER(_commit_txn_timer);

     const RowsetWriterContext& rw_ctx = _rowset_writer->context();
-    if (rw_ctx.tablet_schema->num_variant_columns() > 0) {
+    if (rw_ctx.tablet_schema->num_variant_columns() > 0 && _rowset->num_rows() > 0) {
         // Need to merge schema with `rw_ctx.merged_tablet_schema` in prior,
         // merged schema keeps the newest merged schema for the rowset, which is updated and merged
         // during flushing segments.
         if (rw_ctx.merged_tablet_schema != nullptr) {
             RETURN_IF_ERROR(tablet()->update_by_least_common_schema(rw_ctx.merged_tablet_schema));
+        } else {
+            // We should merge rowset schema further, in case that the merged_tablet_schema maybe null
+            // when enable_memtable_on_sink_node is true, the merged_tablet_schema will not be passed to
+            // the destination backend.
+            // update tablet schema when meet variant columns, before commit_txn
+            // Eg. rowset schema: A(int), B(float), C(int), D(int)
+            // _tabelt->tablet_schema: A(bigint), B(double)
+            // => update_schema: A(bigint), B(double), C(int), D(int)
+            RETURN_IF_ERROR(tablet()->update_by_least_common_schema(rw_ctx.tablet_schema));
         }
-        // We should merge rowset schema further, in case that the merged_tablet_schema maybe null
-        // when enable_memtable_on_sink_node is true, the merged_tablet_schema will not be passed to
-        // the destination backend.
-        // update tablet schema when meet variant columns, before commit_txn
-        // Eg. rowset schema: A(int), B(float), C(int), D(int)
-        // _tabelt->tablet_schema: A(bigint), B(double)
-        // => update_schema: A(bigint), B(double), C(int), D(int)
-        RETURN_IF_ERROR(tablet()->update_by_least_common_schema(rw_ctx.tablet_schema));
     }

     // Transfer ownership of `PendingRowsetGuard` to `TxnManager`
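
Note on the hunk above: the control flow it introduces can be sketched in isolation. The following is a minimal sketch, not the Doris implementation. `ToySchema`, `ToyTablet`, `ToyWriterContext` and `maybe_update_schema` are hypothetical stand-ins, and the counter only records how often the (assumed expensive) merge would run. The merge is skipped for empty rowsets, and the rowset schema is used only as a fallback when no merged schema is available.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for a tablet schema; not the Doris TabletSchema.
struct ToySchema {
    int num_variant_columns = 0;
};

// Hypothetical stand-in for a tablet; counts how many schema merges were requested.
struct ToyTablet {
    int schema_updates = 0;
    void update_by_least_common_schema(const ToySchema& /*schema*/) { ++schema_updates; }
};

// Hypothetical stand-in for the rowset writer context.
struct ToyWriterContext {
    std::shared_ptr<ToySchema> tablet_schema;         // always set
    std::shared_ptr<ToySchema> merged_tablet_schema;  // may be null
};

// Runs at most one schema merge per commit, and none for an empty rowset.
void maybe_update_schema(ToyTablet& tablet, const ToyWriterContext& ctx, int64_t num_rows) {
    assert(ctx.tablet_schema != nullptr);
    if (ctx.tablet_schema->num_variant_columns > 0 && num_rows > 0) {
        if (ctx.merged_tablet_schema != nullptr) {
            // Prefer the schema already merged while flushing segments.
            tablet.update_by_least_common_schema(*ctx.merged_tablet_schema);
        } else {
            // Fallback: the merged schema was not passed along, so merge the rowset schema.
            tablet.update_by_least_common_schema(*ctx.tablet_schema);
        }
    }
}
```

Before this change both merges could run for the same commit. With the `else` branch, exactly one of them runs.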
3 changes: 1 addition & 2 deletions be/src/vec/common/schema_util.cpp
@@ -415,8 +415,7 @@ Status get_least_common_schema(const std::vector<TabletSchemaSPtr>& schemas,
     // Ensure that the output schema also excludes these extracted columns. This approach prevents
     // duplicated paths following the update_least_common_schema process.
     auto build_schema_without_extracted_columns = [&](const TabletSchemaSPtr& base_schema) {
-        output_schema = std::make_shared<TabletSchema>();
-        output_schema->copy_from(*base_schema);
+        output_schema = std::make_shared<TabletSchema>(*base_schema);
         // Merge columns from other schemas
         output_schema->clear_columns();
         // Get all columns without extracted columns and collect variant col unique id
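
Note on the hunk above: the commit message states only that `copy_from` is expensive. The sketch below assumes, purely for illustration, that the slow path round-trips through a serialized form, whereas the copy constructor copies members directly. `ToySchema`, `serialize`, `clone_via_copy_from` and `clone_via_copy_ctor` are hypothetical names, and the toy serialization assumes column names without spaces.

```cpp
#include <cassert>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a schema type; not the Doris TabletSchema.
struct ToySchema {
    std::vector<std::string> columns;
    int schema_version = 0;

    // Assumed slow path: encode the schema as text.
    std::string serialize() const {
        std::ostringstream out;
        out << schema_version;
        for (const auto& column : columns) out << ' ' << column;
        return out.str();
    }

    // Assumed slow path: rebuild this schema by re-parsing the serialized form.
    void copy_from(const ToySchema& other) {
        std::istringstream in(other.serialize());
        columns.clear();
        in >> schema_version;
        std::string column;
        while (in >> column) columns.push_back(column);
    }

    void clear_columns() { columns.clear(); }
};

// Old pattern: default-construct, then copy through the serialized form.
std::shared_ptr<ToySchema> clone_via_copy_from(const ToySchema& base) {
    auto out = std::make_shared<ToySchema>();
    out->copy_from(base);
    assert(out->columns == base.columns);  // the round trip must preserve the columns
    return out;
}

// New pattern: a single member-wise copy via the copy constructor.
std::shared_ptr<ToySchema> clone_via_copy_ctor(const ToySchema& base) {
    return std::make_shared<ToySchema>(base);
}
```

Both helpers produce an independent copy, so the later `clear_columns()` call on the copy leaves the base schema untouched. The copy constructor simply skips the intermediate encoding step.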