Merge branch 'master' into 20240712_fix_stack
xinyiZzz authored Jul 15, 2024
2 parents cfe3e3c + 6ec82ec commit 6d08a61
Showing 146 changed files with 9,085 additions and 1,219 deletions.
90 changes: 66 additions & 24 deletions README.md
@@ -18,32 +18,68 @@ under the License.
-->

<div align="center">
<img src="https://doris.apache.org/assets/images/home-banner-7f193353c932af31634eca0a028f03ed.png" align="right" height="240"/>
</div>

# Apache Doris

[![License](https://img.shields.io/badge/license-Apache%202-4EB1BA.svg)](https://www.apache.org/licenses/LICENSE-2.0.html)
[![GitHub release](https://img.shields.io/github/release/apache/doris.svg)](https://github.com/apache/doris/releases)
[![OSSRank](https://shields.io/endpoint?url=https://ossrank.com/shield/516)](https://ossrank.com/p/516)
[![Jenkins Vec](https://img.shields.io/jenkins/tests?compact_message&jobUrl=https://ci-builds.apache.org/job/Doris/job/doris_daily_enable_vectorized&label=VectorizedEngine)](https://ci-builds.apache.org/job/Doris/job/doris_daily_enable_vectorized)
[![Total Lines](https://tokei.rs/b1/github/apache/doris?category=lines)](https://github.com/apache/doris)
[![Join the Doris Community on Slack](https://join.slack.com/t/apachedoriscommunity/shared_invite/zt-2kl08hzc0-SPJe4VWmL_qzrFd2u2XYQA)
[![Total Line](https://img.shields.io/badge/Total_Line-GitHub-blue)](https://github.com/apache/doris)
[![Join the chat at https://gitter.im/apache-doris/Lobby](https://badges.gitter.im/apache-doris/Lobby.svg)](https://gitter.im/apache-doris/Lobby?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
[![EN doc](https://img.shields.io/badge/Docs-English-blue.svg)](https://doris.apache.org/docs/get-starting/quick-start)
[![CN doc](https://img.shields.io/badge/文档-中文版-blue.svg)]([https://doris.apache.org/zh-CN/docs/dev/get-starting/what-is-apache-doris](https://doris.apache.org/zh-CN/docs/get-starting/what-is-apache-doris))
[![CN doc](https://img.shields.io/badge/文档-中文版-blue.svg)](https://doris.apache.org/zh-CN/docs/get-starting/quick-start/)



<div>


[![Official Website](<https://img.shields.io/badge/-Visit%20the%20Official%20Website%20%E2%86%92-rgb(15,214,106)?style=for-the-badge>)](https://doris.apache.org/)
[![Quick Download](<https://img.shields.io/badge/-Quick%20%20Download%20%E2%86%92-rgb(66,56,255)?style=for-the-badge>)](https://doris.apache.org/download)


</div>


<div>
<a href="https://twitter.com/doris_apache"><img src="https://img.shields.io/badge/- @Doris_Apache -424549?style=social&logo=x" height=25></a>
&nbsp;
<a href="https://github.com/apache/doris/discussions"><img src="https://img.shields.io/badge/- Discussion -red?style=social&logo=discourse" height=25></a>
&nbsp;
<a href="https://apachedoriscommunity.slack.com/join/shared_invite/zt-2kl08hzc0-SPJe4VWmL_qzrFd2u2XYQA"><img src="https://img.shields.io/badge/-Slack-red?style=social&logo=slack" height=25></a>
&nbsp;
<a href="https://medium.com/@ApacheDoris"><img src="https://img.shields.io/badge/-Medium-red?style=social&logo=medium" height=25></a>

</div>

</div>

---

Apache Doris is an MPP-based real-time data warehouse known for its high query speed. For queries on large datasets, it returns results in sub-seconds. It supports both high-concurrency point queries and high-throughput complex analysis. It can be used for report analysis, ad-hoc queries, unified data warehouse building, and data lake query acceleration. Based on Apache Doris, users can build applications for user behavior analysis, A/B testing platform, log analysis, and e-commerce order analysis.

Please visit our 🔗[official download page](https://doris.apache.org/download/) to get the latest release version.

The current stable version is the 2.0.x series, and the latest version is the 2.1.x series. For production, it is recommended to use the latest version of the 2.0.x series. And if used for POC or testing, it is recommended to use the latest version of the 2.1.x series.

Apache Doris is an easy-to-use, high-performance and real-time analytical database based on MPP architecture, known for its extreme speed and ease of use. It only requires a sub-second response time to return query results under massive data and can support not only high-concurrent point query scenarios but also high-throughput complex analysis scenarios.

All this makes Apache Doris an ideal tool for scenarios including report analysis, ad-hoc query, unified data warehouse, and data lake query acceleration. On Apache Doris, users can build various applications, such as user behavior analysis, AB test platform, log retrieval analysis, user portrait analysis, and order analysis.

🎉 Version 2.1.0 released now. Check out the 🔗[Release Notes](https://doris.apache.org/docs/releasenotes/release-2.1.0) here. The 2.1 version delivers exceptional performance with 100% higher out-of-the-box queries proven by TPC-DS 1TB tests, enhanced data lake analytics that are 4-6 times speedier than Trino and Spark, solid support for semi-structured data analysis with new Variant types and suite of analytical functions, asynchronous materialized views for query acceleration, optimized real-time writing at scale, and better workload management with stability and runtime SQL resource tracking.


🎉 Version 2.0.6 is now released! This fully evolved and stable release is ready for all users to upgrade. Check out the 🔗[Release Notes](https://doris.apache.org/docs/releasenotes/release-2.0.6) here.

👀 Have a look at the 🔗[Official Website](https://doris.apache.org/) for a comprehensive list of Apache Doris's core features, blogs and user cases.

## 📈 Usage Scenarios

As shown in the figure below, after various data integration and processing, the data sources are usually stored in the real-time data warehouse Apache Doris and the offline data lake or data warehouse (in Apache Hive, Apache Iceberg or Apache Hudi).

<img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/sekvbs5ih5rb16wz6n9k.png">
<br />

<img src="https://cdn.selectdb.com/static/What_is_Apache_Doris_3_a61692c2ce.png" />

<br />

Apache Doris is widely used in the following scenarios:

@@ -71,7 +107,11 @@ The overall architecture of Apache Doris is shown in the following figure. The D

Both types of processes are horizontally scalable, and a single cluster can support up to hundreds of machines and tens of petabytes of storage capacity. And these two types of processes guarantee high availability of services and high reliability of data through consistency protocols. This highly integrated architecture design greatly reduces the operation and maintenance cost of a distributed system.

![The overall architecture of Apache Doris](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mnz20ae3s23vv3e9ltmi.png)
<br />

![The overall architecture of Apache Doris](https://cdn.selectdb.com/static/What_is_Apache_Doris_adb26397e2.png)

<br />

In terms of interfaces, Apache Doris adopts MySQL protocol, supports standard SQL, and is highly compatible with MySQL dialect. Users can access Doris through various client tools and it supports seamless connection with BI tools.

@@ -101,24 +141,28 @@ Doris also supports strongly consistent materialized views. Materialized views a

Doris adopts the MPP model in its query engine to realize parallel execution between and within nodes. It also supports distributed shuffle join for multiple large tables so as to handle complex queries.

![](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vjlmumwyx728uymsgcw0.png)
<br />

![Query Engine](https://cdn.selectdb.com/static/What_is_Apache_Doris_1_c6f5ba2af9.png)

<br />

The Doris query engine is vectorized, with all memory structures laid out in a columnar format. This can largely reduce virtual function calls, improve cache hit rates, and make efficient use of SIMD instructions. Doris delivers a 5–10 times higher performance in wide table aggregation scenarios than non-vectorized engines.

![](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ck2m3kbnodn28t28vphp.png)
<br />

Apache Doris uses Adaptive Query Execution technology to dynamically adjust the execution plan based on runtime statistics. For example, it can generate runtime filter, push it to the probe side, and automatically penetrate it to the Scan node at the bottom, which drastically reduces the amount of data in the probe and increases join performance. The runtime filter in Doris supports In/Min/Max/Bloom filter.
![Doris query engine](https://cdn.selectdb.com/static/What_is_Apache_Doris_2_29cf58cc6b.png)

### 🚅 Query Optimizer
<br />

In terms of optimizers, Doris uses a combination of CBO and RBO. RBO supports constant folding, subquery rewriting, predicate pushdown. The Doris CBO is under continuous optimization for more accurate statistical information collection and derivation, and more accurate cost model prediction.
Apache Doris uses Adaptive Query Execution technology to dynamically adjust the execution plan based on runtime statistics. For example, it can generate runtime filter, push it to the probe side, and automatically penetrate it to the Scan node at the bottom, which drastically reduces the amount of data in the probe and increases join performance. The runtime filter in Doris supports In/Min/Max/Bloom filter.

The query optimizer in V2.0 has a richer statistical base and adopts the Cascades framework. It is capable of self-tuning in most query scenarios and supports all 99 SQLs in TPC-DS, so users can expect high performance without any fine-tuning or SQL rewriting.
### 🚅 Query Optimizer

The query optimizer in V2.1 comes with enhanced statistics-based inference and enumeration framework. We have upgraded the cost model and expanded the optimization rules to serve the needs of more use cases.
In terms of optimizers, Doris uses a combination of CBO and RBO. RBO supports constant folding, subquery rewriting, and predicate pushdown, while CBO supports join reorder. The Doris CBO is under continuous optimization for more accurate statistical information collection and derivation, and more accurate cost model prediction.


**Technical Overview**: 🔗[Introduction to Apache Doris](https://doris.apache.org/docs/get-starting/what-is-apache-doris)
**Technical Overview**: 🔗[Introduction to Apache Doris](https://doris.apache.org/docs/dev/summary/basic-summary)

## 🎆 Why choose Apache Doris?

@@ -138,7 +182,7 @@ The query optimizer in V2.1 comes with enhanced statistics-based inference and e

**Apache Doris graduated from the Apache Incubator and became a Top-Level Project in June 2022**.

Currently, the Apache Doris community has gathered more than 600 contributors from over 200 companies in different industries, and the number of monthly active contributors exceeds 120.
Currently, the Apache Doris community has gathered more than 400 contributors from nearly 200 companies in different industries, and the number of active contributors is close to 100 per month.


[![Monthly Active Contributors](https://contributor-overtime-api.apiseven.com/contributors-svg?chart=contributorMonthlyActivity&repo=apache/doris)](https://www.apiseven.com/en/contributor-graph?chart=contributorMonthlyActivity&repo=apache/doris)
@@ -161,19 +205,17 @@ Add your company logo at Apache Doris Website: 🔗[Add Your Company](https://gi

All Documentation 🔗[Docs](https://doris.apache.org/docs/get-starting/quick-start)

Documentation Repo 🔗[Docs Repo](https://github.com/apache/doris-website)

### ⬇️ Download

All release and binary version 🔗[Download](https://doris.apache.org/download)

### 🗄️ Compile

See how to compile 🔗[Compilation](https://doris.apache.org/docs/install/source-install/compilation-with-docker)
See how to compile 🔗[Compilation](https://doris.apache.org/docs/dev/install/source-install/compilation-general)

### 📮 Install

See how to install and deploy 🔗[Installation and deployment](https://doris.apache.org/docs/install/cluster-deployment/standard-deployment)
See how to install and deploy 🔗[Installation and deployment](https://doris.apache.org/docs/dev/install/standard-deployment)

## 🧩 Components

@@ -219,7 +261,7 @@ Contact us through the following mailing list.

* Apache Doris Official Website - [Site](https://doris.apache.org)
* Developer Mailing list - <[email protected]>. Mail to <[email protected]> and follow the reply to subscribe to the mailing list.
* Slack channel - [Join the Slack](https://join.slack.com/t/apachedoriscommunity/shared_invite/zt-2kl08hzc0-SPJe4VWmL_qzrFd2u2XYQA)
* Slack channel - [Join the Slack](https://join.slack.com/t/apachedoriscommunity/shared_invite/zt-28il1o2wk-DD6LsLOz3v4aD92Mu0S0aQ)
* Twitter - [Follow @doris_apache](https://twitter.com/doris_apache)


6 changes: 4 additions & 2 deletions be/src/agent/be_exec_version_manager.h
@@ -79,13 +79,15 @@ class BeExecVersionManager {
* a. change the impl of percentile (need fix)
* b. clear old version of version 3->4
* c. change FunctionIsIPAddressInRange from AlwaysNotNullable to DependOnArguments
* d. change some agg function nullable property: PR #37215
*/
constexpr inline int BeExecVersionManager::max_be_exec_version = 5;
constexpr inline int BeExecVersionManager::min_be_exec_version = 0;

/// functional
constexpr inline int BITMAP_SERDE = 3;
constexpr inline int USE_NEW_SERDE = 4; // release on DORIS version 2.1
constexpr inline int OLD_WAL_SERDE = 3; // use to solve compatibility issues, see pr #32299
constexpr inline int USE_NEW_SERDE = 4; // release on DORIS version 2.1
constexpr inline int OLD_WAL_SERDE = 3; // use to solve compatibility issues, see pr #32299
constexpr inline int AGG_FUNCTION_NULLABLE = 5; // change some agg nullable property: PR #37215

} // namespace doris
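
Reviewer note: the hunk above bumps the maximum exec version to 5 and names that level `AGG_FUNCTION_NULLABLE`. The diff does not show how callers consume the constant, so the following is only a minimal sketch of the usual gating pattern. The helper names `is_supported_version` and `use_new_agg_nullable` are hypothetical and do not appear in the commit; only the numeric values are taken from the hunk.

```cpp
#include <cassert>

// Values copied from the hunk above; the k-prefixed names are local to this sketch.
constexpr int kMinBeExecVersion = 0;
constexpr int kMaxBeExecVersion = 5;
constexpr int kAggFunctionNullable = 5;  // mirrors AGG_FUNCTION_NULLABLE

// Hypothetical helper: a version is usable only inside the [min, max] window.
constexpr bool is_supported_version(int be_exec_version) {
    return be_exec_version >= kMinBeExecVersion && be_exec_version <= kMaxBeExecVersion;
}

// Hypothetical helper: peers at version 5 or later get the new nullable
// behavior, older peers keep the previous one.
constexpr bool use_new_agg_nullable(int be_exec_version) {
    return be_exec_version >= kAggFunctionNullable;
}

// Compile-time check that the newest feature level sits inside the window.
static_assert(is_supported_version(kAggFunctionNullable), "feature level must be supported");
```

With this shape, a plan fragment produced for version 4 would keep the old aggregate nullability, which is the point of introducing a new version level rather than changing behavior in place.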
34 changes: 32 additions & 2 deletions be/src/cloud/cloud_engine_calc_delete_bitmap_task.cpp
@@ -69,9 +69,18 @@ Status CloudEngineCalcDeleteBitmapTask::execute() {

for (const auto& partition : _cal_delete_bitmap_req.partitions) {
int64_t version = partition.version;
for (auto tablet_id : partition.tablet_ids) {
bool has_compaction_stats = partition.__isset.base_compaction_cnts &&
partition.__isset.cumulative_compaction_cnts &&
partition.__isset.cumulative_points;
for (size_t i = 0; i < partition.tablet_ids.size(); i++) {
auto tablet_id = partition.tablet_ids[i];
auto tablet_calc_delete_bitmap_ptr = std::make_shared<CloudTabletCalcDeleteBitmapTask>(
_engine, this, tablet_id, transaction_id, version);
if (has_compaction_stats) {
tablet_calc_delete_bitmap_ptr->set_compaction_stats(
partition.base_compaction_cnts[i], partition.cumulative_compaction_cnts[i],
partition.cumulative_points[i]);
}
auto submit_st = token->submit_func([=]() {
auto st = tablet_calc_delete_bitmap_ptr->handle();
if (!st.ok()) {
@@ -107,6 +116,14 @@ CloudTabletCalcDeleteBitmapTask::CloudTabletCalcDeleteBitmapTask(
fmt::format("CloudTabletCalcDeleteBitmapTask#_transaction_id={}", _transaction_id));
}

void CloudTabletCalcDeleteBitmapTask::set_compaction_stats(int64_t ms_base_compaction_cnt,
int64_t ms_cumulative_compaction_cnt,
int64_t ms_cumulative_point) {
_ms_base_compaction_cnt = ms_base_compaction_cnt;
_ms_cumulative_compaction_cnt = ms_cumulative_compaction_cnt;
_ms_cumulative_point = ms_cumulative_point;
}

Status CloudTabletCalcDeleteBitmapTask::handle() const {
SCOPED_ATTACH_TASK(_mem_tracker);
int64_t t1 = MonotonicMicros();
@@ -122,7 +139,20 @@ Status CloudTabletCalcDeleteBitmapTask::handle() const {
}
int64_t max_version = tablet->max_version_unlocked();
int64_t t2 = MonotonicMicros();
if (_version != max_version + 1) {

auto should_sync_rowsets_produced_by_compaction = [&]() {
if (_ms_base_compaction_cnt == -1) {
return true;
}

// some compaction jobs finished on other BEs during this load job
// we should sync rowsets and their delete bitmaps produced by compaction jobs
std::shared_lock rlock(tablet->get_header_lock());
return _ms_base_compaction_cnt > tablet->base_compaction_cnt() ||
_ms_cumulative_compaction_cnt > tablet->cumulative_compaction_cnt() ||
_ms_cumulative_point > tablet->cumulative_layer_point();
};
if (_version != max_version + 1 || should_sync_rowsets_produced_by_compaction()) {
auto sync_st = tablet->sync_rowsets();
if (sync_st.is<ErrorCode::INVALID_TABLET_STATE>()) [[unlikely]] {
_engine_calc_delete_bitmap_task->add_succ_tablet_id(_tablet_id);
7 changes: 7 additions & 0 deletions be/src/cloud/cloud_engine_calc_delete_bitmap_task.h
@@ -37,6 +37,9 @@ class CloudTabletCalcDeleteBitmapTask {
int64_t transaction_id, int64_t version);
~CloudTabletCalcDeleteBitmapTask() = default;

void set_compaction_stats(int64_t ms_base_compaction_cnt, int64_t ms_cumulative_compaction_cnt,
int64_t ms_cumulative_point);

Status handle() const;

private:
@@ -46,6 +49,10 @@ class CloudTabletCalcDeleteBitmapTask {
int64_t _tablet_id;
int64_t _transaction_id;
int64_t _version;

int64_t _ms_base_compaction_cnt {-1};
int64_t _ms_cumulative_compaction_cnt {-1};
int64_t _ms_cumulative_point {-1};
std::shared_ptr<MemTrackerLimiter> _mem_tracker;
};

2 changes: 1 addition & 1 deletion be/src/common/config.cpp
@@ -324,7 +324,7 @@ DEFINE_mInt32(garbage_sweep_batch_size, "100");
DEFINE_mInt32(snapshot_expire_time_sec, "172800");
// It is only a recommended value. When the disk space is insufficient,
// the file storage period under trash does not have to comply with this parameter.
DEFINE_mInt32(trash_file_expire_time_sec, "86400");
DEFINE_mInt32(trash_file_expire_time_sec, "0");
// minimum file descriptor number
// modify them upon necessity
DEFINE_Int32(min_file_descriptor_number, "60000");
14 changes: 6 additions & 8 deletions be/src/olap/base_tablet.cpp
@@ -829,29 +829,27 @@ Status BaseTablet::sort_block(vectorized::Block& in_block, vectorized::Block& ou
vectorized::MutableBlock mutable_output_block =
vectorized::MutableBlock::build_mutable_block(&output_block);

std::vector<RowInBlock*> _row_in_blocks;
_row_in_blocks.reserve(in_block.rows());

std::shared_ptr<RowInBlockComparator> vec_row_comparator =
std::make_shared<RowInBlockComparator>(_tablet_meta->tablet_schema().get());
vec_row_comparator->set_block(&mutable_input_block);

std::vector<RowInBlock*> row_in_blocks;
std::vector<std::unique_ptr<RowInBlock>> row_in_blocks;
DCHECK(in_block.rows() <= std::numeric_limits<int>::max());
row_in_blocks.reserve(in_block.rows());
for (size_t i = 0; i < in_block.rows(); ++i) {
row_in_blocks.emplace_back(new RowInBlock {i});
row_in_blocks.emplace_back(std::make_unique<RowInBlock>(i));
}
std::sort(row_in_blocks.begin(), row_in_blocks.end(),
[&](const RowInBlock* l, const RowInBlock* r) -> bool {
auto value = (*vec_row_comparator)(l, r);
[&](const std::unique_ptr<RowInBlock>& l,
const std::unique_ptr<RowInBlock>& r) -> bool {
auto value = (*vec_row_comparator)(l.get(), r.get());
DCHECK(value != 0) << "value equal when sort block, l_pos: " << l->_row_pos
<< " r_pos: " << r->_row_pos;
return value < 0;
});
std::vector<uint32_t> row_pos_vec;
row_pos_vec.reserve(in_block.rows());
for (auto* block : row_in_blocks) {
for (auto& block : row_in_blocks) {
row_pos_vec.emplace_back(block->_row_pos);
}
return mutable_output_block.add_rows(&in_block, row_pos_vec.data(),
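
Reviewer note: the change in `sort_block` replaces raw `new RowInBlock` pointers with `std::unique_ptr`, so the row handles are released when the vector goes out of scope. The pattern can be sketched in isolation as below. `Row` and `sorted_positions` are stand-ins invented for this note, an `int` key vector replaces the block comparator, and keys are assumed distinct, as the `DCHECK` in the hunk implies.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Stand-in for RowInBlock: only remembers the original row position.
struct Row {
    size_t pos;
};

// Sorts row handles by key and returns the original positions in sorted order.
// Ownership lives in the vector of unique_ptr, so nothing leaks on return.
std::vector<uint32_t> sorted_positions(const std::vector<int>& keys) {
    std::vector<std::unique_ptr<Row>> rows;
    rows.reserve(keys.size());
    for (size_t i = 0; i < keys.size(); ++i) {
        rows.emplace_back(std::make_unique<Row>(Row{i}));
    }
    // The comparator takes the smart pointers by const reference; std::sort
    // only moves them, so no copies of unique_ptr are required.
    std::sort(rows.begin(), rows.end(),
              [&](const std::unique_ptr<Row>& l, const std::unique_ptr<Row>& r) {
                  return keys[l->pos] < keys[r->pos];
              });
    std::vector<uint32_t> positions;
    positions.reserve(rows.size());
    for (const auto& row : rows) {
        positions.push_back(static_cast<uint32_t>(row->pos));
    }
    return positions;
}
```

Iterating with `const auto&` rather than `auto*` matches the change in the hunk: a by-value loop variable would try to copy the `unique_ptr` and fail to compile.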