Merge branch 'master' into non-equal-rf
englefly authored Oct 15, 2023
2 parents d4fea0c + 7ea456e commit 1c3bf3b
Showing 604 changed files with 20,719 additions and 7,214 deletions.
18 changes: 6 additions & 12 deletions .clang-tidy
@@ -4,21 +4,15 @@ Checks: |
clang-analyzer-*,
-*,
bugprone-redundant-branch-condition,
modernize-use-override,
modernize-use-equals-default,
modernize-use-equals-delete,
modernize-use-nodiscard,
modernize-use-nullptr,
modernize-use-bool-literals,
modernize-use-using,
modernize-*,
-modernize-use-trailing-return-type,
-modernize-use-nodiscard,
misc-redundant-expression,
misc-unused-*,
-misc-unused-parameters,
readability-make-member-function-const,
readability-non-const-parameter,
readability-static-accessed-through-instance,
readability-redundant-*,
readability-braces-around-statements,
readability-*,
-readability-identifier-length,
-readability-implicit-bool-conversion,
portability-simd-intrinsics,
performance-type-promotion-in-math-fn,
performance-faster-string-find,
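The hunk above replaces the individually enumerated checks with the whole `modernize-*` and `readability-*` categories, minus a few opt-outs. As a hypothetical illustration (this snippet is not from the Doris codebase), previously unlisted checks such as `modernize-loop-convert` and `readability-container-size-empty` would now fire on code like this:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: with all modernize-* and readability-* checks on,
// checks that were not explicitly listed before start to flag this code.
int sum(const std::vector<int>& v) {
    if (v.size() == 0) { // readability-container-size-empty: prefer v.empty()
        return 0;
    }
    int total = 0;
    // modernize-loop-convert: prefer a range-based for loop
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i];
    }
    return total;
}
```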
8 changes: 8 additions & 0 deletions .github/workflows/clang-format.yml
@@ -27,11 +27,19 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: "Checkout ${{ github.ref }} ( ${{ github.sha }} )"
if: ${{ github.event_name != 'pull_request_target' }}
uses: actions/checkout@v3
with:
persist-credentials: false
submodules: recursive

- name: Checkout ${{ github.ref }} ( ${{ github.event.pull_request.head.sha }} )
if: ${{ github.event_name == 'pull_request_target' }}
uses: actions/checkout@v3
with:
ref: ${{ github.event.pull_request.head.sha }}
submodules: recursive

- name: Paths filter
uses: ./.github/actions/paths-filter
id: filter
13 changes: 12 additions & 1 deletion .github/workflows/license-eyes.yml
@@ -28,7 +28,18 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: "Checkout ${{ github.ref }} ( ${{ github.sha }} )"
uses: actions/checkout@v2
if: ${{ github.event_name != 'pull_request_target' }}
uses: actions/checkout@v3
with:
submodules: recursive

- name: Checkout ${{ github.ref }} ( ${{ github.event.pull_request.head.sha }} )
if: ${{ github.event_name == 'pull_request_target' }}
uses: actions/checkout@v3
with:
ref: ${{ github.event.pull_request.head.sha }}
submodules: recursive

- name: Check License
uses: apache/[email protected]
env:
1 change: 1 addition & 0 deletions .licenserc.yaml
@@ -18,6 +18,7 @@ header:
- ".gitmodules"
- ".licenserc.yaml"
- ".rat-excludes"
- ".github/**"
- "be/src/common/status.cpp"
- "be/src/common/status.h"
- "be/src/env/env.h"
4 changes: 3 additions & 1 deletion README.md
@@ -35,7 +35,9 @@ Apache Doris is an easy-to-use, high-performance and real-time analytical databa

All this makes Apache Doris an ideal tool for scenarios including report analysis, ad-hoc query, unified data warehouse, and data lake query acceleration. On Apache Doris, users can build various applications, such as user behavior analysis, AB test platform, log retrieval analysis, user portrait analysis, and order analysis.

🎉 Version 2.0.1 version released now. The 2.0.1 version has achieved over 10x performance improvements on standard Benchmark, comprehensive enhancement in log analysis and lakehouse scenarios, more efficient and stable data update and write efficiency, support for more comprehensive multi-tenant and resource isolation mechanisms, and take a new step in the direction of resource elasticity and storage computing separation. It has also been added a series of usability features for enterprise users. We welcome all users who have requirements for the new features of the 2.0 version to deploy and upgrade. Check out the 🔗[Release Notes](https://github.com/apache/doris/issues/23640) here.
Doris Summit Asia 2023 is coming and we warmly invite you to join! Click now: 🔗[doris-summit.org.cn](https://doris-summit.org.cn/?utm_source=website&utm_medium=readme&utm_campaign=2023&utm_id=2023)

🎉 Version 2.0.2 released now! The 2.0.2 version achieves over 10x performance improvements on standard benchmarks, comprehensive enhancements in log analysis and lakehouse scenarios, more efficient and stable data updates and writes, and support for more comprehensive multi-tenant and resource isolation mechanisms, taking a new step toward resource elasticity and storage-compute separation. It also adds a series of usability features for enterprise users. We welcome all users who need the new features of the 2.0 version to deploy and upgrade. Check out the 🔗[Release Notes](https://github.com/apache/doris/issues/25011) here.

🎉 Version 1.2.7 released now! It is a fully-evolved release, and all users are encouraged to upgrade. Check out the 🔗[Release Notes](https://doris.apache.org/docs/dev/releasenotes/release-1.2.7) here.

2 changes: 2 additions & 0 deletions be/cmake/thirdparty.cmake
@@ -164,6 +164,8 @@ add_thirdparty(krb5)
add_thirdparty(com_err)
add_thirdparty(k5crypto)
add_thirdparty(gssapi_krb5)
add_thirdparty(dragonbox_to_chars LIB64)
target_include_directories(dragonbox_to_chars INTERFACE "${THIRDPARTY_DIR}/include/dragonbox-1.1.3")

if (OS_MACOSX)
add_thirdparty(bfd)
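dragonbox is a shortest-round-trip float-to-string algorithm; the new `dragonbox_to_chars` third-party target exposes its `to_chars` front end. A minimal hedged sketch of how it is typically called (include path and return-value behavior are assumed from the upstream jk-jeon/dragonbox project, not from this diff):

```cpp
#include <cstdio>

#include "dragonbox/dragonbox_to_chars.h" // include path assumed

int main() {
    char buffer[64];
    // Assumed API: to_chars writes the shortest decimal form that round-trips
    // back to the same float and returns a pointer at/just past the end.
    char* end = jkj::dragonbox::to_chars(0.1f, buffer);
    *end = '\0'; // defensive: harmless if the library already null-terminates
    std::printf("%s\n", buffer); // expected output: 0.1
    return 0;
}
```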
15 changes: 10 additions & 5 deletions be/src/common/config.cpp
@@ -133,7 +133,7 @@ DEFINE_mBool(disable_memory_gc, "false");

DEFINE_mInt64(large_memory_check_bytes, "2147483648");

// The maximum time a thread waits for a full GC. Currently only query will wait for full gc.
// The maximum time a thread waits for full GC. Currently only query will wait for full gc.
DEFINE_mInt32(thread_wait_gc_max_milliseconds, "1000");

DEFINE_mInt64(pre_serialize_keys_limit_bytes, "16777216");
@@ -378,6 +378,12 @@ DEFINE_mDouble(compaction_promotion_ratio, "0.05");
// rowset will not be given to base compaction. The unit is m byte.
DEFINE_mInt64(compaction_promotion_min_size_mbytes, "128");

// When the total version count (end_version - start_version) of a cumulative compaction's
// output rowset exceeds this config value, the rowset will be moved to base compaction.
// NOTE: this config only works for unique key merge-on-write tables, to reduce the
// version-count-related cost on delete bitmaps more effectively.
DEFINE_mInt64(compaction_promotion_version_count, "1000");

// The lower bound size for doing cumulative compaction. When the total disk size of candidate rowsets is less than
// this size, the size_based policy may not do cumulative compaction. The unit is m byte.
DEFINE_mInt64(compaction_min_size_mbytes, "64");
@@ -1071,10 +1077,6 @@ DEFINE_mInt64(lookup_connection_cache_bytes_limit, "4294967296");
// level of compression when using LZ4_HC, whose default value is LZ4HC_CLEVEL_DEFAULT
DEFINE_mInt64(LZ4_HC_compression_level, "9");

DEFINE_Bool(enable_hdfs_hedged_read, "false");
DEFINE_Int32(hdfs_hedged_read_thread_num, "128");
DEFINE_Int32(hdfs_hedged_read_threshold_time, "500");

DEFINE_mBool(enable_merge_on_write_correctness_check, "true");

// The secure path with user files, used in the `local` table function.
@@ -1096,6 +1098,7 @@ DEFINE_Int32(group_commit_sync_wal_batch, "10");

// the number of threads for group commit insert
DEFINE_Int32(group_commit_insert_threads, "10");
DEFINE_mInt32(group_commit_interval_seconds, "10");

DEFINE_mInt32(scan_thread_nice_value, "0");
DEFINE_mInt32(tablet_schema_cache_recycle_interval, "86400");
@@ -1106,6 +1109,8 @@ DEFINE_Bool(exit_on_exception, "false");
DEFINE_String(doris_cgroup_cpu_path, "");
DEFINE_Bool(enable_cpu_hard_limit, "false");

DEFINE_Bool(ignore_always_true_predicate_for_segment, "true");

// clang-format off
#ifdef BE_TEST
// test s3
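A minimal sketch of how the new `compaction_promotion_version_count` knob might be consulted by a cumulative compaction policy (struct and function names here are illustrative assumptions; the actual policy code is not part of this diff):

```cpp
#include <cstdint>

// Hypothetical rowset metadata; Doris's real RowsetMeta carries much more.
struct RowsetVersions {
    int64_t start_version;
    int64_t end_version;
};

// Promote the output rowset of cumulative compaction to base compaction once
// its version span exceeds the configured threshold (default 1000), keeping
// the delete-bitmap cost on merge-on-write unique tables bounded.
bool should_promote_to_base(const RowsetVersions& out, int64_t promotion_version_count) {
    return out.end_version - out.start_version > promotion_version_count;
}
```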
20 changes: 10 additions & 10 deletions be/src/common/config.h
@@ -431,6 +431,12 @@ DECLARE_mDouble(compaction_promotion_ratio);
// rowset will not be given to base compaction. The unit is m byte.
DECLARE_mInt64(compaction_promotion_min_size_mbytes);

// When the total version count (end_version - start_version) of a cumulative compaction's
// output rowset exceeds this config value, the rowset will be moved to base compaction.
// NOTE: this config only works for unique key merge-on-write tables, to reduce the
// version-count-related cost on delete bitmaps more effectively.
DECLARE_mInt64(compaction_promotion_version_count);

// The lower bound size for doing cumulative compaction. When the total disk size of candidate rowsets is less than
// this size, the size_based policy may not do cumulative compaction. The unit is m byte.
DECLARE_mInt64(compaction_min_size_mbytes);
@@ -1127,16 +1133,6 @@ DECLARE_mInt64(lookup_connection_cache_bytes_limit);
// level of compression when using LZ4_HC, whose default value is LZ4HC_CLEVEL_DEFAULT
DECLARE_mInt64(LZ4_HC_compression_level);

// whether to enable hdfs hedged read.
// If set to true, it will be enabled even if user not enable it when creating catalog
DECLARE_Bool(enable_hdfs_hedged_read);
// hdfs hedged read thread pool size, for "dfs.client.hedged.read.threadpool.size"
// Maybe overwritten by the value specified when creating catalog
DECLARE_Int32(hdfs_hedged_read_thread_num);
// the threshold of doing hedged read, for "dfs.client.hedged.read.threshold.millis"
// Maybe overwritten by the value specified when creating catalog
DECLARE_Int32(hdfs_hedged_read_threshold_time);

DECLARE_mBool(enable_merge_on_write_correctness_check);

// The secure path with user files, used in the `local` table function.
@@ -1166,6 +1162,7 @@ DECLARE_Int32(group_commit_sync_wal_batch);

// This config can be set to limit the number of threads in the group commit insert thread pool.
DECLARE_mInt32(group_commit_insert_threads);
DECLARE_mInt32(group_commit_interval_seconds);

// The configuration item is used to lower the priority of the scanner thread,
// typically employed to ensure CPU scheduling for write operations.
@@ -1182,6 +1179,9 @@ DECLARE_mBool(exit_on_exception);
DECLARE_String(doris_cgroup_cpu_path);
DECLARE_Bool(enable_cpu_hard_limit);

// Remove predicates that are always true for a segment.
DECLARE_Bool(ignore_always_true_predicate_for_segment);

#ifdef BE_TEST
// test s3
DECLARE_String(test_s3_resource);
6 changes: 3 additions & 3 deletions be/src/common/daemon.cpp
@@ -229,7 +229,7 @@ void Daemon::memory_gc_thread() {
auto proc_mem_no_allocator_cache = doris::MemInfo::proc_mem_no_allocator_cache();

// GC excess memory for resource groups that do not enable overcommit
auto tg_free_mem = doris::MemInfo::tg_hard_memory_limit_gc();
auto tg_free_mem = doris::MemInfo::tg_not_enable_overcommit_group_gc();
sys_mem_available += tg_free_mem;
proc_mem_no_allocator_cache -= tg_free_mem;

@@ -239,7 +239,7 @@
// No longer full gc and minor gc during sleep.
memory_full_gc_sleep_time_ms = memory_gc_sleep_time_ms;
memory_minor_gc_sleep_time_ms = memory_gc_sleep_time_ms;
LOG(INFO) << fmt::format("Start Full GC, {}.",
LOG(INFO) << fmt::format("[MemoryGC] start full GC, {}.",
MemTrackerLimiter::process_limit_exceeded_errmsg_str());
doris::MemTrackerLimiter::print_log_process_usage();
if (doris::MemInfo::process_full_gc()) {
@@ -251,7 +251,7 @@
proc_mem_no_allocator_cache >= doris::MemInfo::soft_mem_limit())) {
// No minor gc during sleep, but full gc is possible.
memory_minor_gc_sleep_time_ms = memory_gc_sleep_time_ms;
LOG(INFO) << fmt::format("Start Minor GC, {}.",
LOG(INFO) << fmt::format("[MemoryGC] start minor GC, {}.",
MemTrackerLimiter::process_soft_limit_exceeded_errmsg_str());
doris::MemTrackerLimiter::print_log_process_usage();
if (doris::MemInfo::process_minor_gc()) {
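For orientation, the decision ladder in `memory_gc_thread()` can be sketched as below. Only the resource-group step and the `[MemoryGC]` log tags come from the hunks above; the exact full-GC trigger lives outside the visible diff, so the threshold names here are assumptions:

```cpp
#include <cstdint>

// Simplified, hypothetical sketch of one pass of Daemon::memory_gc_thread().
void memory_gc_pass(int64_t sys_mem_available, int64_t sys_mem_low_water_mark /* assumed */,
                    int64_t proc_mem_no_allocator_cache, int64_t mem_limit /* assumed */,
                    int64_t soft_mem_limit) {
    // GC excess memory for resource groups that do not enable overcommit
    // (the helper renamed to tg_not_enable_overcommit_group_gc() in this commit).
    int64_t tg_free_mem = 0; // stands in for doris::MemInfo::tg_not_enable_overcommit_group_gc()
    sys_mem_available += tg_free_mem;
    proc_mem_no_allocator_cache -= tg_free_mem;

    if (sys_mem_available < sys_mem_low_water_mark || proc_mem_no_allocator_cache >= mem_limit) {
        // LOG(INFO) << "[MemoryGC] start full GC, ...";   (full GC path)
    } else if (proc_mem_no_allocator_cache >= soft_mem_limit) {
        // LOG(INFO) << "[MemoryGC] start minor GC, ...";  (minor GC path)
    }
}
```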
47 changes: 25 additions & 22 deletions be/src/common/stack_trace.cpp
@@ -49,14 +49,14 @@ namespace {
/// But we use atomic just in case, so it can be modified at runtime.
std::atomic<bool> show_addresses = true;

#if defined(__ELF__) && !defined(__FreeBSD__)
void writePointerHex(const void* ptr, std::stringstream& buf) {
buf.write("0x", 2);
char hex_str[2 * sizeof(ptr)];
doris::vectorized::write_hex_uint_lowercase(reinterpret_cast<uintptr_t>(ptr), hex_str);
buf.write(hex_str, 2 * sizeof(ptr));
}
#endif
// #if defined(__ELF__) && !defined(__FreeBSD__)
// void writePointerHex(const void* ptr, std::stringstream& buf) {
// buf.write("0x", 2);
// char hex_str[2 * sizeof(ptr)];
// doris::vectorized::write_hex_uint_lowercase(reinterpret_cast<uintptr_t>(ptr), hex_str);
// buf.write(hex_str, 2 * sizeof(ptr));
// }
// #endif

bool shouldShowAddress(const void* addr) {
/// If the address is less than 4096, most likely it is a nullptr dereference with offset,
@@ -380,20 +380,15 @@ static void toStringEveryLineImpl([[maybe_unused]] const std::string dwarf_locat
reinterpret_cast<const void*>(uintptr_t(virtual_addr) - virtual_offset);

std::stringstream out;
out << "\t" << i << ". ";
out << "\t" << i << "# ";
if (i < 10) { // for alignment
out << " ";
}

if (shouldShowAddress(physical_addr)) {
out << "@ ";
writePointerHex(physical_addr, out);
}

if (const auto* const symbol = symbol_index.findSymbol(virtual_addr)) {
out << " " << collapseNames(demangle(symbol->name));
out << collapseNames(demangle(symbol->name));
} else {
out << " ?";
out << "?";
}

if (std::error_code ec; object && std::filesystem::exists(object->name, ec) && !ec) {
@@ -403,11 +398,17 @@

if (dwarf_it->second.findAddress(uintptr_t(physical_addr), location, mode,
inline_frames)) {
out << " " << location.file.toString() << ":" << location.line;
out << " at " << location.file.toString() << ":" << location.line;
}
}

out << " in " << (object ? object->name : "?");
// Do not display the stack address and file name; they are not important.
// if (shouldShowAddress(physical_addr)) {
// out << " @ ";
// writePointerHex(physical_addr, out);
// }

// out << " in " << (object ? object->name : "?");

callback(out.str());

@@ -458,11 +459,13 @@ std::string toStringCached(const StackTrace::FramePointers& pointers, size_t off
}
}

std::string StackTrace::toString() const {
// Delete the first three frame pointers, which are inside the stacktrace.
std::string StackTrace::toString(int start_pointers_index) const {
// By default, skip the first three frame pointers, which are inside stack_trace.cpp.
start_pointers_index += 3;
StackTrace::FramePointers frame_pointers_raw {};
std::copy(frame_pointers.begin() + 3, frame_pointers.end(), frame_pointers_raw.begin());
return toStringCached(frame_pointers_raw, offset, size - 3);
std::copy(frame_pointers.begin() + start_pointers_index, frame_pointers.end(),
frame_pointers_raw.begin());
return toStringCached(frame_pointers_raw, offset, size - start_pointers_index);
}

std::string StackTrace::toString(void** frame_pointers_raw, size_t offset, size_t size) {
2 changes: 1 addition & 1 deletion be/src/common/stack_trace.h
@@ -73,7 +73,7 @@ class StackTrace {
[[nodiscard]] constexpr size_t getSize() const { return size; }
[[nodiscard]] constexpr size_t getOffset() const { return offset; }
[[nodiscard]] const FramePointers& getFramePointers() const { return frame_pointers; }
[[nodiscard]] std::string toString() const;
[[nodiscard]] std::string toString(int start_pointers_index = 0) const;

static std::string toString(void** frame_pointers, size_t offset, size_t size);
static void createCache();
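A hedged usage sketch of the new `start_pointers_index` parameter (the call site and include path are hypothetical):

```cpp
#include <iostream>
#include <string>

#include "common/stack_trace.h" // path assumed from be/src/common/stack_trace.h

// Hypothetical call site: ask toString() to drop one extra frame (this
// function) on top of the three stack_trace.cpp-internal frames it already
// skips by default.
void log_current_stack() {
    StackTrace st; // captures frame pointers at construction
    std::cout << st.toString(/*start_pointers_index=*/1) << std::endl;
}
```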
6 changes: 4 additions & 2 deletions be/src/common/status.h
@@ -280,6 +280,7 @@ E(ENTRY_NOT_FOUND, -6011);
constexpr bool capture_stacktrace(int code) {
return code != ErrorCode::OK
&& code != ErrorCode::END_OF_FILE
&& code != ErrorCode::DATA_QUALITY_ERROR
&& code != ErrorCode::MEM_LIMIT_EXCEEDED
&& code != ErrorCode::TRY_LOCK_FAILED
&& code != ErrorCode::TOO_MANY_SEGMENTS
@@ -377,7 +378,8 @@ class [[nodiscard]] Status {
}
#ifdef ENABLE_STACKTRACE
if constexpr (stacktrace && capture_stacktrace(code)) {
status._err_msg->_stack = get_stack_trace();
// Skip the first frame pointer, which is inside status.h
status._err_msg->_stack = get_stack_trace(1);
LOG(WARNING) << "meet error status: " << status; // may print too many stacks.
}
#endif
@@ -396,7 +398,7 @@
}
#ifdef ENABLE_STACKTRACE
if (stacktrace && capture_stacktrace(code)) {
status._err_msg->_stack = get_stack_trace();
status._err_msg->_stack = get_stack_trace(1);
LOG(WARNING) << "meet error status: " << status; // may print too many stacks.
}
#endif
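The practical effect of adding `DATA_QUALITY_ERROR` to `capture_stacktrace()` can be shown with a small hedged sketch (`DataQualityError` appears elsewhere in this commit; the `IOError` factory and include path are assumptions for illustration):

```cpp
#include "common/status.h" // include path assumed from be/src/common/status.h

// Hypothetical illustration of the constexpr filter above: statuses whose
// codes are excluded by capture_stacktrace() no longer pay for a stack walk.
void status_examples() {
    using doris::Status;
    Status ok = Status::OK();                             // never captures a stack
    Status dq = Status::DataQualityError("bad json row"); // newly skipped by this commit
    Status io = Status::IOError("disk failure");          // assumed factory; still captures
}
```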
3 changes: 2 additions & 1 deletion be/src/exec/es/es_scan_reader.cpp
@@ -124,6 +124,7 @@ Status ESScanReader::open() {
}
_network_client.set_basic_auth(_user_name, _passwd);
_network_client.set_content_type("application/json");
_network_client.set_timeout_ms(_http_timeout_ms);
if (_use_ssl_client) {
_network_client.use_untrusted_ssl();
}
@@ -214,7 +215,7 @@ Status ESScanReader::close() {
_network_client.set_basic_auth(_user_name, _passwd);
_network_client.set_method(DELETE);
_network_client.set_content_type("application/json");
_network_client.set_timeout_ms(5 * 1000);
_network_client.set_timeout_ms(_http_timeout_ms);
if (_use_ssl_client) {
_network_client.use_untrusted_ssl();
}
25 changes: 13 additions & 12 deletions be/src/exprs/json_functions.cpp
@@ -254,18 +254,19 @@ Status JsonFunctions::extract_from_object(simdjson::ondemand::object& obj,
simdjson::ondemand::value* value) noexcept {
// Return DataQualityError when the json is malformed.
// Otherwise the path was not found, due to an array index out of bounds or a nonexistent field
#define HANDLE_SIMDJSON_ERROR(err, msg) \
do { \
const simdjson::error_code& _err = err; \
const std::string& _msg = msg; \
if (UNLIKELY(_err)) { \
if (_err == simdjson::NO_SUCH_FIELD || _err == simdjson::INDEX_OUT_OF_BOUNDS) { \
return Status::NotFound( \
fmt::format("err: {}, msg: {}", simdjson::error_message(_err), _msg)); \
} \
return Status::DataQualityError( \
fmt::format("err: {}, msg: {}", simdjson::error_message(_err), _msg)); \
} \
#define HANDLE_SIMDJSON_ERROR(err, msg) \
do { \
const simdjson::error_code& _err = err; \
const std::string& _msg = msg; \
if (UNLIKELY(_err)) { \
if (_err == simdjson::NO_SUCH_FIELD || _err == simdjson::INDEX_OUT_OF_BOUNDS) { \
return Status::DataQualityError( \
fmt::format("Not found target filed, err: {}, msg: {}", \
simdjson::error_message(_err), _msg)); \
} \
return Status::DataQualityError( \
fmt::format("err: {}, msg: {}", simdjson::error_message(_err), _msg)); \
} \
} while (false);

if (jsonpath.size() <= 1) {
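A hedged sketch of how the macro above is used later inside `extract_from_object()`, where the macro and `obj` are in scope (`find_field_unordered` is the standard simdjson on-demand lookup; the field name and message are illustrative):

```cpp
// Hypothetical call site: each simdjson lookup is wrapped so a missing field
// or malformed json surfaces as Status::DataQualityError with simdjson's
// error message attached.
simdjson::ondemand::value tmp;
HANDLE_SIMDJSON_ERROR(obj.find_field_unordered("k1").get(tmp),
                      "extract field 'k1'"); // field name and message are illustrative
```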