branch-2.1: [enhance](orc) Optimize ORC Predicate Pushdown for OR-connected Predicate #43255 #44438

Merged 1 commit into branch-2.1 on Nov 22, 2024

github-actions[bot]

Cherry-picked from #43255

…cate (#43255)

### What problem does this PR solve?

Problem Summary:
This PR addresses a limitation in Apache Doris: only predicates joined
by AND are pushed down to the ORC reader, so OR-connected predicates are
left unoptimized. Extending pushdown to handle OR conditions lets Doris
make better use of ORC's predicate pushdown capabilities, reducing the
amount of data read and improving query performance.
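
The difference between AND and OR pushdown can be illustrated with a minimal sketch. It is not the code from this PR: `Expr`, `Kind`, `leaf`, `node`, and `pushable_filter` are hypothetical stand-ins, not Doris `VExpr` classes or the ORC search-argument API. It assumes the usual pushdown rule that a reader-side filter may be weaker than the query predicate but never stricter, so an AND can drop a child that cannot be pushed down, while an OR can only be pushed down if every branch can.

```cpp
#include <cstddef>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified expression tree used only for illustration.
enum class Kind { And, Or, Compare, Unsupported };

struct Expr {
    Kind kind = Kind::Unsupported;
    std::string text;  // textual form of a leaf, e.g. "a < 10"
    std::vector<std::shared_ptr<Expr>> children;
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr leaf(Kind kind, std::string text) {
    auto e = std::make_shared<Expr>();
    e->kind = kind;
    e->text = std::move(text);
    return e;
}

ExprPtr node(Kind kind, std::vector<ExprPtr> children) {
    auto e = std::make_shared<Expr>();
    e->kind = kind;
    e->children = std::move(children);
    return e;
}

// Returns the part of `e` that is safe to hand to a file reader, or nullopt
// if nothing can be pushed down.
std::optional<std::string> pushable_filter(const ExprPtr& e) {
    switch (e->kind) {
    case Kind::Compare:
        return e->text;
    case Kind::Unsupported:
        return std::nullopt;
    case Kind::And:
    case Kind::Or: {
        const bool is_or = (e->kind == Kind::Or);
        std::vector<std::string> parts;
        for (const auto& child : e->children) {
            auto part = pushable_filter(child);
            if (part) {
                parts.push_back(*part);
            } else if (is_or) {
                // A row might match only through the unpushable branch,
                // so the whole OR has to stay out of the reader.
                return std::nullopt;
            }
            // For AND, skipping a child only makes the filter weaker.
        }
        if (parts.empty()) return std::nullopt;
        if (parts.size() == 1) return parts[0];
        std::string out = "(";
        for (std::size_t i = 0; i < parts.size(); ++i) {
            if (i > 0) out += is_or ? " OR " : " AND ";
            out += parts[i];
        }
        out += ")";
        return out;
    }
    }
    return std::nullopt;
}
```

With leaves `a < 10`, `b = 5`, and an unsupported `f(c) > 0`, this sketch pushes down `(a < 10 OR b = 5)` in full, pushes down only `a < 10` from `a < 10 AND f(c) > 0`, and pushes down nothing from `a < 10 OR f(c) > 0`.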
@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring reopened this Nov 22, 2024
@doris-robot

run buildall

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

@@ -461,8 +467,10 @@ static std::unordered_map<orc::TypeKind, orc::PredicateDataType> TYPEKIND_TO_PRE
{orc::TypeKind::BOOLEAN, orc::PredicateDataType::BOOLEAN}};

template <PrimitiveType primitive_type>
- std::tuple<bool, orc::Literal> convert_to_orc_literal(const orc::Type* type, const void* value,
-                                                       int precision, int scale) {
+ std::tuple<bool, orc::Literal> convert_to_orc_literal(const orc::Type* type,

warning: function 'convert_to_orc_literal' exceeds recommended size/complexity thresholds [readability-function-size]

std::tuple<bool, orc::Literal> convert_to_orc_literal(const orc::Type* type,
                               ^
Additional context

be/src/vec/exec/format/orc/vorc_reader.cpp:469: 94 lines including whitespace and comments (threshold 80)

std::tuple<bool, orc::Literal> convert_to_orc_literal(const orc::Type* type,
                               ^

// check if there are rest children of expr can be pushed down to orc reader
bool OrcReader::_check_rest_children_can_push_down(const VExprSPtr& expr) {
if (expr->children().size() < 2) {
return false;

warning: redundant boolean literal in conditional return statement [readability-simplify-boolean-expr]

be/src/vec/exec/format/orc/vorc_reader.cpp:639:

-     if (expr->children().size() < 2) {
-         return false;
-     }
- 
-     for (size_t i = 1; i < expr->children().size(); ++i) {
-         if (!expr->children()[i]->is_literal()) {
-             return false;
-         }
-     }
-     return true;
+     return !expr->children().size() < 2;

return true;
}
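
The automated fix-it above deserves a careful read before anyone applies it. As far as can be told from the hunk, the replacement drops the loop over the remaining children, and `!expr->children().size() < 2` parses as `(!size) < 2` because unary `!` binds tighter than `<`. A minimal sketch of the difference, where `Child`, `original_check`, and `suggested_check` are hypothetical stand-ins rather than the Doris `VExpr` types:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a child expression; only is_literal matters here.
struct Child {
    bool is_literal;
};

// Mirrors the logic of the original hunk: at least two children, and every
// child after the first must be a literal.
bool original_check(const std::vector<Child>& children) {
    if (children.size() < 2) {
        return false;
    }
    for (std::size_t i = 1; i < children.size(); ++i) {
        if (!children[i].is_literal) {
            return false;
        }
    }
    return true;
}

// The suggested replacement with its implicit grouping spelled out.
// `!size` is a bool that converts to 0 or 1, so the comparison is always true.
bool suggested_check(const std::vector<Child>& children) {
    return static_cast<int>(!children.size()) < 2;
}
```

Under these assumptions the two functions disagree for an empty child list, for a single child, and for any list whose later children are not literals, so the suggestion is not a behavior-preserving simplification.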

bool OrcReader::_build_search_argument(const VExprSPtr& expr,

warning: function '_build_search_argument' exceeds recommended size/complexity thresholds [readability-function-size]

bool OrcReader::_build_search_argument(const VExprSPtr& expr,
                ^
Additional context

be/src/vec/exec/format/orc/vorc_reader.cpp:772: 115 lines including whitespace and comments (threshold 80)

bool OrcReader::_build_search_argument(const VExprSPtr& expr,
                ^

@@ -18,9 +18,9 @@
#pragma once

#include <cctz/time_zone.h>

warning: 'cctz/time_zone.h' file not found [clang-diagnostic-error]

#include <cctz/time_zone.h>
         ^

@@ -17,20 +17,9 @@

#include "testutil/desc_tbl_builder.h"

warning: 'testutil/desc_tbl_builder.h' file not found [clang-diagnostic-error]

#include "testutil/desc_tbl_builder.h"
         ^

@@ -20,15 +20,16 @@

#include <gen_cpp/Descriptors_types.h>

warning: 'gen_cpp/Descriptors_types.h' file not found [clang-diagnostic-error]

#include <gen_cpp/Descriptors_types.h>
         ^

// specific language governing permissions and limitations
// under the License.

#include <glog/logging.h>

warning: 'glog/logging.h' file not found [clang-diagnostic-error]

#include <glog/logging.h>
         ^

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.72% (9593/26123)
Line Coverage: 28.13% (78933/280589)
Region Coverage: 26.80% (40544/151280)
Branch Coverage: 23.63% (20633/87322)
Coverage Report: http://coverage.selectdb-in.cc/coverage/32d7bd085c6ddfc2159df972edaff03c82dc1cad_32d7bd085c6ddfc2159df972edaff03c82dc1cad/report/index.html

@yiguolei yiguolei merged commit dceaf97 into branch-2.1 Nov 22, 2024
20 of 22 checks passed
@github-actions github-actions bot deleted the auto-pick-43255-branch-2.1 branch November 22, 2024 14:53
morningman added a commit to morningman/doris that referenced this pull request Dec 6, 2024
yiguolei pushed a commit that referenced this pull request Dec 6, 2024
Revert "branch-2.1: [enhance](orc) Optimize ORC Predicate Pushdown for
OR-connected Predicate #43255 (#44438)"
Revert "[fix](orc) check all the cases before build_search_argument
(#44615) (#44801)"