[Feature](hash-table) unify initialization of HashTableVariants and support set for distinct agg #42046

Merged: 14 commits into apache:master on Oct 24, 2024

@BiteTheDDDDt BiteTheDDDDt commented Oct 17, 2024

Proposed changes

  1. Use init_hash_method to initialize the HashTableVariants of the agg, partition_sort, join, and set operators.
  2. Support using a hash set for distinct agg.
  3. Support a single-column context for decimal256.
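
To make the first two items concrete, here is a minimal sketch of one shared routine that picks a hash method for several operator "families", with the distinct family holding sets rather than maps. It is not the code from this PR: KeyKind, TableFamily, AggFamily, DistinctFamily, init_hash_method_sketch and count_distinct_int64 are hypothetical names, and std::unordered_map / std::unordered_set merely stand in for the engine's own hash tables. It assumes C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical key categories; a real engine would inspect the key columns' data types.
enum class KeyKind { Int32, Int64, Other };

// A "family" bundles the table types one operator needs. Index 0 (std::monostate)
// means "not initialized"; indexes 1..3 are the concrete hash methods.
template <typename Int32Table, typename Int64Table, typename SerializedTable>
struct TableFamily {
    using Variant = std::variant<std::monostate, Int32Table, Int64Table, SerializedTable>;
};

// Aggregation with a payload: key -> row count (a stand-in for aggregate state).
using AggFamily = TableFamily<std::unordered_map<int32_t, std::size_t>,
                              std::unordered_map<int64_t, std::size_t>,
                              std::unordered_map<std::string, std::size_t>>;

// Distinct aggregation only needs to know whether a key was seen, so sets suffice.
using DistinctFamily = TableFamily<std::unordered_set<int32_t>,
                                   std::unordered_set<int64_t>,
                                   std::unordered_set<std::string>>;

// One shared selection routine for every family (a stand-in for init_hash_method).
template <typename Family>
typename Family::Variant init_hash_method_sketch(const std::vector<KeyKind>& keys) {
    using Variant = typename Family::Variant;
    if (keys.size() == 1 && keys[0] == KeyKind::Int32) {
        return Variant(std::in_place_index<1>);
    }
    if (keys.size() == 1 && keys[0] == KeyKind::Int64) {
        return Variant(std::in_place_index<2>);
    }
    // Multiple keys or any other type: fall back to serialized (string) keys.
    return Variant(std::in_place_index<3>);
}

// Counts distinct values by inserting into the set selected for a single int64 key.
std::size_t count_distinct_int64(const std::vector<int64_t>& values) {
    auto table = init_hash_method_sketch<DistinctFamily>({KeyKind::Int64});
    assert(std::holds_alternative<std::unordered_set<int64_t>>(table));
    auto& set = std::get<std::unordered_set<int64_t>>(table);
    for (int64_t v : values) {
        set.insert(v);
    }
    return set.size();
}
```

In this sketch every operator calls the same selection function and only the family type differs, so adding a new key type means touching one place. The distinct family stores no per-key payload, which is the general idea behind using a set for distinct agg.
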
select count(*) from (select CounterID,UserID from hits group by CounterID,UserID)t;
before:
                      58.8g   4.3g
                        DISTINCT_STREAMING_AGGREGATION_OPERATOR  (id=1  ,  nereids_id=244):(ExecTime:  3sec472ms)
                              -  BlocksProduced:  5.118K  (5118)
                              -  BuildTime:  3sec441ms
                              -  CloseTime:  1.514us
                              -  ExecTime:  3sec472ms
                              -  HashTableComputeTime:  3sec290ms
                              -  HashTableEmplaceTime:  2sec866ms
                              -  HashTableInputCount:  99.997497M  (99997497)
                              -  HashTableSize:  20.796015M  (20796015)
                              -  InitTime:  5.501us
                              -  InsertKeysToColumnTime:  142.189ms
                              -  MemoryUsage:  0.00  
                              -  MemoryUsagePeak:  0.00  
                              -  OpenTime:  7.35us
                              -  ProjectionTime:  14.908ms
                              -  RowsProduced:  20.796015M  (20796015)


after:
                      42.7g   4.1g 
                        DISTINCT_STREAMING_AGGREGATION_OPERATOR  (id=1  ,  nereids_id=244):(ExecTime:  3sec24ms)
                              -  BlocksProduced:  5.118K  (5118)
                              -  BuildTime:  2sec994ms
                              -  CloseTime:  1.333us
                              -  ExecTime:  3sec24ms
                              -  HashTableComputeTime:  2sec850ms
                              -  HashTableEmplaceTime:  2sec470ms
                              -  HashTableInputCount:  99.997497M  (99997497)
                              -  HashTableSize:  20.796015M  (20796015)
                              -  InitTime:  4.993us
                              -  InsertKeysToColumnTime:  134.734ms
                              -  MemoryUsage:  0.00  
                              -  MemoryUsagePeak:  0.00  
                              -  OpenTime:  9.822us
                              -  ProjectionTime:  14.389ms
                              -  RowsProduced:  20.796015M  (20796015)

select count() from (select murmur_hash3_32(number) from numbers("number" = "100000000") union select murmur_hash3_32(number) from numbers("number" = "100000000"))t;

peakMemoryBytes=6948914752  Time(ms)=3812
peakMemoryBytes=4035731208  Time(ms)=3128
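
For readers who want the relative change rather than raw counters, the arithmetic can be sketched as below. The helper names reduction_percent and print_reductions are made up for illustration. The two peakMemoryBytes lines are not labelled in the description, so treating the first as "before" and the second as "after" is an assumption.

```cpp
#include <cassert>
#include <cstdio>

// Relative reduction from `before` to `after`, as a percentage of `before`.
double reduction_percent(double before, double after) {
    assert(before > 0.0);
    return (before - after) / before * 100.0;
}

// Prints the reductions for the figures quoted in the description.
void print_reductions() {
    // DISTINCT_STREAMING_AGGREGATION_OPERATOR counters, converted to milliseconds.
    std::printf("ExecTime:             %.1f%%\n", reduction_percent(3472, 3024));
    std::printf("HashTableEmplaceTime: %.1f%%\n", reduction_percent(2866, 2470));
    // Union query. ASSUMPTION: first reported line is "before", second is "after".
    std::printf("peakMemoryBytes:      %.1f%%\n",
                reduction_percent(6948914752.0, 4035731208.0));
    std::printf("Time(ms):             %.1f%%\n", reduction_percent(3812, 3128));
}
```

Under that assumption the quoted figures work out to roughly 12.9% less ExecTime and 13.8% less HashTableEmplaceTime for the distinct aggregation, and roughly 41.9% lower peak memory and 17.9% less time for the union query.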

@BiteTheDDDDt

run buildall

clang-tidy review says "All clean, LGTM! 👍"

@github-actions github-actions bot left a comment

clang-tidy made some suggestions


#pragma once

#include <parallel_hashmap/phmap.h>

warning: 'parallel_hashmap/phmap.h' file not found [clang-diagnostic-error]

#include <parallel_hashmap/phmap.h>
         ^

@github-actions github-actions bot left a comment

clang-tidy made some suggestions


#pragma once

#include <boost/noncopyable.hpp>

warning: 'parallel_hashmap/phmap.h' file not found [clang-diagnostic-error]

#include <parallel_hashmap/phmap.h>
         ^

@github-actions github-actions bot left a comment

clang-tidy made some suggestions


#pragma once

#include <boost/noncopyable.hpp>

warning: 'boost/noncopyable.hpp' file not found [clang-diagnostic-error]

#include <boost/noncopyable.hpp>
         ^

@BiteTheDDDDt BiteTheDDDDt changed the title from "[Chore](hash-table) unify initialization of HashTableVariants" to "[Feature](hash-table) unify initialization of HashTableVariants and support set for distinct agg" on Oct 22, 2024

@github-actions github-actions bot left a comment

clang-tidy made some suggestions


#pragma once

#include <boost/core/noncopyable.hpp>

warning: 'boost/core/noncopyable.hpp' file not found [clang-diagnostic-error]

#include <boost/core/noncopyable.hpp>
         ^

@HappenLee HappenLee left a comment

LGTM

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added approved Indicates a PR has been approved by one committer. reviewed labels Oct 23, 2024

PR approved by anyone and no changes requested.

@BiteTheDDDDt BiteTheDDDDt merged commit 152cc2c into apache:master Oct 24, 2024
24 of 27 checks passed
BiteTheDDDDt added a commit that referenced this pull request Nov 4, 2024
### What problem does this PR solve?
 fix wrong result on single nullable set operators

Related PR: #42046