-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhancement 1895: Fully parallelise processing in read_batch #1950
Merged
alexowens90
merged 4 commits into
master
from
enhancement/1895/fully-parallelise-read-batch-with-processing
Nov 7, 2024
Merged
Enhancement 1895: Fully parallelise processing in read_batch #1950
alexowens90
merged 4 commits into
master
from
enhancement/1895/fully-parallelise-read-batch-with-processing
Nov 7, 2024
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
alexowens90
force-pushed
the
enhancement/1895/fully-parallelise-read-batch-with-processing
branch
from
October 31, 2024 15:36
7cf53f3
to
ec834d6
Compare
alexowens90
changed the title
WIP Enhancement 1895: Fully parallelise processing in read_batch
Enhancement 1895: Fully parallelise processing in read_batch
Oct 31, 2024
willdealtry
approved these changes
Nov 6, 2024
@@ -1198,10 +1148,10 @@ void copy_segments_to_frame( | |||
handler_data, | |||
context_row->fetch_index()})); | |||
} | |||
folly::collect(copy_tasks).get(); | |||
return folly::collect(copy_tasks).via(&async::cpu_executor()).unit(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unit() method was a new one to me
auto it = std::next(clauses.cbegin()); | ||
while (it != clauses.cend()) { | ||
auto prev_it = std::prev(it); | ||
if ((*prev_it)->clause_info().output_structure_ == (*it)->clause_info().input_structure_) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very cool.
alexowens90
force-pushed
the
enhancement/1895/fully-parallelise-read-batch-with-processing
branch
from
November 6, 2024 12:36
e803b94
to
ded29e3
Compare
alexowens90
deleted the
enhancement/1895/fully-parallelise-read-batch-with-processing
branch
November 7, 2024 09:33
alexowens90
added a commit
that referenced
this pull request
Nov 7, 2024
…ent symbol (#1991) #### Reference Issues/PRs Closes man-group/arcticdb-ursus#8 #### What does this implement or fix? Actual bug was fixed, probably in #1950, this just adds a test of the correct behaviour
grusev
pushed a commit
that referenced
this pull request
Nov 25, 2024
#### Reference Issues/PRs Closes #1895 Fixes man-group/arcticdb-man#171 Fixes #1936 Fixes #1939 Fixes #1940 #### What does this implement or fix? Schedules all work asynchronously in batch reads when processing is involved, as well as when all symbols are being read directly. Previously, symbols were processed sequentially, leading to idle CPUs when processing lots of smaller symbols. This works by making `read_frame_for_version` schedule work and return futures, rather than actually performing the processing. This implementation can then be used for all 4 combinations of batch/non-batch and direct/with processing reads, significantly simplifying the code and removing the now redundant `async_read_direct` (the fact that there were two different implementations to achieve effectively the same thing is what led to 2 of the bugs in the first place). Several bugs that were discovered during the implementation (flagged above) have also been fixed. Further work in this area covered in #1968
grusev
pushed a commit
that referenced
this pull request
Nov 25, 2024
…ent symbol (#1991) #### Reference Issues/PRs Closes man-group/arcticdb-ursus#8 #### What does this implement or fix? Actual bug was fixed, probably in #1950, this just adds a test of the correct behaviour
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Reference Issues/PRs
Closes #1895
Fixes https://github.com/man-group/arcticdb-man/issues/171
Fixes #1936
Fixes #1939
Fixes #1940
What does this implement or fix?
Schedules all work asynchronously in batch reads when processing is involved, as well as when all symbols are being read directly. Previously, symbols were processed sequentially, leading to idle CPUs when processing lots of smaller symbols.
This works by making
read_frame_for_version
schedule work and return futures, rather than actually performing the processing. This implementation can then be used for all 4 combinations of batch/non-batch and direct/with processing reads, significantly simplifying the code and removing the now redundantasync_read_direct
(the fact that there were two different implementations to achieve effectively the same thing is what led to 2 of the bugs in the first place).Several bugs that were discovered during the implementation (flagged above) have also been fixed.
Further work in this area covered in #1968