Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhancement 1895: Fully parallelise processing in read_batch #1950

Conversation

alexowens90
Copy link
Collaborator

@alexowens90 alexowens90 commented Oct 25, 2024

Reference Issues/PRs

Closes #1895
Fixes https://github.com/man-group/arcticdb-man/issues/171
Fixes #1936
Fixes #1939
Fixes #1940

What does this implement or fix?

Schedules all work asynchronously in batch reads when processing is involved, as well as when all symbols are being read directly. Previously, symbols were processed sequentially, leading to idle CPUs when processing lots of smaller symbols.

This works by making read_frame_for_version schedule work and return futures, rather than actually performing the processing. This implementation can then be used for all 4 combinations of batch/non-batch and direct/with processing reads, significantly simplifying the code and removing the now redundant async_read_direct (the fact that there were two different implementations to achieve effectively the same thing is what led to 2 of the bugs in the first place).

Several bugs that were discovered during the implementation (flagged above) have also been fixed.

Further work in this area covered in #1968

@alexowens90 alexowens90 self-assigned this Oct 25, 2024
@alexowens90 alexowens90 added the enhancement New feature or request label Oct 25, 2024
@alexowens90 alexowens90 marked this pull request as draft October 25, 2024 16:50
@alexowens90 alexowens90 force-pushed the enhancement/1895/fully-parallelise-read-batch-with-processing branch from 7cf53f3 to ec834d6 Compare October 31, 2024 15:36
@alexowens90 alexowens90 changed the title WIP Enhancement 1895: Fully parallelise processing in read_batch Enhancement 1895: Fully parallelise processing in read_batch Oct 31, 2024
@alexowens90 alexowens90 marked this pull request as ready for review October 31, 2024 15:38
@@ -1198,10 +1148,10 @@ void copy_segments_to_frame(
handler_data,
context_row->fetch_index()}));
}
folly::collect(copy_tasks).get();
return folly::collect(copy_tasks).via(&async::cpu_executor()).unit();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unit() method was a new one to me

auto it = std::next(clauses.cbegin());
while (it != clauses.cend()) {
auto prev_it = std::prev(it);
if ((*prev_it)->clause_info().output_structure_ == (*it)->clause_info().input_structure_) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very cool.

@alexowens90 alexowens90 force-pushed the enhancement/1895/fully-parallelise-read-batch-with-processing branch from e803b94 to ded29e3 Compare November 6, 2024 12:36
@alexowens90 alexowens90 merged commit c20c189 into master Nov 7, 2024
126 of 127 checks passed
@alexowens90 alexowens90 deleted the enhancement/1895/fully-parallelise-read-batch-with-processing branch November 7, 2024 09:33
alexowens90 added a commit that referenced this pull request Nov 7, 2024
…ent symbol (#1991)

#### Reference Issues/PRs
Closes man-group/arcticdb-ursus#8

#### What does this implement or fix?
Actual bug was fixed, probably in #1950, this just adds a test of the
correct behaviour
grusev pushed a commit that referenced this pull request Nov 25, 2024
#### Reference Issues/PRs
Closes #1895 
Fixes man-group/arcticdb-man#171
Fixes #1936 
Fixes #1939 
Fixes #1940 

#### What does this implement or fix?
Schedules all work asynchronously in batch reads when processing is
involved, as well as when all symbols are being read directly.
Previously, symbols were processed sequentially, leading to idle CPUs
when processing lots of smaller symbols.

This works by making `read_frame_for_version` schedule work and return
futures, rather than actually performing the processing. This
implementation can then be used for all 4 combinations of
batch/non-batch and direct/with processing reads, significantly
simplifying the code and removing the now redundant `async_read_direct`
(the fact that there were two different implementations to achieve
effectively the same thing is what led to 2 of the bugs in the first
place).

Several bugs that were discovered during the implementation (flagged
above) have also been fixed.

Further work in this area covered in #1968
grusev pushed a commit that referenced this pull request Nov 25, 2024
…ent symbol (#1991)

#### Reference Issues/PRs
Closes man-group/arcticdb-ursus#8

#### What does this implement or fix?
Actual bug was fixed, probably in #1950, this just adds a test of the
correct behaviour
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
2 participants