Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Backport 2.x] Replace multipart download with parallel file download #10548

Merged
merged 1 commit into from
Oct 11, 2023

Conversation

opensearch-trigger-bot[bot]
Copy link
Contributor

Backport 00ccfc4 from #10519.

There are a few open issues with the multi-stream download approach:
 - Recovery stats are not being reported correctly
 - It is incompatible (short of reopening and re-reading the entire
   file) with the existing Lucene checksum validation logic
 - There are some issues with integrating it with the pending client
   side encryption work

Given this, I attempted an experiment where I replaced with
multi-stream-within-a-single-file approach with simply parallelizing
downloads across files (this is how snapshot restore works). I actually
got better results with this approach: recovering a ~52GiB shard took
about 4.7 minutes with the multi-stream code versus 3.9 minutes with the
parallel file approach (r7g.4xlarge EC2 instance, 500MiB/s EBS volume,
S3 as remote repository).

I think this is the right approach as it leverages the more
battle-tested code path and addresses the three issues listed above. The
multi-stream approach still has promise as it will allow us to download
very large files faster (whereas this approach they can be the long poll
on the transfer operation). However, given that 5GB segments (made up of
multiple files in practice) are the norm, we generally aren't dealing with
huge files.

Signed-off-by: Andrew Ross <[email protected]>
(cherry picked from commit 00ccfc4)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@github-actions
Copy link
Contributor

Compatibility status:

Checks if related components are compatible with change e13ebfa

Incompatible components

Incompatible components: [https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/performance-analyzer.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/reporting.git]

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@codecov
Copy link

codecov bot commented Oct 11, 2023

Codecov Report

Merging #10548 (e13ebfa) into 2.x (d68123f) will increase coverage by 0.01%.
The diff coverage is 88.13%.

@@             Coverage Diff              @@
##                2.x   #10548      +/-   ##
============================================
+ Coverage     70.87%   70.89%   +0.01%     
+ Complexity    58540    58534       -6     
============================================
  Files          4829     4830       +1     
  Lines        276389   276386       -3     
  Branches      40579    40580       +1     
============================================
+ Hits         195899   195934      +35     
+ Misses        63860    63758     -102     
- Partials      16630    16694      +64     
Files Coverage Δ
...rc/main/java/org/opensearch/index/IndexModule.java 82.35% <ø> (ø)
...c/main/java/org/opensearch/index/IndexService.java 75.48% <100.00%> (-0.17%) ⬇️
...va/org/opensearch/index/store/RemoteDirectory.java 73.07% <ø> (+0.69%) ⬆️
...earch/index/store/RemoteSegmentStoreDirectory.java 88.84% <ø> (+0.57%) ⬆️
...ndex/store/RemoteSegmentStoreDirectoryFactory.java 96.00% <100.00%> (-0.16%) ⬇️
...in/java/org/opensearch/indices/IndicesService.java 70.58% <100.00%> (-0.39%) ⬇️
.../opensearch/indices/recovery/RecoverySettings.java 83.33% <ø> (ø)
...ices/replication/RemoteStoreReplicationSource.java 90.90% <100.00%> (+0.28%) ⬆️
server/src/main/java/org/opensearch/node/Node.java 85.71% <ø> (ø)
...ch/repositories/blobstore/BlobStoreRepository.java 59.49% <ø> (+0.53%) ⬆️
... and 3 more

... and 470 files with indirect coverage changes

@kotwanikunal kotwanikunal merged commit 422cf06 into 2.x Oct 11, 2023
42 of 67 checks passed
@github-actions github-actions bot deleted the backport/backport-10519-to-2.x branch October 11, 2023 03:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants