[Repository] Perform benchmarking for multistream downloads #10108

Closed
Tracked by #8596
kotwanikunal opened this issue Sep 18, 2023 · 1 comment
Labels: distributed framework, enhancement

Comments

@kotwanikunal

Is your feature request related to a problem? Please describe.

Describe the solution you'd like
Run benchmarks using OpenSearch/opensearch-benchmark for the newly added APIs

Describe alternatives you've considered

  • N/A

Additional context

kotwanikunal added the enhancement and untriaged labels Sep 18, 2023
Rishikesh1159 self-assigned this Sep 18, 2023

Rishikesh1159 commented Oct 2, 2023

I added a few log statements to the recovery path to benchmark multistream downloads.

Test summary:

  • Dataset: Stack Overflow (34.1gb)
  • Metric: Recovery time. I ingested the Stack Overflow dataset into the cluster and then added a replica to the index. The new replica downloads all of its data from the remote store, so the time taken for that download is the recovery/restore time for the replica shard.
  • Results:

With multistream download:

From the logs, the download took 4.599 min:

[so][0] starting time for download segments : 11618566048144637
[2023-09-30T00:05:12,414][INFO ][o.o.i.s.IndexShard    ] [node-3] [so][0] End time for download segments : 11618841991960896
[2023-09-30T00:05:12,420][INFO ][o.o.i.s.IndexShard    ] [node-3] [so][0] time taken to download segments : 275943816259

From the recovery API:

$ curl localhost:9200/_cat/recovery?v
index shard time  type        stage source_host source_node target_host target_node repository snapshot files files_recovered files_percent files_total bytes bytes_recovered bytes_percent bytes_total translog_ops translog_ops_recovered translog_ops_percent
so    0     143ms empty_store done  n/a         n/a         node-ip     node-1      n/a        n/a      0     0               0.0%          0           0     0               0.0%          0           0            0                      100.0%
so    0     4.7m  peer        done  node-ip     node-1      node-ip     node-3      n/a        n/a      1     1               100.0%        169         3776  3776            100.0%        36711602919 0            0                      100.0%
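
For repeated runs, the peer-recovery row can be pulled out of the `_cat/recovery?v` output with a script instead of reading it by eye. This is a minimal Python sketch, assuming that no cell contains whitespace (true for the rows in this comment). `parse_cat_table` is a hypothetical helper written for illustration and is not part of OpenSearch or opensearch-benchmark.

```python
def parse_cat_table(text):
    """Parse whitespace-separated _cat output (requested with ?v) into dicts.

    Assumption: cells never contain spaces, so str.split() is enough.
    """
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        cells = ln.split()
        if len(cells) != len(header):
            raise ValueError(f"expected {len(header)} columns, got {len(cells)}: {ln!r}")
        rows.append(dict(zip(header, cells)))
    return rows


# Sample copied from the multistream run reported in this comment.
SAMPLE = """
index shard time type stage source_host source_node target_host target_node repository snapshot files files_recovered files_percent files_total bytes bytes_recovered bytes_percent bytes_total translog_ops translog_ops_recovered translog_ops_percent
so 0 143ms empty_store done n/a n/a node-ip node-1 n/a n/a 0 0 0.0% 0 0 0 0.0% 0 0 0 100.0%
so 0 4.7m peer done node-ip node-1 node-ip node-3 n/a n/a 1 1 100.0% 169 3776 3776 100.0% 36711602919 0 0 100.0%
"""

rows = parse_cat_table(SAMPLE)
# The replica that was added shows up as the "peer" recovery.
peer = next(r for r in rows if r["type"] == "peer")
print(peer["target_node"], peer["time"], peer["files_total"], peer["bytes_total"])
```

The column-count check is there so that a changed or truncated header fails loudly instead of silently shifting values into the wrong field.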

Without multistream download (default case):

From the logs, the download took 14.867 min:

starting time for download segments : 11674473747749612
[2023-09-30T15:47:16,197][INFO ][o.o.i.s.IndexShard    ] [node-1] [so][0] End time for download segments : 11675365792826759
[2023-09-30T15:47:16,198][INFO ][o.o.i.s.IndexShard    ] [node-1] [so][0] time taken to download segments : 892045077147

From the recovery API:

$ curl localhost:9200/_cat/recovery?v
index shard time  type        stage source_host source_node target_host target_node repository snapshot files files_recovered files_percent files_total bytes bytes_recovered bytes_percent bytes_total translog_ops translog_ops_recovered translog_ops_percent
so    0     181ms empty_store done  n/a         n/a         node-ip     node-2      n/a        n/a      0     0               0.0%          0           0     0               0.0%          0           0            0                      100.0%
so    0     14.8m peer        done  node-ip     node-2      node-ip     node-1      n/a        n/a      1     1               100.0%        154         3356  3356            100.0%        36662256654 0            0                      100.0%

From the results above, multistream download is roughly 3.2x faster than the default non-multistream path (4.599 min vs. 14.867 min).
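
The arithmetic behind the comparison can be checked with a short Python sketch. It assumes that the logged "time taken" values are nanoseconds, which is consistent with 275943816259 matching the stated 4.599 min. It also assumes that `bytes_total` from the recovery API approximates the amount of data downloaded. The throughput figures are derived here for illustration only and were not measured separately.

```python
NANOS_PER_SECOND = 1_000_000_000

# Values copied from the log lines and _cat/recovery rows in this comment.
runs = {
    "multistream": {"elapsed_ns": 275_943_816_259, "bytes_total": 36_711_602_919},
    "default": {"elapsed_ns": 892_045_077_147, "bytes_total": 36_662_256_654},
}


def summarize(run):
    """Convert one run's raw numbers into minutes and MB/s (1 MB = 10**6 bytes)."""
    seconds = run["elapsed_ns"] / NANOS_PER_SECOND
    return {
        "minutes": seconds / 60,
        "mb_per_s": run["bytes_total"] / seconds / 1e6,
    }


multi = summarize(runs["multistream"])
default = summarize(runs["default"])

# Ratio of wall-clock download times (default / multistream).
speedup = runs["default"]["elapsed_ns"] / runs["multistream"]["elapsed_ns"]

print(f"multistream: {multi['minutes']:.3f} min, {multi['mb_per_s']:.1f} MB/s")
print(f"default:     {default['minutes']:.3f} min, {default['mb_per_s']:.1f} MB/s")
print(f"speedup:     {speedup:.2f}x")
```

The two runs recovered slightly different byte totals (36711602919 vs. 36662256654), so throughput is a marginally fairer comparison than raw time, although the difference is too small to change the conclusion.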
