Add expand data corpus instructions #8807

Naarcha-AWS · 2024-11-25T19:10:46Z

Fixes opensearch-project/opensearch-benchmark#672

Checklist

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developer Certificate of Origin.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Archer <[email protected]>

github-actions · 2024-11-25T19:10:57Z

Thank you for submitting your PR. The PR states are In progress (or Draft) -> Tech review -> Doc review -> Editorial review -> Merged.

Before you submit your PR for doc review, make sure the content is technically accurate. If you need help finding a tech reviewer, tag a maintainer.

When you're ready for doc review, tag the assignee of this PR. The doc reviewer may push edits to the PR directly or leave comments and editorial suggestions for you to address (let us know in a comment if you have a preference). The doc reviewer will arrange for an editorial review.

IanHoang · 2024-11-25T19:18:53Z

_benchmark/user-guide/optimizing-benchmarks/expand-data-corpus.md

+
+# Expanding the data corpus of a workload
+
+This tutorial shows you how to use the [`expand-data-corpus.py`](https://github.com/opensearch-project/opensearch-benchmark/blob/main/scripts/expand-data-corpus.py) script to increase the size of the data corpus for an OpenSearch Benchmark workload. This can help assist in running the `http_logs` Benchmark with a larger scale, for instance, with clusters containing multiple data nodes.


Nit: We can simplify the last sentence:

This is helpful when running time-series workloads like `http_logs` against a large-scale OpenSearch cluster.

IanHoang

Would recommend getting feedback from @gkamat as he has more experience with this and might have additional comments

gkamat · 2024-12-10T02:53:02Z

_benchmark/user-guide/optimizing-benchmarks/expand-data-corpus.md

+To use this tutorial, make sure you fulfill the following prerequisites:
+
+1. Python 3.x or greater installed.
+2. The `http_logs` workload data corpus already in your load generation host where benchmark is running.


corpus is already available in your load generation host where OSB is running.

natebower

@Naarcha-AWS Please see my comments and changes and tag me for approval once addressed. Thanks!

natebower · 2024-12-13T10:07:36Z

_benchmark/user-guide/optimizing-benchmarks/expand-data-corpus.md

@@ -0,0 +1,83 @@
+---
+layout: default
+title: Expand data corpus


Should this be "Expanding a data corpus"?

natebower · 2024-12-13T10:07:56Z

_benchmark/user-guide/optimizing-benchmarks/expand-data-corpus.md

+grand_parent: User guide
+---
+
+# Expanding the data corpus of a workload


"Expanding a workload data corpus"?

natebower · 2024-12-13T10:10:17Z

_benchmark/user-guide/optimizing-benchmarks/expand-data-corpus.md

+To use this tutorial, make sure you fulfill the following prerequisites:
+
+1. Python 3.x or greater installed.
+2. The `http_logs` workload data corpus is already in your load generation host where OpenSearch Benchmark is running.


Something like "is already stored on the load generation host running OpenSearch Benchmark"?

_benchmark/user-guide/optimizing-benchmarks/expand-data-corpus.md

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>

Signed-off-by: Nathan Bower <[email protected]>

natebower

@Naarcha-AWS LGTM!

Add expand data corpus instructions

b3fb0ed

Signed-off-by: Archer <[email protected]>

Naarcha-AWS added the "3 - Tech review", "benchmark", and "backport 2.18" labels Nov 25, 2024

Naarcha-AWS self-assigned this Nov 25, 2024

Naarcha-AWS requested review from kolchfa-aws, vagimeli, AMoo-Miki, natebower, dlvenable and epugh as code owners November 25, 2024 19:10

IanHoang reviewed Nov 25, 2024

Naarcha-AWS commented Nov 25, 2024

Naarcha-AWS commented Nov 25, 2024

Naarcha-AWS commented Nov 25, 2024

Apply suggestions from code review

e575773

Signed-off-by: Naarcha-AWS <[email protected]>

Naarcha-AWS commented Nov 25, 2024

Apply suggestions from code review

56a3c75

Signed-off-by: Naarcha-AWS <[email protected]>

IanHoang suggested changes Nov 25, 2024

Merge branch 'main' into expand-data-corpus

729185a

gkamat suggested changes Nov 26, 2024

Naarcha-AWS commented Dec 3, 2024
