use multipart upload to parallelize index metadata upload #6

Closed
37 commits
a1d8beb
Set replication type to SEGMENT in remote store enabled cluster if no…
linuxpi Aug 28, 2023
569d5c2
Bugfix: add replica information in remote store restore flow (#8951)
sachinpkale Aug 28, 2023
f4106a4
Add support to use trace propagated from client (#9506)
Gaganjuneja Aug 28, 2023
a08d588
Add Secure Bwc documentation to TESTING.md (#9414)
stephen-crawford Aug 28, 2023
e5c4f9d
For sort request on timeseries field use non concurrent search path (…
sohami Aug 28, 2023
012c4fa
[Segment Replication] Fix bug where replica shows stale doc count dur…
mch2 Aug 28, 2023
61d4d43
[Segment Replication] Add ClusterState utility to identify SEGMENT re…
dreamer-89 Aug 29, 2023
bb7d23c
Bump io.grpc:grpc-api from 1.57.1 to 1.57.2 in /plugins/repository-gc…
dependabot[bot] Aug 29, 2023
60787b8
Add SearchExtBuilders to SearchResponse (#9379)
austintlee Aug 29, 2023
81c7b97
Fix GH runners memory issue by increasing swapfile (#9596)
gaiksaya Aug 29, 2023
8324b88
[Remote Store] Retry RemoteIndexShardTests flaky tests (#9597)
dreamer-89 Aug 30, 2023
78eea27
[BWC and API enforcement] Decorate the existing APIs with proper anno…
reta Aug 30, 2023
6cd576f
Fix SegmentReplicationUsingRemoteStoreIT#testDropPrimaryDuringReplica…
mch2 Aug 30, 2023
bb38ed4
Update the minimum version check on SearchExtBuilder support in Searc…
austintlee Aug 30, 2023
0274095
Allow MockTracingTelemetry to await for asynchronous tasks terminatio…
reta Aug 30, 2023
cc007e4
Add benchmark to measure performance of CustomBinaryDocValuesField (#…
kkmr Aug 30, 2023
e563a0c
Adding concurrent search versions of query count and time metrics (#9…
jed326 Aug 30, 2023
ff65403
[Segment Replication] Handle failover in mixed cluster mode (#9536)
Poojita-Raj Aug 31, 2023
c294c91
[Remote Store] Add tracker factory to manage remote store stats track…
Aug 31, 2023
d66df10
Decouple replication lag from logic to fail stale replicas (#9507)
ankitkala Aug 31, 2023
79e5aee
[Remote State] Create service to publish cluster state to remote stor…
soosinha Aug 31, 2023
f9a661c
Add Crypto Handler abstractions for encryption/decryption and Crypto…
vikasvb90 Aug 31, 2023
082d425
Added sampler based on Blanket Probabilistic Sampling rate and Overri…
devagarwal1803 Aug 31, 2023
1126d2f
Expose DelimitedTermFrequencyTokenFilter (#9479)
russcam Aug 31, 2023
6765b16
Add async blob read and download support using multiple streams (#9592)
kotwanikunal Sep 1, 2023
04c90c7
Mute RemoteIndexShardTests primary promotion flaky tests (#9679)
dreamer-89 Sep 1, 2023
f5d3fd2
Introduce cluster default remote translog buffer interval setting (#9…
ashking94 Sep 1, 2023
2fb4694
remove redundant column headers in pit segments response (#9615)
bugmakerrrrrr Sep 1, 2023
96e851b
[Segment Replication] Allow segment replication with on disk files no…
dreamer-89 Sep 1, 2023
dbb868a
[Tracing Framework] Redefine telemetry context restoration and propag…
Gaganjuneja Sep 1, 2023
f9b6694
Fix Segment Replication stats bytes behind metric (#9686)
mch2 Sep 1, 2023
0706635
Upload all index metadata to remote
soosinha Aug 25, 2023
686290b
Add nodeId field in marker
soosinha Aug 27, 2023
73914c2
Integrate remote cluster state in publish/commit flow
soosinha Aug 25, 2023
d36b1ba
Remove persistedState from CoordinationState
soosinha Aug 28, 2023
823fab4
integrate index metadata upload with multipart upload
linuxpi Aug 28, 2023
2080dc1
fix checksum calculation for s3 upload
linuxpi Sep 2, 2023
9 changes: 9 additions & 0 deletions .github/workflows/check-compatibility.yml
@@ -15,6 +15,15 @@ jobs:
with:
ref: ${{ github.event.pull_request.head.sha }}

- name: Increase swapfile
run: |
sudo swapoff -a
sudo fallocate -l 10G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
sudo swapon --show

- name: Run compatibility task
run: ./gradlew checkCompatibility -i | tee $HOME/gradlew-check.out

28 changes: 23 additions & 5 deletions CHANGELOG.md
@@ -11,6 +11,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
- Add events correlation engine plugin ([#6854](https://github.com/opensearch-project/OpenSearch/issues/6854))
- Introduce new dynamic cluster setting to control slice computation for concurrent segment search ([#9107](https://github.com/opensearch-project/OpenSearch/pull/9107))
- Implement on behalf of token passing for extensions ([#8679](https://github.com/opensearch-project/OpenSearch/pull/8679))
- Added encryption-sdk lib to provide encryption and decryption capabilities ([#8466](https://github.com/opensearch-project/OpenSearch/pull/8466))

### Dependencies
- Bump `log4j-core` from 2.18.0 to 2.19.0
@@ -40,6 +41,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
- Bump `org.bouncycastle:bcprov-jdk15on` to `org.bouncycastle:bcprov-jdk15to18` version 1.75 ([#8247](https://github.com/opensearch-project/OpenSearch/pull/8247))
- Bump `org.bouncycastle:bcmail-jdk15on` to `org.bouncycastle:bcmail-jdk15to18` version 1.75 ([#8247](https://github.com/opensearch-project/OpenSearch/pull/8247))
- Bump `org.bouncycastle:bcpkix-jdk15on` to `org.bouncycastle:bcpkix-jdk15to18` version 1.75 ([#8247](https://github.com/opensearch-project/OpenSearch/pull/8247))
- Add Encryption SDK dependencies ([#8466](https://github.com/opensearch-project/OpenSearch/pull/8466))

### Changed
- [CCR] Add getHistoryOperationsFromTranslog method to fetch the history snapshot from translogs ([#3948](https://github.com/opensearch-project/OpenSearch/pull/3948))
@@ -78,17 +80,25 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
## [Unreleased 2.x]
### Added
- Add server version as REST response header [#6583](https://github.com/opensearch-project/OpenSearch/issues/6583)
- Start replication checkpointTimers on primary before segments upload to remote store. ([#8221]()https://github.com/opensearch-project/OpenSearch/pull/8221)
- [distribution/archives] [Linux] [x64] Provide the variant of the distributions bundled with JRE ([#8195]()https://github.com/opensearch-project/OpenSearch/pull/8195)
- Start replication checkpointTimers on primary before segments upload to remote store. ([#8221](https://github.com/opensearch-project/OpenSearch/pull/8221))
- [distribution/archives] [Linux] [x64] Provide the variant of the distributions bundled with JRE ([#8195](https://github.com/opensearch-project/OpenSearch/pull/8195))
- Add configuration for file cache size to max remote data ratio to prevent oversubscription of file cache ([#8606](https://github.com/opensearch-project/OpenSearch/pull/8606))
- Disallow compression level to be set for default and best_compression index codecs ([#8737]()https://github.com/opensearch-project/OpenSearch/pull/8737)
- Disallow compression level to be set for default and best_compression index codecs ([#8737](https://github.com/opensearch-project/OpenSearch/pull/8737))
- Prioritize replica shard movement during shard relocation ([#8875](https://github.com/opensearch-project/OpenSearch/pull/8875))
- Introducing Default and Best Compression codecs as their algorithm name ([#9123]()https://github.com/opensearch-project/OpenSearch/pull/9123)
- Make SearchTemplateRequest implement IndicesRequest.Replaceable ([#9122]()https://github.com/opensearch-project/OpenSearch/pull/9122)
- Introducing Default and Best Compression codecs as their algorithm name ([#9123](https://github.com/opensearch-project/OpenSearch/pull/9123))
- Make SearchTemplateRequest implement IndicesRequest.Replaceable ([#9122](https://github.com/opensearch-project/OpenSearch/pull/9122))
- [BWC and API enforcement] Define the initial set of annotations, their meaning and relations between them ([#9223](https://github.com/opensearch-project/OpenSearch/pull/9223))
- [Segment Replication] Support realtime reads for GET requests ([#9212](https://github.com/opensearch-project/OpenSearch/pull/9212))
- [Feature] Expose term frequency in Painless script score context ([#9081](https://github.com/opensearch-project/OpenSearch/pull/9081))
- Add support for reading partial files to HDFS repository ([#9513](https://github.com/opensearch-project/OpenSearch/issues/9513))
- Add support for extensions to search responses using SearchExtBuilder ([#9379](https://github.com/opensearch-project/OpenSearch/pull/9379))
- [Remote State] Create service to publish cluster state to remote store ([#9160](https://github.com/opensearch-project/OpenSearch/pull/9160))
- [BWC and API enforcement] Decorate the existing APIs with proper annotations (part 1) ([#9520](https://github.com/opensearch-project/OpenSearch/pull/9520))
- Add concurrent segment search related metrics to node and index stats ([#9622](https://github.com/opensearch-project/OpenSearch/issues/9622))
- Decouple replication lag from logic to fail stale replicas ([#9507](https://github.com/opensearch-project/OpenSearch/pull/9507))
- Expose DelimitedTermFrequencyTokenFilter to allow providing term frequencies along with terms ([#9479](https://github.com/opensearch-project/OpenSearch/pull/9479))
- APIs for performing async blob reads and async downloads from the repository using multiple streams ([#9592](https://github.com/opensearch-project/OpenSearch/issues/9592))
- Introduce cluster default remote translog buffer interval setting ([#9584](https://github.com/opensearch-project/OpenSearch/pull/9584))

### Dependencies
- Bump `org.apache.logging.log4j:log4j-core` from 2.17.1 to 2.20.0 ([#8307](https://github.com/opensearch-project/OpenSearch/pull/8307))
@@ -122,6 +132,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
- Bump `actions/setup-java` from 2 to 3 ([#9457](https://github.com/opensearch-project/OpenSearch/pull/9457))
- Bump `com.google.api:gax` from 2.27.0 to 2.32.0 ([#9300](https://github.com/opensearch-project/OpenSearch/pull/9300))
- Bump `netty` from 4.1.96.Final to 4.1.97.Final ([#9553](https://github.com/opensearch-project/OpenSearch/pull/9553))
- Bump `io.grpc:grpc-api` from 1.57.1 to 1.57.2 ([#9578](https://github.com/opensearch-project/OpenSearch/pull/9578))

### Changed
- Default to mmapfs within hybridfs ([#8508](https://github.com/opensearch-project/OpenSearch/pull/8508))
@@ -156,8 +167,13 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
- Improve performance of encoding composite keys in multi-term aggregations ([#9412](https://github.com/opensearch-project/OpenSearch/pull/9412))
- Fix sort related ITs for concurrent search ([#9177](https://github.com/opensearch-project/OpenSearch/pull/9466))
- Removing the vec file extension from INDEX_STORE_HYBRID_NIO_EXTENSIONS, to ensure no performance degradation for vector search via Lucene Engine ([#9528](https://github.com/opensearch-project/OpenSearch/pull/9528))
- Add support to use trace propagated from client ([#9506](https://github.com/opensearch-project/OpenSearch/pull/9506))
- Separate request-based and settings-based concurrent segment search controls and introduce AggregatorFactory method to determine concurrent search support ([#9469](https://github.com/opensearch-project/OpenSearch/pull/9469))
- [Remote Store] Rate limiter integration for remote store uploads and downloads([#9448](https://github.com/opensearch-project/OpenSearch/pull/9448/))
- [Remote Store] Implicitly use replication type SEGMENT for remote store clusters ([#9264](https://github.com/opensearch-project/OpenSearch/pull/9264))
- Redefine telemetry context restoration and propagation ([#9617](https://github.com/opensearch-project/OpenSearch/pull/9617))
- Use non-concurrent path for sort request on timeseries index and field([#9562](https://github.com/opensearch-project/OpenSearch/pull/9562))
- Added sampler based on `Blanket Probabilistic Sampling rate` and `Override for on demand` ([#9621](https://github.com/opensearch-project/OpenSearch/issues/9621))

### Deprecated

@@ -171,6 +187,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
- Handle null partSize in OnDemandBlockSnapshotIndexInput ([#9291](https://github.com/opensearch-project/OpenSearch/issues/9291))
- Fix condition to remove index create block ([#9437](https://github.com/opensearch-project/OpenSearch/pull/9437))
- Add support to clear archived index setting ([#9019](https://github.com/opensearch-project/OpenSearch/pull/9019))
- [Segment Replication] Fixed bug where replica shard temporarily serves stale data during an engine reset ([#9495](https://github.com/opensearch-project/OpenSearch/pull/9495))
- [Segment Replication] Fixed bug where bytes behind metric is not accurate ([#9686](https://github.com/opensearch-project/OpenSearch/pull/9686))

### Security

24 changes: 24 additions & 0 deletions TESTING.md
@@ -23,6 +23,7 @@ OpenSearch uses [jUnit](https://junit.org/junit5/) for testing, it also uses ran
- [Iterating on packaging tests](#iterating-on-packaging-tests)
- [Testing backwards compatibility](#testing-backwards-compatibility)
- [BWC Testing against a specific remote/branch](#bwc-testing-against-a-specific-remotebranch)
- [BWC Testing with security](#bwc-testing-with-security)
- [Skip fetching latest](#skip-fetching-latest)
- [How to write good tests?](#how-to-write-good-tests)
- [Base classes for test cases](#base-classes-for-test-cases)
@@ -406,6 +407,29 @@ Example:

Say you need to make a change to `main` and have a BWC layer in `5.x`. You will need to:

1. Create a branch called `index_req_change` off your remote `${remote}`. This will contain your change.
2. Create a branch called `index_req_bwc_5.x` off `5.x`. This will contain your bwc layer.
3. Push both branches to your remote repository.
4. Run the tests with `./gradlew check -Dbwc.remote=${remote} -Dbwc.refspec.5.x=index_req_bwc_5.x`.

## BWC Testing with security

You may want to run BWC tests for a secure OpenSearch cluster. In order to do this, you will need to follow a few additional steps:

1. Clone the OpenSearch Security repository from https://github.com/opensearch-project/security.
2. Get both the old version of the Security plugin (the version you wish to come from) and the new version of the Security plugin (the version you wish to go to). This can be done either by fetching the maven artifact with a command like `wget https://repo1.maven.org/maven2/org/opensearch/plugin/opensearch-security/<TARGET_VERSION>.0/opensearch-security-<TARGET_VERSION>.0.zip` or by running `./gradlew assemble` from the base of the Security repository.
3. Move both of the Security artifacts into new directories at the path `/security/bwc-test/src/test/resources/<TARGET_VERSION>.0`. You should end up with two different directories in `/security/bwc-test/src/test/resources/`, one named the old version and one the new version.
4. Run the following command from the base of the Security repository:

```
./gradlew -p bwc-test clean bwcTestSuite \
-Dtests.security.manager=false \
-Dtests.opensearch.http.protocol=https \
-Dtests.opensearch.username=admin \
-Dtests.opensearch.password=admin \
-PcustomDistributionUrl="/OpenSearch/distribution/archives/linux-tar/build/distributions/opensearch-min-<TARGET_VERSION>-SNAPSHOT-linux-x64.tar.gz" \
-i
```

- `-Dtests.security.manager=false` handles access issues when attempting to read the certificates from the file system.
- `-Dtests.opensearch.http.protocol=https` tells the wait-for-cluster-startup task to use HTTPS when checking the cluster.
- `-PcustomDistributionUrl=...` uses a custom build of the OpenSearch distribution. This is unnecessary when running against standard/unmodified OpenSearch core distributions.
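
Steps 2 and 3 above derive both the download URL and the resource directory from the same target version. The following is a minimal sketch of that naming rule, not part of OpenSearch or the Security plugin: the class `SecurityBwcPaths`, its method names, and the version `2.9.0` used in `main` are all hypothetical and chosen only for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper that mirrors the naming rule in steps 2 and 3; not an OpenSearch API. */
final class SecurityBwcPaths {
    // Base URL copied from the wget example in step 2.
    private static final String MAVEN_BASE =
        "https://repo1.maven.org/maven2/org/opensearch/plugin/opensearch-security/";

    private SecurityBwcPaths() {}

    /** Builds the artifact URL by appending ".0" to the target version, as in step 2. */
    static String artifactUrl(String targetVersion) {
        String pluginVersion = targetVersion + ".0";
        return MAVEN_BASE + pluginVersion + "/opensearch-security-" + pluginVersion + ".zip";
    }

    /** Resolves the per-version directory under bwc-test/src/test/resources, as in step 3. */
    static Path resourceDir(Path securityRepoRoot, String targetVersion) {
        return securityRepoRoot.resolve("bwc-test/src/test/resources").resolve(targetVersion + ".0");
    }

    public static void main(String[] args) {
        // "2.9.0" is an illustrative placeholder for <TARGET_VERSION>.
        System.out.println(artifactUrl("2.9.0"));
        System.out.println(resourceDir(Paths.get("security"), "2.9.0"));
    }
}
```

Calling the helper once for the old version and once for the new version yields the two directories that step 3 expects to exist side by side.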

### Skip fetching latest

For some BWC testing scenarios, you want to use the local clone of the repository without fetching latest. For these use cases, you can set the system property `tests.bwc.git_fetch_latest` to `false` and the BWC builds will skip fetching the latest from the remote.
@@ -0,0 +1,81 @@
/*
* SPDX-License-Identifier: Apache-2.0
*
* The OpenSearch Contributors require contributions made to
* this file be licensed under the Apache-2.0 license or a
* compatible open source license.
*/

package org.opensearch.benchmark.index.mapper;

import org.apache.lucene.util.BytesRef;
import org.opensearch.index.mapper.BinaryFieldMapper;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;

import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

@Warmup(iterations = 1)
@Measurement(iterations = 1)
@Fork(1)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Thread)
@SuppressWarnings("unused") // invoked by benchmarking framework
public class CustomBinaryDocValuesFieldBenchmark {

static final String FIELD_NAME = "dummy";
static final String SEED_VALUE = "seed";

@Benchmark
public void add(CustomBinaryDocValuesFieldBenchmark.BenchmarkParameters parameters, Blackhole blackhole) {
// Don't use the parameter binary doc values object.
// Start with a fresh object every call and add maximum number of entries
BinaryFieldMapper.CustomBinaryDocValuesField customBinaryDocValuesField = new BinaryFieldMapper.CustomBinaryDocValuesField(
FIELD_NAME,
new BytesRef(SEED_VALUE).bytes
);
for (int i = 0; i < parameters.maximumNumberOfEntries; ++i) {
ThreadLocalRandom.current().nextBytes(parameters.bytes);
customBinaryDocValuesField.add(parameters.bytes);
}
}

@Benchmark
public void binaryValue(CustomBinaryDocValuesFieldBenchmark.BenchmarkParameters parameters, Blackhole blackhole) {
blackhole.consume(parameters.customBinaryDocValuesField.binaryValue());
}

@State(Scope.Benchmark)
public static class BenchmarkParameters {
@Param({ "8", "32", "128", "512" })
int maximumNumberOfEntries;

@Param({ "8", "32", "128", "512" })
int entrySize;

BinaryFieldMapper.CustomBinaryDocValuesField customBinaryDocValuesField;
byte[] bytes;

@Setup
public void setup() {
customBinaryDocValuesField = new BinaryFieldMapper.CustomBinaryDocValuesField(FIELD_NAME, new BytesRef(SEED_VALUE).bytes);
bytes = new byte[entrySize];
for (int i = 0; i < maximumNumberOfEntries; ++i) {
ThreadLocalRandom.current().nextBytes(bytes);
customBinaryDocValuesField.add(bytes);
}
}
}
}
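
JMH runs each `@Benchmark` method once per combination of `@Param` values, so the two four-value parameters above give 16 combinations per method. The sketch below enumerates that grid in plain Java without the JMH or OpenSearch dependencies. The class name `ParamGridSketch` and the helper `randomBytesInSetup` are hypothetical; the values are copied from the `@Param` annotations.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, JMH-free enumeration of the benchmark's parameter grid. */
final class ParamGridSketch {
    // Values copied from the two @Param annotations in BenchmarkParameters.
    static final int[] MAXIMUM_NUMBER_OF_ENTRIES = { 8, 32, 128, 512 };
    static final int[] ENTRY_SIZES = { 8, 32, 128, 512 };

    private ParamGridSketch() {}

    /** Returns every {maximumNumberOfEntries, entrySize} pair in the cross product. */
    static List<int[]> combinations() {
        List<int[]> grid = new ArrayList<>();
        for (int entries : MAXIMUM_NUMBER_OF_ENTRIES) {
            for (int size : ENTRY_SIZES) {
                grid.add(new int[] { entries, size });
            }
        }
        return grid;
    }

    /** Number of random bytes the setup loop generates: one entrySize-byte fill per entry. */
    static long randomBytesInSetup(int entries, int entrySize) {
        return (long) entries * entrySize;
    }

    public static void main(String[] args) {
        for (int[] c : combinations()) {
            System.out.printf("entries=%d entrySize=%d randomBytes=%d%n",
                c[0], c[1], randomBytesInSetup(c[0], c[1]));
        }
    }
}
```

With `@Warmup(iterations = 1)`, `@Measurement(iterations = 1)`, and `@Fork(1)`, each of these combinations gets a single warmup and a single measured iteration, which keeps the run short at the cost of noisier numbers.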
1 change: 1 addition & 0 deletions build.gradle
@@ -501,6 +501,7 @@ subprojects {
includeClasses.add("org.opensearch.index.reindex.DeleteByQueryBasicTests")
includeClasses.add("org.opensearch.index.reindex.UpdateByQueryBasicTests")
includeClasses.add("org.opensearch.index.shard.IndexShardIT")
includeClasses.add("org.opensearch.index.shard.RemoteIndexShardTests")
includeClasses.add("org.opensearch.index.shard.RemoteStoreRefreshListenerTests")
includeClasses.add("org.opensearch.index.translog.RemoteFSTranslogTests")
includeClasses.add("org.opensearch.indices.DateMathIndexExpressionsIntegrationIT")
2 changes: 1 addition & 1 deletion buildSrc/version.properties
@@ -39,7 +39,7 @@ httpcore = 4.4.16
httpasyncclient = 4.1.5
commonslogging = 1.2
commonscodec = 1.15

commonslang = 3.13.0
# plugin dependencies
aws = 2.20.55
reactivestreams = 1.0.4
@@ -32,13 +32,16 @@

package org.opensearch.common;

import org.opensearch.common.annotation.PublicApi;

import java.util.function.Consumer;

/**
* A {@link Consumer}-like interface which allows throwing checked exceptions.
*
* @opensearch.api
*/
@PublicApi(since = "1.0.0")
@FunctionalInterface
public interface CheckedConsumer<T, E extends Exception> {
void accept(T t) throws E;
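
As a rough illustration of why a `Consumer`-like interface with a checked exception type is useful, the sketch below passes a lambda that throws `IOException` to a generic helper. It declares a local stand-in with the same shape as the interface in the diff so that it compiles without OpenSearch on the classpath. The class `CheckedConsumerSketch` and the methods `forEach` and `record` are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical usage sketch; only the nested interface mirrors the diff above. */
final class CheckedConsumerSketch {

    /** Local stand-in with the same signature as org.opensearch.common.CheckedConsumer. */
    @FunctionalInterface
    interface CheckedConsumer<T, E extends Exception> {
        void accept(T t) throws E;
    }

    private CheckedConsumerSketch() {}

    /** Applies the consumer to each item; the checked exception type E propagates to the caller. */
    static <T, E extends Exception> void forEach(List<T> items, CheckedConsumer<T, E> consumer) throws E {
        for (T item : items) {
            consumer.accept(item);
        }
    }

    /** Hypothetical consumer body: rejects empty names with a checked IOException. */
    static void record(String name, List<String> sink) throws IOException {
        if (name.isEmpty()) {
            throw new IOException("empty name");
        }
        sink.add(name);
    }

    public static void main(String[] args) throws IOException {
        List<String> sink = new ArrayList<>();
        // A java.util.function.Consumer lambda could not call record() without a try/catch.
        CheckedConsumer<String, IOException> consumer = name -> record(name, sink);
        forEach(Arrays.asList("a", "b"), consumer);
        System.out.println(sink);
    }
}
```

Because `E` is a type parameter, the caller of `forEach` only has to handle the specific exception the lambda can throw rather than a blanket `Exception`.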
@@ -32,6 +32,7 @@

package org.opensearch.common.action;

import org.opensearch.common.annotation.PublicApi;
import org.opensearch.common.unit.TimeValue;

import java.util.concurrent.Future;
@@ -42,6 +43,7 @@
*
* @opensearch.api
*/
@PublicApi(since = "1.0.0")
public interface ActionFuture<T> extends Future<T> {

/**