From 89cf4bf9a43665dbb43bb6aad3ab8761cc94dfbc Mon Sep 17 00:00:00 2001
From: Naarcha-AWS
Date: Fri, 22 Sep 2023 11:32:50 -0500
Subject: [PATCH 1/9] Add remote store main page updates. Add shallow snapshots

Signed-off-by: Naarcha-AWS
---
 .../remote-store/index.md                     | 146 +++++++++---------
 .../remote-store/snapshot-interoperability.md |  36 +++++
 2 files changed, 108 insertions(+), 74 deletions(-)
 create mode 100644 _tuning-your-cluster/availability-and-recovery/remote-store/snapshot-interoperability.md

diff --git a/_tuning-your-cluster/availability-and-recovery/remote-store/index.md b/_tuning-your-cluster/availability-and-recovery/remote-store/index.md
index 6066dec238..a3e66a6bf1 100644
--- a/_tuning-your-cluster/availability-and-recovery/remote-store/index.md
+++ b/_tuning-your-cluster/availability-and-recovery/remote-store/index.md
@@ -17,17 +17,72 @@ Introduced 2.10
 
 Remote-backed storage offers OpenSearch users a new way to protect against data loss by automatically creating backups of all index transactions and sending them to remote storage. In order to expose this feature, segment replication must also be enabled. See [Segment replication]({{site.url}}{{site.baseurl}}/opensearch/segment-replication/) for additional information.
 
+With remote-backed storage, when a write request lands on the primary shard, the request is indexed to Lucene on the primary shard only. The corresponding translog is then uploaded to remote store. OpenSearch does not send the write request to the replicas, but rather performs a primary term validation to confirm that the request originator shard is still the primary shard. Primary term validation ensures that the acting primary shard fails if it becomes isolated and is unaware of the cluster manager electing a new primary.
+
+After segments are created on the primary shard as part of the refresh, flush, and merge flow, segments are uploaded to remote segment store and the replica shards source the copy from the same store. This frees up the primary shard from data copying operation.
+
+## Configuration
+
+Remote-backed storage is a cluster-level setting. It can only be enabled when bootstrapping the cluster. After bootstrapping completes, remote-backed storage cannot be enabled or disabled. This provides durability at the cluster level.
+
+Communication to the configured remote cluster happens inside the repository plugin interface. All the existing implementations of the Repository plugin, such as Azure Blob Storage, Google Cloud Store, and AWS S3, are compatible with remote-backed storage.
+
+Make sure remote store settings are configured the same across all nodes in the cluster. If not, bootstrapping will fail for nodes with different attributes from the elected cluster manager node.
+{: .note}
+
+To enable remote-backed storage for a given cluster, provide the remote store repository details as node attributes in `opensearch.yml`, as shown in the following example:
+
+```yml
+# Repository name
+node.attr.remote_store.segment.repository: my-repo-1
+node.attr.remote_store.translog.repository: my-repo-2
+node.attr.remote_store.state.repository: my-repo-3
+
+# Segment repository settings
+node.attr.remote_store.repository.my-repo-1.type: s3
+node.attr.remote_store.repository.my-repo-1.settings.bucket: <bucket-name>
+node.attr.remote_store.repository.my-repo-1.settings.base_path: <bucket-base-path>
+node.attr.remote_store.repository.my-repo-1.settings.region: us-east-1
+
+# Translog repository settings
+node.attr.remote_store.repository.my-repo-2.type: s3
+node.attr.remote_store.repository.my-repo-2.settings.bucket: <bucket-name>
+node.attr.remote_store.repository.my-repo-2.settings.base_path: <bucket-base-path>
+node.attr.remote_store.repository.my-repo-2.settings.region: us-east-1
+
+# Cluster state repository settings
+node.attr.remote_store.repository.my-repo-3.type: s3
+node.attr.remote_store.repository.my-repo-3.settings.bucket: <bucket-name>
+node.attr.remote_store.repository.my-repo-3.settings.base_path: <bucket-base-path>
+node.attr.remote_store.repository.my-repo-3.settings.region: us-east-1
+```
+{% include copy-curl.html %}
+
+You do not have the use three different remote store repositories for segment, translog, and state. All three stores can share the same repository.
+
+After the cluster is created with the `remote_store` settings, all indexes created in that cluster will start uploading data to the configured remote store.
+
+## Related cluster settings
+
+You can use the following [cluster settings]({{site.url}}{{site.baseurl}}/api-reference/cluster-api/cluster-settings/) to tune how remote-backed clusters handle each workload.
+
+| Field | Data type | Description |
+| :--- | :--- | :--- |
+| cluster.default.index.refresh_interval | Time unit | Sets the refresh interval when the `index.refresh_interval` setting is not provided. This setting can be useful when you want to set a default refresh interval across all indexes in a cluster and also support the `searchIdle` setting. You cannot set the interval lower than the `cluster.minimum.index.refresh_interval` setting. |
+| cluster.minimum.index.refresh_interval | Time unit | Sets the minimum refresh interval and applies it to all indexes in the cluster. The `cluster.default.index.refresh_interval` setting should be higher than this setting's value. If, during index creation, the `index.refresh_interval` setting is lower than the minimum set, index creation fails. |
+| cluster.remote_store.translog.buffer_interval | Time unit | The default value of the translog buffer interval used when performing periodic translog updates. This setting is only effective when the index setting `index.remote_store.translog.buffer_interval` is not present. |
+
+
+
 ## Translog
 
 Any index changes, such as indexing or deleting documents, are written to disk during a Lucene commit. However, Lucene commits are expensive operations, so they cannot be performed after every change to the index. Instead, each shard records every indexing operation in a transaction log called *translog*. When a document is indexed, it is added to the memory buffer and recorded in the translog. Frequent refresh operations write the documents in the memory buffer to a segment and then clear the memory buffer.
Periodically, a flush performs a Lucene commit, which includes writing the segments to disk using fsync, purging the old translog, and starting a new translog. Thus, a translog contains all operations that have not yet been flushed. -## Segment replication and remote-backed storage - -When neither segment replication nor remote-backed storage is enabled, OpenSearch uses document replication. In document replication, when a write request lands on the primary shard, the request is indexed to Lucene and stored in the translog. After this, the request is sent to the replicas, where, in turn, it is indexed to Lucene and stored in the translog for durability. +when a write request lands on the primary shard, the request is indexed to Lucene and stored in the translog. After this, the request is sent to the replicas, where, in turn, it is indexed to Lucene and stored in the translog for durability. With segment replication, segments are created on the primary shard only and then copied to all replicas. The replicas do not index requests to Lucene, but they do create and maintain a translog. -With remote-backed storage, when a write request lands on the primary shard, the request is indexed to Lucene on the primary shard only. The corresponding translog is then uploaded to remote store. OpenSearch does not send the write request to the replicas, but rather performs a primary term validation to confirm that the request originator shard is still the primary shard. Primary term validation ensures that the acting primary shard fails if it becomes isolated and is unaware of the cluster manager electing a new primary. + ## The `index.translog.durability` translog setting @@ -66,74 +121,7 @@ The remote store feature supports two levels of durability: - Request-level durability: Translogs are uploaded before acknowledging the request. Set the `translog` flag to `true` to achieve request-level durability. In this scenario, we recommend to batch as many requests as possible in a bulk request. Batching requests will improve indexing throughput and latency compared to sending individual write requests. -## Enable the feature flag - -There are several methods for enabling remote store feature, depending on the install type. You will also need to enable `remote_store` property when creating the index. - -Segment replication must also be enabled to use remote-backed storage. -{: .note} - -### Enable on a node using a tarball install - -The flag is toggled using a new jvm parameter that is set either in `OPENSEARCH_JAVA_OPTS` or in config/jvm.options. - -#### Option 1: Modify jvm.options - -Add the following lines to `config/jvm.options` before starting the OpenSearch process to enable the feature and its dependency: - -``` --Dopensearch.experimental.feature.replication_type.enabled=true --Dopensearch.experimental.feature.remote_store.enabled=true -``` - -Run OpenSearch - -```bash -./bin/opensearch -``` - -#### Option 2: Enable from an environment variable - -As an alternative to directly modifying `config/jvm.options`, you can define the properties by using an environment variable. This can be done in a single command when you start OpenSearch or by defining the variable with `export`. 
-
-To add these flags in-line when starting OpenSearch:
-
-```bash
-OPENSEARCH_JAVA_OPTS="-Dopensearch.experimental.feature.replication_type.enabled=true -Dopensearch.experimental.feature.remote_store.enabled=true" ./opensearch-{{site.opensearch_version}}/bin/opensearch
-```
-
-If you want to define the environment variable separately, prior to running OpenSearch:
-
-```bash
-export OPENSEARCH_JAVA_OPTS="-Dopensearch.experimental.feature.replication_type.enabled=true -Dopensearch.experimental.feature.remote_store.enabled=true"
-./bin/opensearch
-```
-
-### Enable with Docker containers
-If you're running Docker, add the following line to docker-compose.yml underneath the `opensearch-node` and `environment` section:
-
-````json
-OPENSEARCH_JAVA_OPTS="-Dopensearch.experimental.feature.replication_type.enabled=true -Dopensearch.experimental.feature.remote_store.enabled=true"
-````
-
-### Enable for OpenSearch development
-
-To create new indexes with remote-backed storage enabled, you must first enable these features by adding the correct properties to `run.gradle` before building OpenSearch. See the [developer guide](https://github.com/opensearch-project/OpenSearch/blob/main/DEVELOPER_GUIDE.md) for information about to use how Gradle to build OpenSearch.
-
-Add the following properties to `run.gradle` to enable the feature:
-
-```bash
-testClusters {
-  runTask {
-    testDistribution = 'archive'
-    if (numZones > 1) numberOfZones = numZones
-    if (numNodes > 1) numberOfNodes = numNodes
-    systemProperty 'opensearch.experimental.feature.replication_type.enabled', 'true'
-    systemProperty 'opensearch.experimental.feature.remote_store.enabled', 'true'
-  }
-}
-```
 
 ## Register a remote repository
 
@@ -203,13 +191,19 @@ Setting `translog.enabled` to `true` is currently an irreversible operation.
 
 ### Restoring from a backup
 
-To restore an index from a remote backup, such as in the event of a node failure, you must first close the index:
+To restore an index from a remote backup, such as in the event of a node failure, use one of the following options:
+
+**Restore only unassigned shards**
 
 ```bash
-curl -X POST "https://localhost:9200/my-index/_close" -ku admin:admin
+curl -X POST "https://localhost:9200/_remotestore/_restore" -ku admin:admin -H 'Content-Type: application/json' -d'
+{
+  "indices": ["my-index-1", "my-index-2"]
+}
+'
 ```
 
-Restore the index from the backup stored on the remote repository:
+**Remote all shards of a given index**
 
 ```bash
 curl -X POST "https://localhost:9200/_remotestore/_restore" -ku admin:admin -H 'Content-Type: application/json' -d'
@@ -235,3 +229,7 @@ The following are known limitations of the remote-backed storage feature:
 
 - Writing data to a remote store can be a high-latency operation when compared to writing data on the local file system. This may impact the indexing throughput and latency. For performance benchmarking results, see [issue #6376](https://github.com/opensearch-project/OpenSearch/issues/6376).
 
+## Next steps
+
+To track future enhancements to remote-backed storage, see [Issue #10181](https://github.com/opensearch-project/OpenSearch/issues/10181).
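+
+To monitor whether segment and translog data is reaching the remote store, you can query the remote store statistics for an index. The following request is a sketch: it assumes the Remote Store Stats API (`GET _remotestore/stats/<index-name>`), which is documented separately, and it reuses the illustrative index name and demo `admin` credentials from the restore examples above:
+
+```bash
+# Fetch remote store upload/download stats for a single index (illustrative name)
+curl -X GET "https://localhost:9200/_remotestore/stats/my-index" -ku admin:admin
+```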
+ diff --git a/_tuning-your-cluster/availability-and-recovery/remote-store/snapshot-interoperability.md b/_tuning-your-cluster/availability-and-recovery/remote-store/snapshot-interoperability.md new file mode 100644 index 0000000000..9b661d0eb7 --- /dev/null +++ b/_tuning-your-cluster/availability-and-recovery/remote-store/snapshot-interoperability.md @@ -0,0 +1,36 @@ +--- +layout: default +title: Shallow snapshots +nav_order: 15 +parent: Remote-backed storage +grand_parent: Availability and recovery +--- + +# Shallow snapshots + +Shallow copy snapshots allow you to reference data from an entire remote-backed segment instead of storing all of the data from the segment in a snapshot. This makes accessing segment data faster than normal snapshots, because segment data is not stored in the snapshot repository. + +## Enabling shallowing snapshots + +Use the [Cluster Settings API]({{site.url}}{{site.baseurl}}/api-reference/cluster-api/cluster-settings/) to enable the `remote_store_index_shallow_copy` repository setting, as shown in the following example: + +```bash +PUT _cluster/settings +{ + "persistent":{ + "remote_store_index_shallow_copy": true + } +} +``` +{% include copy-curl.html %} + +Once enabled, all requests using the [Snapshot API]({{site.url}}{{site.baseurl}}/api-reference/snapshots/index/) will remain the same for all snapshots. After the setting is enabled, we recommend not disabling the setting. Doing so could affect data durability. + +## Considerations + +Consider the following before using shallow copy snapshots: + +- Shallow copy snapshots only work for remote-backed indexes. +- All nodes in the cluster must use OpenSearch 2.10 or later to take advantage of shallow copy snapshots. +- The file size difference of shallow copy snapshot shards is zero, because no segment data is stored in the snapshot itself. +- Searchable snapshot are not supported inside shallow copy snapshots. From a3841d2dc9d08be5ef605f8943b0c24b85d33346 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS Date: Fri, 22 Sep 2023 11:37:04 -0500 Subject: [PATCH 2/9] Add next steps section Signed-off-by: Naarcha-AWS --- .../availability-and-recovery/remote-store/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_tuning-your-cluster/availability-and-recovery/remote-store/index.md b/_tuning-your-cluster/availability-and-recovery/remote-store/index.md index a3e66a6bf1..83772838e0 100644 --- a/_tuning-your-cluster/availability-and-recovery/remote-store/index.md +++ b/_tuning-your-cluster/availability-and-recovery/remote-store/index.md @@ -21,7 +21,7 @@ With remote-backed storage, when a write request lands on the primary shard, the After segments are created on the primary shard as part of the refresh, flush, and merge flow, segments are uploaded to remote segment store and the replica shards source the copy from the same store. This frees up the primary shard from data copying operation. -## Configuration +## Configuring remote-backed storage Remote-backed storage is a cluster level setting. It can only be enabled when bootstrapping to the cluster. After bootstrapping completes, the remote-backed storage cannot be enabled or disabled. This provides durability at the cluster level. From 1212904f1a311579574c778ca1e1cd7465b0a0a8 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS Date: Fri, 22 Sep 2023 12:19:03 -0500 Subject: [PATCH 3/9] Remove old content. Fix link. Fix typo. 
Signed-off-by: Naarcha-AWS --- .../remote-store/index.md | 123 +----------------- .../remote-store/snapshot-interoperability.md | 2 +- .../segment-replication/index.md | 2 +- 3 files changed, 3 insertions(+), 124 deletions(-) diff --git a/_tuning-your-cluster/availability-and-recovery/remote-store/index.md b/_tuning-your-cluster/availability-and-recovery/remote-store/index.md index 83772838e0..0a79bbd6fd 100644 --- a/_tuning-your-cluster/availability-and-recovery/remote-store/index.md +++ b/_tuning-your-cluster/availability-and-recovery/remote-store/index.md @@ -73,123 +73,7 @@ You can use the following [cluster settings]({{site.url}}{{site.baseurl}}//api-r | cluster.remote_store.translog.buffer_interval | Time unit | The default value of the translog buffer interval used when performing periodic translog updates. This setting is only effective when the index setting `index.remote_store.translog.buffer_interval` is not present. | - -## Translog - -Any index changes, such as indexing or deleting documents, are written to disk during a Lucene commit. However, Lucene commits are expensive operations, so they cannot be performed after every change to the index. Instead, each shard records every indexing operation in a transaction log called *translog*. When a document is indexed, it is added to the memory buffer and recorded in the translog. Frequent refresh operations write the documents in the memory buffer to a segment and then clear the memory buffer. Periodically, a flush performs a Lucene commit, which includes writing the segments to disk using fsync, purging the old translog, and starting a new translog. Thus, a translog contains all operations that have not yet been flushed. - -when a write request lands on the primary shard, the request is indexed to Lucene and stored in the translog. After this, the request is sent to the replicas, where, in turn, it is indexed to Lucene and stored in the translog for durability. - -With segment replication, segments are created on the primary shard only and then copied to all replicas. The replicas do not index requests to Lucene, but they do create and maintain a translog. - - - -## The `index.translog.durability` translog setting - -Without remote-backed storage, indexing operations are only persisted to disk when the translog is fsynced. Therefore, any data that has not been written to disk can potentially be lost. - -The `index.translog.durability` setting controls how frequently OpenSearch fsyncs the translog to disk: - -- By default, `index.translog.durability` is set to `request`. This means that fsync happens after every request, and all acknowledged write requests persist in case of failure. - -- If you set `index.translog.durability` to `async`, fsync happens periodically at the specified `sync_interval` (5 seconds by default). The fsync operation is asynchronous, so acknowledge is sent without waiting for fsync. Consequently, all acknowledged writes since the last commit are lost in case of failure. - -With remote-backed storage, the translog is uploaded to a remote store for durability. - -`index.translog.durability` is a dynamic setting. To update it, use the following query: - -```json -PUT my_index/_settings -{ - "index" : { - "translog.durability" : "request" - } -} -``` - -## Refresh-level and request-level durability - -The remote store feature supports two levels of durability: - -- Refresh-level durability: Segment files are uploaded to remote store after every refresh. 
Set the `remote_store` flag to `true` to achieve refresh-level durability. Commit-level durability is inherent, and uploads are asynchronous. - - If you need to refresh an index manually, you can use the `_refresh` API. For example, to refresh the `my_index` index, use the following request: - - ```json - POST my_index/_refresh - ``` - -- Request-level durability: Translogs are uploaded before acknowledging the request. Set the `translog` flag to `true` to achieve request-level durability. In this scenario, we recommend to batch as many requests as possible in a bulk request. Batching requests will improve indexing throughput and latency compared to sending individual write requests. - - - -## Register a remote repository - -Now that your deployment is running with the feature flags enabled, the next step is to register a remote repository where backups will be stored. See [Register repository]({{site.url}}{{site.baseurl}}/opensearch/snapshots/snapshot-restore#register-repository) for more information. - -## Create an index - -Remote-backed storage is enabled for an index when it is created. This feature cannot be enabled for indexes that already exist. - -For refresh-level durability, include the `remote_store` property to enable the feature and specify a segment repository: - -```bash -curl -X PUT "https://localhost:9200/my-index?pretty" -ku admin:admin -H 'Content-Type: application/json' -d' -{ - "settings": { - "index": { - "number_of_shards": 1, - "number_of_replicas": 0, - "replication": { - "type": "SEGMENT" - }, - "remote_store": { - "enabled": true, - "repository": "segment-repo" - } - } - } -} -' -``` - -For request-level durability, in addition to the `remote_store` and segment repository, include the `translog` property and specify a translog repository: - -```bash -curl -X PUT "https://localhost:9200/my-index?pretty" -ku admin:admin -H 'Content-Type: application/json' -d' -{ - "settings": { - "index": { - "number_of_shards": 1, - "number_of_replicas": 1, - "replication": { - "type": "SEGMENT" - }, - "remote_store": { - "enabled": true, - "repository": "segment-repo", - "translog": { - "enabled": true, - "repository": "translog-repo", - "buffer_interval": "300ms" - } - } - } - } -} -' -``` - -You can have the same repository serve as both the segment repository and translog repository. -{: .note} - -As data is added to the index, it also will be continuously uploaded to remote storage in the form of segment and translog files because of refreshes, flushes, and translog fsyncs to disk. Along with data, other metadata files will be uploaded. -The `buffer_interval` setting specifies the time interval during which translog operations are buffered. Instead of uploading individual translog files, OpenSearch creates a single translog file with all the write operations received during the configured interval. Bundling translog files leads to higher throughput but also increases latency. The default `buffer_interval` value is 100 ms. - -Setting `translog.enabled` to `true` is currently an irreversible operation. 
-{: .warning} - -### Restoring from a backup +## Restoring from a backup To restore an index from a remote backup, such as in the event of a node failure, use one of the following options: @@ -223,11 +107,6 @@ You can use remote-backed storage for the following purposes: - To restore red clusters or indexes - To recover all data up to the last acknowledged write, regardless of replica count, if `index.translog.durability` is set to `request` -## Known limitations - -The following are known limitations of the remote-backed storage feature: - -- Writing data to a remote store can be a high-latency operation when compared to writing data on the local file system. This may impact the indexing throughput and latency. For performance benchmarking results, see [issue #6376](https://github.com/opensearch-project/OpenSearch/issues/6376). ## Next steps diff --git a/_tuning-your-cluster/availability-and-recovery/remote-store/snapshot-interoperability.md b/_tuning-your-cluster/availability-and-recovery/remote-store/snapshot-interoperability.md index 9b661d0eb7..e033f3a550 100644 --- a/_tuning-your-cluster/availability-and-recovery/remote-store/snapshot-interoperability.md +++ b/_tuning-your-cluster/availability-and-recovery/remote-store/snapshot-interoperability.md @@ -10,7 +10,7 @@ grand_parent: Availability and recovery Shallow copy snapshots allow you to reference data from an entire remote-backed segment instead of storing all of the data from the segment in a snapshot. This makes accessing segment data faster than normal snapshots, because segment data is not stored in the snapshot repository. -## Enabling shallowing snapshots +## Enabling shallow snapshots Use the [Cluster Settings API]({{site.url}}{{site.baseurl}}/api-reference/cluster-api/cluster-settings/) to enable the `remote_store_index_shallow_copy` repository setting, as shown in the following example: diff --git a/_tuning-your-cluster/availability-and-recovery/segment-replication/index.md b/_tuning-your-cluster/availability-and-recovery/segment-replication/index.md index 1927ffac3c..b632ee52d6 100644 --- a/_tuning-your-cluster/availability-and-recovery/segment-replication/index.md +++ b/_tuning-your-cluster/availability-and-recovery/segment-replication/index.md @@ -32,7 +32,7 @@ Segment replication can be applied in a variety of scenarios, including: As of OpenSearch 2.10, you can use two methods for segment replication: -- **Remote-backed storage**, a persistent storage solution: The primary shard sends segment files to the remote-backed storage, and the replica shards source the copy from the same store. For more information about using remote-backed storage, see [Remote-backed storage]({{site.url}}{{site.baseurl}}/tuning-your-cluster/availability-and-recovery/remote-store/index/#segment-replication-and-remote-backed-storage). +- **Remote-backed storage**, a persistent storage solution: The primary shard sends segment files to the remote-backed storage, and the replica shards source the copy from the same store. For more information about using remote-backed storage, see [Remote-backed storage]({{site.url}}{{site.baseurl}}/tuning-your-cluster/availability-and-recovery/remote-store/index/). - Node-to-node communication: The primary shard sends segment files directly to the replica shards using node-to-node communication. 
## Segment replication configuration From 9d5622f0fbec334bf93fd1bd436df5bb4f7314af Mon Sep 17 00:00:00 2001 From: Naarcha-AWS Date: Fri, 22 Sep 2023 12:24:49 -0500 Subject: [PATCH 4/9] Fix link Signed-off-by: Naarcha-AWS --- .../availability-and-recovery/segment-replication/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_tuning-your-cluster/availability-and-recovery/segment-replication/index.md b/_tuning-your-cluster/availability-and-recovery/segment-replication/index.md index b632ee52d6..6c389125bd 100644 --- a/_tuning-your-cluster/availability-and-recovery/segment-replication/index.md +++ b/_tuning-your-cluster/availability-and-recovery/segment-replication/index.md @@ -55,7 +55,7 @@ PUT /my-index1 ``` {% include copy-curl.html %} -If you're using remote-backed storage, add the `remote_store` property to the index request body. For more information, see [Create an index]({{site.url}}{{site.baseurl}}/tuning-your-cluster/availability-and-recovery/remote-store/index/#create-an-index). +If you're using remote-backed storage, add the `remote_store` property to the index request body. When using node-to-node replication, the primary shard consumes more network bandwidth because it pushes segment files to all the replica shards. Thus, it's beneficial to distribute primary shards equally between the nodes. To ensure balanced primary shard distribution, set the dynamic `cluster.routing.allocation.balance.prefer_primary` setting to `true`. For more information, see [Cluster settings]({{site.url}}{{site.baseurl}}/api-reference/cluster-api/cluster-settings/). From 78244ecd0f1f5d49603d8af0225d9d7c53729115 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Fri, 22 Sep 2023 12:33:28 -0500 Subject: [PATCH 5/9] Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- .../availability-and-recovery/remote-store/index.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/_tuning-your-cluster/availability-and-recovery/remote-store/index.md b/_tuning-your-cluster/availability-and-recovery/remote-store/index.md index 0a79bbd6fd..10caad89a7 100644 --- a/_tuning-your-cluster/availability-and-recovery/remote-store/index.md +++ b/_tuning-your-cluster/availability-and-recovery/remote-store/index.md @@ -58,9 +58,9 @@ node.attr.remote_store.repository.my-repo-3.settings.region: us-east-1 ``` {% include copy-curl.html %} -You do not have the use three different remote store repositories for segment, translog, and state. All three stores can share the same repository. +You do not have to use three different remote store repositories for segment, translog, and state. All three stores can share the same repository. -After the cluster is created with the `remote_store` settings, all indexes created in that cluster will start uploading data to the configured remote store. +During the bootstrapping process, the remote-backed repositories listed in `opensearch.yml` are automatically registered. After the cluster is created with the `remote_store` settings, all indexes created in that cluster will start uploading data to the configured remote store. 
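+
+To verify that the repositories listed in `opensearch.yml` were registered during bootstrapping, one option is to list the cluster's registered repositories after startup. The following call is a sketch (it assumes the demo `admin` credentials used in the restore examples on this page); the remote store repositories should appear in the output:
+
+```bash
+# List all registered repositories, including those registered at bootstrap
+curl -X GET "https://localhost:9200/_cat/repositories?v" -ku admin:admin
+```
+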
## Related cluster settings @@ -90,7 +90,7 @@ curl -X POST "https://localhost:9200/_remotestore/_restore" -H 'Content-Type: ap **Remote all shards of a given index** ```bash -curl -X POST "https://localhost:9200/_remotestore/_restore" -ku admin:admin -H 'Content-Type: application/json' -d' +curl -X POST "https://localhost:9200/_remotestore/_restore?restore_all_shards=true" -ku admin:admin -H 'Content-Type: application/json' -d' { "indices": ["my-index"] } From c789bc1e3fa6b567012fbc839947f624a4ff34e4 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Fri, 22 Sep 2023 12:35:08 -0500 Subject: [PATCH 6/9] Update _tuning-your-cluster/availability-and-recovery/remote-store/index.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- .../availability-and-recovery/remote-store/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_tuning-your-cluster/availability-and-recovery/remote-store/index.md b/_tuning-your-cluster/availability-and-recovery/remote-store/index.md index 10caad89a7..cfd9f7adc9 100644 --- a/_tuning-your-cluster/availability-and-recovery/remote-store/index.md +++ b/_tuning-your-cluster/availability-and-recovery/remote-store/index.md @@ -87,7 +87,7 @@ curl -X POST "https://localhost:9200/_remotestore/_restore" -H 'Content-Type: ap ' ``` -**Remote all shards of a given index** +**Restore all shards of a given index** ```bash curl -X POST "https://localhost:9200/_remotestore/_restore?restore_all_shards=true" -ku admin:admin -H 'Content-Type: application/json' -d' From a4cc717788be7addc8c1013c0734aa7e46f3a01e Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Fri, 22 Sep 2023 13:46:22 -0500 Subject: [PATCH 7/9] Apply suggestions from code review Co-authored-by: Chris Moore <107723039+cwillum@users.noreply.github.com> Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- .../availability-and-recovery/remote-store/index.md | 2 +- .../remote-store/snapshot-interoperability.md | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/_tuning-your-cluster/availability-and-recovery/remote-store/index.md b/_tuning-your-cluster/availability-and-recovery/remote-store/index.md index cfd9f7adc9..beecb6df8c 100644 --- a/_tuning-your-cluster/availability-and-recovery/remote-store/index.md +++ b/_tuning-your-cluster/availability-and-recovery/remote-store/index.md @@ -19,7 +19,7 @@ Remote-backed storage offers OpenSearch users a new way to protect against data With remote-backed storage, when a write request lands on the primary shard, the request is indexed to Lucene on the primary shard only. The corresponding translog is then uploaded to remote store. OpenSearch does not send the write request to the replicas, but rather performs a primary term validation to confirm that the request originator shard is still the primary shard. Primary term validation ensures that the acting primary shard fails if it becomes isolated and is unaware of the cluster manager electing a new primary. -After segments are created on the primary shard as part of the refresh, flush, and merge flow, segments are uploaded to remote segment store and the replica shards source the copy from the same store. This frees up the primary shard from data copying operation. 
+After segments are created on the primary shard as part of the refresh, flush, and merge flow, the segments are uploaded to remote segment store and the replica shards source a copy from the same remote segment store. This frees up the primary shard from having to perform a data copying operation.
 
 ## Configuring remote-backed storage
 
diff --git a/_tuning-your-cluster/availability-and-recovery/remote-store/snapshot-interoperability.md b/_tuning-your-cluster/availability-and-recovery/remote-store/snapshot-interoperability.md
index e033f3a550..2ccbc9967a 100644
--- a/_tuning-your-cluster/availability-and-recovery/remote-store/snapshot-interoperability.md
+++ b/_tuning-your-cluster/availability-and-recovery/remote-store/snapshot-interoperability.md
@@ -32,5 +32,5 @@ Consider the following before using shallow copy snapshots:
 
 - Shallow copy snapshots only work for remote-backed indexes.
 - All nodes in the cluster must use OpenSearch 2.10 or later to take advantage of shallow copy snapshots.
-- The file size difference of shallow copy snapshot shards is zero, because no segment data is stored in the snapshot itself.
-- Searchable snapshot are not supported inside shallow copy snapshots.
+- There is no difference in file size between standard (regular, normal, primary or replica???) shards and shallow copy snapshot shards because no segment data is stored in the snapshot itself.
+- Searchable snapshots are not supported inside shallow copy snapshots.

From 9641cb406a3042e7a74dd5ea35a6e22c83b40e2c Mon Sep 17 00:00:00 2001
From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Date: Fri, 22 Sep 2023 14:28:42 -0500
Subject: [PATCH 8/9] Update _tuning-your-cluster/availability-and-recovery/remote-store/index.md

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
---
 .../availability-and-recovery/remote-store/index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_tuning-your-cluster/availability-and-recovery/remote-store/index.md b/_tuning-your-cluster/availability-and-recovery/remote-store/index.md
index beecb6df8c..239f8696c6 100644
--- a/_tuning-your-cluster/availability-and-recovery/remote-store/index.md
+++ b/_tuning-your-cluster/availability-and-recovery/remote-store/index.md
@@ -27,7 +27,7 @@ Remote-backed storage is a cluster-level setting. It can only be enabled when bo
 
 Communication to the configured remote cluster happens inside the repository plugin interface. All the existing implementations of the Repository plugin, such as Azure Blob Storage, Google Cloud Store, and AWS S3, are compatible with remote-backed storage.
 
-Make sure remote store settings are configured the same across all nodes in the cluster. If not, bootstrapping will fail for nodes with different attributes from the elected cluster manager node.
+Make sure remote store settings are configured the same way across all nodes in the cluster. If not, bootstrapping will fail for nodes whose attributes are different from the elected cluster manager node.
 {: .note}
 
 To enable remote-backed storage for a given cluster, provide the remote store repository details as node attributes in `opensearch.yml`, as shown in the following example:
 
From c789bc1e3fa6b567012fbc839947f624a4ff34e4 Mon Sep 17 00:00:00 2001
From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Date: Fri, 22 Sep 2023 14:34:30 -0500
Subject: [PATCH 9/9] Apply suggestions from code review

Co-authored-by: Nathan Bower
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
---
 .../availability-and-recovery/remote-store/index.md         | 6 +++---
 .../remote-store/snapshot-interoperability.md               | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/_tuning-your-cluster/availability-and-recovery/remote-store/index.md b/_tuning-your-cluster/availability-and-recovery/remote-store/index.md
index 239f8696c6..91fb22fe19 100644
--- a/_tuning-your-cluster/availability-and-recovery/remote-store/index.md
+++ b/_tuning-your-cluster/availability-and-recovery/remote-store/index.md
@@ -19,13 +19,13 @@ Remote-backed storage offers OpenSearch users a new way to protect against data
 
 With remote-backed storage, when a write request lands on the primary shard, the request is indexed to Lucene on the primary shard only. The corresponding translog is then uploaded to remote store. OpenSearch does not send the write request to the replicas, but rather performs a primary term validation to confirm that the request originator shard is still the primary shard. Primary term validation ensures that the acting primary shard fails if it becomes isolated and is unaware of the cluster manager electing a new primary.
 
-After segments are created on the primary shard as part of the refresh, flush, and merge flow, the segments are uploaded to remote segment store and the replica shards source a copy from the same remote segment store. This frees up the primary shard from having to perform a data copying operation.
+After segments are created on the primary shard as part of the refresh, flush, and merge flow, the segments are uploaded to remote segment store and the replica shards source a copy from the same remote segment store. This prevents the primary shard from having to perform any write operations on the replica shards.
 
 ## Configuring remote-backed storage
 
 Remote-backed storage is a cluster-level setting. It can only be enabled when bootstrapping the cluster. After bootstrapping completes, remote-backed storage cannot be enabled or disabled. This provides durability at the cluster level.
 
-Communication to the configured remote cluster happens inside the repository plugin interface. All the existing implementations of the Repository plugin, such as Azure Blob Storage, Google Cloud Store, and AWS S3, are compatible with remote-backed storage.
+Communication with the configured remote cluster happens in the Repository plugin interface. All the existing implementations of the Repository plugin, such as Azure Blob Storage, Google Cloud Storage, and Amazon Simple Storage Service (Amazon S3), are compatible with remote-backed storage.
 
 Make sure remote store settings are configured the same way across all nodes in the cluster. If not, bootstrapping will fail for nodes whose attributes are different from the elected cluster manager node.
{: .note} @@ -69,7 +69,7 @@ You can use the following [cluster settings]({{site.url}}{{site.baseurl}}//api-r | Field | Data type | Description | | :--- | :--- | :--- | | cluster.default.index.refresh_interval | Time unit | Sets the refresh interval when the `index.refresh_interval` setting is not provided. This setting can be useful when you want to set a default refresh interval across all indexes in a cluster and also support the `searchIdle` setting. You cannot set the interval lower than the `cluster.minimum.index.refresh_interval` setting. | -| cluster.minimum.index.refresh_interval | Time unit | Sets the minimum refresh interval and applies it to all indexes in the cluster. The `cluster.default.index.refresh_interval` setting should be higher than this setting's value. If, during index creation, the `index.refresh_interval` setting is lower than the minimum set, index creation fails. | +| cluster.minimum.index.refresh_interval | Time unit | Sets the minimum refresh interval and applies it to all indexes in the cluster. The `cluster.default.index.refresh_interval` setting should be higher than this setting's value. If, during index creation, the `index.refresh_interval` setting is lower than the minimum, index creation fails. | | cluster.remote_store.translog.buffer_interval | Time unit | The default value of the translog buffer interval used when performing periodic translog updates. This setting is only effective when the index setting `index.remote_store.translog.buffer_interval` is not present. | diff --git a/_tuning-your-cluster/availability-and-recovery/remote-store/snapshot-interoperability.md b/_tuning-your-cluster/availability-and-recovery/remote-store/snapshot-interoperability.md index 2ccbc9967a..a57aa1237c 100644 --- a/_tuning-your-cluster/availability-and-recovery/remote-store/snapshot-interoperability.md +++ b/_tuning-your-cluster/availability-and-recovery/remote-store/snapshot-interoperability.md @@ -8,7 +8,7 @@ grand_parent: Availability and recovery # Shallow snapshots -Shallow copy snapshots allow you to reference data from an entire remote-backed segment instead of storing all of the data from the segment in a snapshot. This makes accessing segment data faster than normal snapshots, because segment data is not stored in the snapshot repository. +Shallow copy snapshots allow you to reference data from an entire remote-backed segment instead of storing all of the data from the segment in a snapshot. This makes accessing segment data faster than using normal snapshots because segment data is not stored in the snapshot repository. ## Enabling shallow snapshots @@ -32,5 +32,5 @@ Consider the following before using shallow copy snapshots: - Shallow copy snapshots only work for remote-backed indexes. - All nodes in the cluster must use OpenSearch 2.10 or later to take advantage of shallow copy snapshots. -- There is no difference in file size between standard (regular, normal, primary or replica???) shards and shallow copy snapshot shards because no segment data is stored in the snapshot itself. +- There is no difference in file size between standard shards and shallow copy snapshot shards because no segment data is stored in the snapshot itself. - Searchable snapshots are not supported inside shallow copy snapshots.
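+
+After the setting is enabled, no special syntax is required when taking snapshots. For example, a standard [Snapshot API]({{site.url}}{{site.baseurl}}/api-reference/snapshots/index/) request such as the following (the repository and snapshot names are illustrative) creates shallow copies for any remote-backed indexes it includes:
+
+```bash
+PUT _snapshot/my-repo/my-snapshot
+```
+{% include copy-curl.html %}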