Add remote state publication #7364

Merged
merged 10 commits into from
Jun 14, 2024
@@ -54,12 +54,43 @@ In addition to the mandatory static settings, you can configure the following dy

Setting | Default | Description
:--- | :--- | :---
`cluster.remote_store.state.index_metadata.upload_timeout` | 20s | The amount of time to wait for index metadata upload to complete. Note that index metadata for separate indexes is uploaded in parallel.
`cluster.remote_store.state.global_metadata.upload_timeout` | 20s | The amount of time to wait for global metadata upload to complete. Global metadata contains globally applicable metadata, such as templates, cluster settings, data stream metadata, and repository metadata.
`cluster.remote_store.state.metadata_manifest.upload_timeout` | 20s | The amount of time to wait for the manifest file upload to complete. The manifest file contains the details of each of the files uploaded for a single cluster state, both index metadata files and global metadata files.
`cluster.remote_store.state.index_metadata.upload_timeout` | 20s | Deprecated. Use `cluster.remote_store.state.global_metadata.upload_timeout` instead.
`cluster.remote_store.state.global_metadata.upload_timeout` | 20s | The amount of time to wait for the cluster state upload to complete.
`cluster.remote_store.state.metadata_manifest.upload_timeout` | 20s | The amount of time to wait for the manifest file upload to complete. The manifest file contains the details of each of the files uploaded for a single cluster state, both index metadata files and global metadata files.
`cluster.remote_store.state.cleanup_interval` | 300s | The interval for the remote state clean-up asynchronous task to run. This task deletes any old remote state files.
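
As an illustration of the table above, the following sketch raises the upload timeouts and lengthens the clean-up interval. It assumes these settings are also accepted in `opensearch.yml`, like other node-scoped cluster settings. The values are arbitrary examples, not recommendations.

```yml
# Illustrative values only; the defaults are 20s, 20s, and 300s respectively
cluster.remote_store.state.global_metadata.upload_timeout: 30s
cluster.remote_store.state.metadata_manifest.upload_timeout: 30s
cluster.remote_store.state.cleanup_interval: 600s
```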


## Limitations

The remote cluster state functionality has the following limitations:
- Unsafe bootstrap scripts cannot be run when the remote cluster state is enabled. When a majority of cluster manager nodes are lost and the cluster goes down, the user needs to replace any remaining cluster manager nodes and reseed the nodes in order to bootstrap a new cluster.

## Remote cluster state publication


The cluster manager node processes updates to the cluster state. It then publishes the updated cluster state over the local transport layer to all of the follower nodes. With the `remote_store.publication` feature enabled, the cluster state is backed up to the remote store with every state update. The follower nodes can then fetch the state directly from the remote store, which reduces the publication overhead on the cluster manager node.

To enable the feature flag for the `remote_store.publication` feature, follow the steps in the [experimental feature flag documentation]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/experimental/).

Enabling the setting does not change the publication flow, and follower nodes will not send acknowledgements back to the cluster manager until they download the updated cluster state from the remote store.

Collaborator

Should "node" follow "cluster manager"?

Enabling the remote cluster state feature is mandatory for remote publication to work. The routing tables repository settings contains the shard allocation details for each index in the cluster state. You can configure the remote table repository by using following settings:
Collaborator

Suggested change
Enabling the remote cluster state feature is mandatory for remote publication to work. The routing tables repository settings contains the shard allocation details for each index in the cluster state. You can configure the remote table repository by using following settings:
You must enable the remote cluster state feature in order for remote publication to work. The routing tables repository settings contain the shard allocation details for each index in the cluster state. You can configure the remote table repository by using following settings:

Collaborator

Second and third sentences: Is "routing tables" the name of a repository? If so, should it be hyphenated and in code font? Does "remote table repository" refer to the same repository as in the preceding sentence? If so, let's refer to it in the same way.

Collaborator

@Naarcha-AWS Naarcha-AWS Jun 13, 2024

"Routing tables" isn't the name of a repository. Rather, it's a set of rules that determines where certain data packets are sent within the network. In this case, the routing table settings set the shard allocations for the remote cluster. However, in order for a remote cluster to work, there needs to be a repository, such as Blob Storage or S3.

I offered an alternative suggestion below @natebower.


```yml
# Remote routing table repository settings
node.attr.remote_store.routing_table.repository: my-remote-routing-table-repo
node.attr.remote_store.repository.my-remote-routing-table-repo.type: s3
node.attr.remote_store.repository.my-remote-routing-table-repo.settings.bucket: <Bucket Name 3>
node.attr.remote_store.repository.my-remote-routing-table-repo.settings.region: <Bucket region>
```

You do not have to use different remote store repositories for state and routing, since both state and routing can use the same repository settings.
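
A minimal sketch of the shared-repository case follows. It assumes that the remote cluster state repository is selected with `node.attr.remote_store.state.repository`. The repository name `my-remote-state-repo` is a hypothetical placeholder, and the bucket values are left as placeholders in the same style as the example above.

```yml
# Hypothetical example: state and routing table point at the same repository
node.attr.remote_store.state.repository: my-remote-state-repo
node.attr.remote_store.routing_table.repository: my-remote-state-repo
node.attr.remote_store.repository.my-remote-state-repo.type: s3
node.attr.remote_store.repository.my-remote-state-repo.settings.bucket: <Bucket Name>
node.attr.remote_store.repository.my-remote-state-repo.settings.region: <Bucket region>
```

With this layout, the repository type and its settings are declared once and both features reference them by name.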

To configure remote publication, use the following cluster settings:

Setting | Default | Description
:--- | :--- | :---
`cluster.remote_store.state.read_timeout` | 20s | The amount of time to wait for remote state download to complete on the follower node.
`cluster.remote_store.routing_table.path_type` | HASHED_PREFIX | The path type used to create the index routing path in the blob store. Valid values are `FIXED`, `HASHED_PREFIX`, and `HASHED_INFIX`.
`cluster.remote_store.routing_table.path_hash_algo` | FNV_1A_BASE64 | The algorithm used to construct the prefix or infix of the blob store path. This setting takes effect only if `cluster.remote_store.routing_table.path_type` is `HASHED_PREFIX` or `HASHED_INFIX`. Valid values are `FNV_1A_BASE64` and `FNV_1A_COMPOSITE_1`.
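
To make the table concrete, the following sketch sets all three publication settings together. It assumes the settings can be placed in `opensearch.yml`. The chosen values are illustrative and use only the valid values listed above.

```yml
# Illustrative values only; defaults are 20s, HASHED_PREFIX, and FNV_1A_BASE64
cluster.remote_store.state.read_timeout: 30s
cluster.remote_store.routing_table.path_type: HASHED_PREFIX
# The hash algorithm only matters because path_type is a hashed variant
cluster.remote_store.routing_table.path_hash_algo: FNV_1A_COMPOSITE_1
```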