Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ShardBatchCache to support caching for TransportNodesListGatewayStartedShardsBatch #12504

Merged
merged 37 commits into from
Apr 11, 2024

Conversation

amkhar
Copy link
Contributor

@amkhar amkhar commented Mar 1, 2024

Description

Add a new ShardBatchCache which will handle storing the responses of transport actions like TransportNodesListGatewayStartedShardsBatch or the transport action for replica shards being pushed via #8957.

A new cache class is being written as storing strategy is changed from ShardCache (written #12441).
ShardCache - stores the responses of transport actions as it is in NodeEntry class

https://github.com/opensearch-project/OpenSearch/pull/12441/files#diff-e2790d7ec0cf48617430d2352e7142c297eaea172fd6e2d34969e60ddf7f9a68R73

https://github.com/amkhar/OpenSearch/blob/3125b948029609f354d3153f8ca6391638daefc7/server/src/main/java/org/opensearch/gateway/AsyncShardFetch.java#L468-L475

    static class NodeEntry<T extends BaseNodeResponse> extends BaseShardCache.BaseNodeEntry {
        
        @Nullable
        private T value;

But the response of Transport actions like TransportNodesListGatewayStartedShardsBatch are a map

Map<ShardId, NodeGatewayStartedShard>

Storing this response as it is will create repetition for ShardId object, as the responses needs to be stored for every node.
Map<NodeId, Map<ShardId, NodeGatewayStartedShard>>

To solve the problem, storing the NodeGatewayStartedShard in an array and keeping ShardId to arrayIndex in a map

private final NodeGatewayStartedShard[] shardData;
private final Map<ShardId, Integer> shardIdKey = new HashMap<>();

Using these two data structures, repetition is avoided and array index is used for putting and getting the data from cache.
Failure handling is also different from normal ShardCache, as this is a batch of ShardIds. Based on unknown failures for a ShardId, we need to retry that shard only in next run of reroute. And allocate all other shards of the batch.

Related Issues

Resolves ##12248
This PR is dependent on
#12441
#8356

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link
Contributor

github-actions bot commented Mar 1, 2024

Compatibility status:

Checks if related components are compatible with change 08763ed

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/flow-framework.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/performance-analyzer.git]

Copy link
Contributor

github-actions bot commented Mar 1, 2024

❌ Gradle check result for bcbc00a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Mar 5, 2024

❌ Gradle check result for 644d908: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Apr 2, 2024

❌ Gradle check result for ba6cbb4: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Apr 4, 2024

✅ Gradle check result for c0ed643: SUCCESS

Copy link
Contributor

github-actions bot commented Apr 8, 2024

❌ Gradle check result for d61477f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@amkhar
Copy link
Contributor Author

amkhar commented Apr 8, 2024

❌ Gradle check result for d61477f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Flaky test : #12197

Copy link
Contributor

github-actions bot commented Apr 9, 2024

❕ Gradle check result for 65229fc: UNSTABLE

  • TEST FAILURES:
      2 org.opensearch.cluster.coordination.AwarenessAttributeDecommissionIT.testConcurrentDecommissionAction

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Copy link
Collaborator

@Bukhtawar Bukhtawar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @amkhar too many explicit type conversions and reflection isn't usually the best way to get around structuring classes

Copy link
Contributor

❌ Gradle check result for 08763ed: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❕ Gradle check result for 08763ed: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.http.SearchRestCancellationIT.testAutomaticCancellationDuringFetchPhase

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@amkhar
Copy link
Contributor Author

amkhar commented Apr 11, 2024

❕ Gradle check result for 08763ed: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.http.SearchRestCancellationIT.testAutomaticCancellationDuringFetchPhase

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Flaky test : #5426

@shwetathareja shwetathareja merged commit 7103e56 into opensearch-project:main Apr 11, 2024
37 of 46 checks passed
@shwetathareja shwetathareja added the backport 2.x Backport to 2.x branch label Apr 11, 2024
opensearch-trigger-bot bot pushed a commit that referenced this pull request Apr 11, 2024
…tartedShardsBatch (#12504)

Signed-off-by: Aman Khare <[email protected]>
(cherry picked from commit 7103e56)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
shwetathareja pushed a commit that referenced this pull request Apr 12, 2024
…tartedShardsBatch (#12504) (#13156)

(cherry picked from commit 7103e56)

Signed-off-by: Aman Khare <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

4 participants