Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[POC] Add ThreadContext user info in top queries #12529

Closed

Conversation

ansjcy
Copy link
Member

@ansjcy ansjcy commented Mar 4, 2024

Description

Add remote ip address for top n queries.

Related Issues

This is a follow up of #11904 to finish #11186

tests

  • Start an OpenSearch Cluster.
  • Install query-insights plugin and enable top n queries
curl -X PUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d'
{
    "persistent" : {
        "search.insights.top_queries.latency.enabled" : "true",
        "search.insights.top_queries.latency.window_size" : "1m",
        "search.insights.top_queries.latency.top_n_size" : 5
    }
}'
  • Index a document for searching
POST /my-index-0/_doc/?pretty
{
  "@timestamp": "2099-11-15T13:12:00",
  "message": "this is document 1",
  "user": {
    "id": "cyji"
  }
}
  • Do search on different browsers
GET _search
{
  "query": {
    "match_all": {}
  }
}
  • Check top queries
GET _insights/top_queries

{
  "top_queries": [
    {
      "timestamp": 1709161455119,
      "search_type": "query_then_fetch",
      "node_id": "RTr6rGKbSW-9_Lmli2HVYA",
      "total_shards": 10,
      "indices": [],
      "phase_latency_map": {
        "expand": 0,
        "query": 7,
        "fetch": 5
      },
      "source": "{\"query\":{\"match_all\":{\"boost\":1.0}}}",
      "remote_address": "172.18.0.2:37416",
      "latency": 14
    },
    {
      "timestamp": 1709161458352,
      "search_type": "query_then_fetch",
      "node_id": "yNwke7ByTZOsdSsIaGZdbw",
      "total_shards": 10,
      "indices": [],
      "phase_latency_map": {
        "expand": 0,
        "query": 7,
        "fetch": 3
      },
      "source": "{\"query\":{\"match_all\":{\"boost\":1.0}}}",
      "remote_address": "172.18.0.2:37412",
      "latency": 12
    },
    {
      "timestamp": 1709161464149,
      "search_type": "query_then_fetch",
      "node_id": "RTr6rGKbSW-9_Lmli2HVYA",
      "total_shards": 10,
      "indices": [],
      "phase_latency_map": {
        "expand": 0,
        "query": 6,
        "fetch": 2
      },
      "source": "{\"query\":{\"match_all\":{\"boost\":1.0}}}",
      "remote_address": "172.18.0.2:37416",
      "latency": 10
    }
  ]
}

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link
Contributor

github-actions bot commented Mar 4, 2024

❌ Gradle check result for bfdefdd: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Mar 4, 2024

❌ Gradle check result for 5bfcc77: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Mar 4, 2024

Compatibility status:

Checks if related components are compatible with change 41ea201

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/flow-framework.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/sql.git]

@ansjcy ansjcy force-pushed the add-user-info-in-top-queries branch from 5bfcc77 to 27ee977 Compare March 5, 2024 22:43
Copy link
Contributor

github-actions bot commented Mar 5, 2024

❕ Gradle check result for 27ee977: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testRequestStats

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Copy link

codecov bot commented Mar 5, 2024

Codecov Report

Attention: Patch coverage is 80.00000% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 71.43%. Comparing base (b15cb0c) to head (fe9c669).
Report is 31 commits behind head on main.

❗ Current head fe9c669 differs from pull request most recent head 41ea201. Consider uploading reports for the commit 41ea201 to get more accurate results

Files Patch % Lines
...opensearch/action/search/SearchRequestContext.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #12529      +/-   ##
============================================
+ Coverage     71.42%   71.43%   +0.01%     
+ Complexity    59978    59960      -18     
============================================
  Files          4985     4984       -1     
  Lines        282275   282251      -24     
  Branches      40946    40952       +6     
============================================
+ Hits         201603   201626      +23     
+ Misses        63999    63914      -85     
- Partials      16673    16711      +38     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ansjcy
Copy link
Member Author

ansjcy commented Mar 6, 2024

We shouldn't add dependencies on common-utils before we move queries insights to its own repo. Otherwise it will introduce circular dependency between OpenSearch core repo and common-utils repo.

Copy link
Contributor

github-actions bot commented Mar 6, 2024

❌ Gradle check result for 2688389: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@ansjcy ansjcy marked this pull request as draft March 6, 2024 21:57
@ansjcy
Copy link
Member Author

ansjcy commented Mar 6, 2024

Discussed with .peternied, reading from thread context is actually not recommended. We should explore extending IdentityPlugin https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/plugins/IdentityPlugin.java#L19 for this purpose.

related:

Copy link
Member

@peternied peternied left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ansjcy Thanks for creating this PR - it sounds like a really useful feature for sysadmins.

This is a great use case for Identity - where information about who is operating a request should come from. You can use the other plugin identity-shiro [1] to prove out your scenarios.

Furthermore there is a problem of 'userless' queries. There are many system tasks and plugin operations via job scheduler that will not have user identity tied to them. While not directly related if pulled from identity systems, the lack of insight here might warrant follow up work for your use case.

If this PR is slimed down the the source - ip address that seems like a iterative step while building up support for exposing user identity.

Comment on lines 51 to 66
/**
* Username of the user who sent this request
*/
USER_NAME,
/**
* Backend roles of the user who sent this request
*/
USER_BACKEND_ROLES,
/**
* Roles of the user who sent this request
*/
USER_ROLES,
/**
* Tenant info of the user who sent this request
*/
USER_TENANT;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of these are concepts from the security plugin, these should not be inside of core directly - instead lets pull them from IdentityService.getSubject() [1]

@ansjcy ansjcy force-pushed the add-user-info-in-top-queries branch 2 times, most recently from 07dc669 to fe9c669 Compare March 7, 2024 22:13
Copy link
Contributor

github-actions bot commented Mar 7, 2024

❕ Gradle check result for 29e62f8: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.search.SearchWeightedRoutingIT.testShardRoutingWithNetworkDisruption_FailOpenEnabled

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Copy link
Contributor

github-actions bot commented Mar 7, 2024

❕ Gradle check result for 07dc669: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.search.SearchWeightedRoutingIT.testMultiGetWithNetworkDisruption_FailOpenEnabled

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Copy link
Contributor

github-actions bot commented Mar 7, 2024

✅ Gradle check result for fe9c669: SUCCESS

Copy link
Member

@peternied peternied left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an integration test for this scenario? If not lets add one, or edit the existing test case to verify remote addresses are sensible.

@@ -51,6 +51,10 @@ public Map<String, Long> phaseTookMap() {
return phaseTookMap;
}

public String getRequestRemoteAddress() {
return searchRequest.remoteAddress().toString();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Perf/Memory] Lets not toString() the address until/unless it is used to prevent the string allocation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, if you would prefer to resolve this thread by creating new issue to investigate this area. Loggers like Log4j have great examples on how to minimize these allocations but might be better to do a single pass than a one-off fix for this attribute.

@@ -138,6 +138,7 @@ public void onRequestEnd(final SearchPhaseContext context, final SearchRequestCo
attributes.put(Attribute.TOTAL_SHARDS, context.getNumShards());
attributes.put(Attribute.INDICES, request.indices());
attributes.put(Attribute.PHASE_LATENCY_MAP, searchRequestContext.phaseTookMap());
attributes.put(Attribute.REMOTE_ADDRESS, searchRequestContext.getRequestRemoteAddress());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Code smell] Property bag create situations where it is not clear if all values are well-defined. If all these attributes are expected as part of a search record aren't they codified? e.g. new SearchQueryAttributes.Builder().remoteAddress(searchRequestContext.getRemoteAddress()...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review! When we design the SearchQueryRecord class, we wanted to make it generic and compatible with the OpenTelemetry format to represent a timeseries datapoint. We decided to only keep most common fields like timestamp as field of the record itself and other attributes should go into the attribute map. It might be a good idea to do a second pass of the current attributes, to see if it make sense to move more common fields into record fields.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detail. This might be an area to look into if it becomes cumbersome... such as the changes that were added for the unit test 😉

@ansjcy ansjcy marked this pull request as ready for review March 8, 2024 19:56
Copy link
Member

@peternied peternied left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am about to go on vacation so I won't be able to follow up for a while.

Fellow maintainers please feel free to dismiss my review when integration tests are added/updated to unblock merging of this PR. Thanks!

@@ -138,6 +138,7 @@ public void onRequestEnd(final SearchPhaseContext context, final SearchRequestCo
attributes.put(Attribute.TOTAL_SHARDS, context.getNumShards());
attributes.put(Attribute.INDICES, request.indices());
attributes.put(Attribute.PHASE_LATENCY_MAP, searchRequestContext.phaseTookMap());
attributes.put(Attribute.REMOTE_ADDRESS, searchRequestContext.getRequestRemoteAddress());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detail. This might be an area to look into if it becomes cumbersome... such as the changes that were added for the unit test 😉

@ansjcy ansjcy force-pushed the add-user-info-in-top-queries branch from fe9c669 to ec91f15 Compare March 14, 2024 21:10
Copy link
Contributor

❌ Gradle check result for 41ea201: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❕ Gradle check result for ec91f15: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Comment on lines +227 to +229

// set remote address for searchRequest
searchRequest.remoteAddress(new TransportAddress(request.getHttpChannel().getRemoteAddress()));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's safe to hijack remoteAddress() like this.

It's intended to be set by InboundHandler on transport requests. Setting it on a REST request may have unintended consequences.

Also, if the request comes through a load balancer or a proxy, this value will almost certainly be the load balancer or proxy's address.

@ansjcy
Copy link
Member Author

ansjcy commented Mar 18, 2024

Closing the PR as per peternied and froh's comments. We need to again carefully review what should be the best way to move forward to add user related information to top queries - Let's check the Identity service to see what are the gaps to get user info from there.

@ansjcy ansjcy closed this Mar 18, 2024
@ansjcy ansjcy changed the title Add user info in top queries [POC] Add ThreadContext user info in top queries Jun 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants