Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Describing profile timing #2294

Closed
psaiz opened this issue Mar 1, 2022 · 6 comments
Closed

Describing profile timing #2294

psaiz opened this issue Mar 1, 2022 · 6 comments
Assignees
Labels
enhancement Enhancement or improvement to existing feature or request Indexing & Search

Comments

@psaiz
Copy link

psaiz commented Mar 1, 2022

Is your feature request related to a problem? Please describe.
I'm having issues understanding the timings of the profile information. In particular, I'm not sure if the different times should be added, or if the times are in parallel. Moreover, adding all the timing, I can't get to the took field. I have this example, with two clusters with the same data, and different node types behind. I'm doing the same query in both clusters. Looking at the profile info, I get:

$ jq '{took, queries: ([.profile.shards[].searches[].query[].time_in_nanos]|add/1000000), aggregations:([.profile.shards[].aggregations[].time_in_nanos]|add/1000000), collector: ([.profile.shards[].searches[].collector[].time_in_nanos]|add/1000000)}' curl_cluster_A
{
  "took": 359764,
  "queries": 1311458.528074,
  "aggregations": 2135875.84427,
  "collector": 2555350.670384
}

$ jq '{took, queries: ([.profile.shards[].searches[].query[].time_in_nanos]|add/1000000), aggregations:([.profile.shards[].aggregations[].time_in_nanos]|add/1000000), collector: ([.profile.shards[].searches[].collector[].time_in_nanos]|add/1000000)}' curl_cluster_B
{
  "took": 56889,
  "queries": 1294280.546153,
  "aggregations": 1355762.923427,
  "collector": 4280663.656489

So, looking at the field took, we see that cluster B is almost eight time faster. But looking at the time that it took for the queries, aggregations and collectors, they seem to be similar. Am I missing some timings? (this might be related to #1764 or #1263 )

Describe the solution you'd like
It would be nice if the profile information came by host, and then by shard, and, within each host the wall time that it took (instead of only the cpu time, as it seems to be at the moment)

Describe alternatives you've considered

Additional context

@psaiz psaiz added enhancement Enhancement or improvement to existing feature or request untriaged labels Mar 1, 2022
@dblock
Copy link
Member

dblock commented Mar 1, 2022

@sruti1312 you might know something about this

@anasalkouz
Copy link
Member

@Poojita-Raj Could you please take a look?

@Poojita-Raj
Copy link
Contributor

Hi @psaiz!

If you just set the top level "profile" parameter to true in a search request, the response you get will be much more detailed and is already divided on a shard by shard basis. Each shard id will be in the form [nodeID][indexName][shardID] which already gives you the host and shard information you desire.

For each shard, you receive information about query execution, cumulative rewrite time, lucene collector execution time and information about aggregation execution.

The total "took" time you see at the beginning does not take into account time spent in the search fetch phase, time that the requests spend in queues or time spent merging shard responses on the coordinating node (which is what you referenced in issues #1764 and #1263).

Timings are listed in wall-clock nanoseconds. Additionally, collector times are independent from the Query times and the api itself adds an overhead to the execution so you can't perform straightforward addition to get to the total. This is what leads to the discrepancy where when you add all the timing you're not able to get the took field.

I hope this clears any confusion you had regarding the timing output!

@Poojita-Raj Poojita-Raj assigned Poojita-Raj and unassigned anasalkouz Mar 3, 2022
@dblock
Copy link
Member

dblock commented Mar 3, 2022

thanks @Poojita-Raj. I think at the very least we need to document this better? @psaiz what do you think should be the action items from this issue?

@psaiz
Copy link
Author

psaiz commented Mar 7, 2022

Hi @Poojita-Raj and @dblock

First of all, thanks for your comments and explanations.
I'm not sure I follow the comment about the top level profile. I'm doing the query like:

{"size":0, "profile":true}

I hope this is the top level mentioned above. Doing this query, I get something like:

{
  "took" : 6,  ...
   "profile" : {
    "shards" : [
      {
        "id" : "[TMy2BD_dS5SWS8hru90Ccw][opensearch_dashboards_sample_data_logs][0]",
        "searches" : [
          {
            "query" : [
              {
                "time_in_nanos" : 3875,...
   ...           }
            ],
            "rewrite_time" : 3931,
            "collector" : [
              {
                "time_in_nanos" : 102474, ...
              }
            ]
          }
        ],
        "aggregations" : [ ]
      }
    ]
  }
}

So, for a query like this, on a single shard, that takes around 6 ms, the profiler give information about 0.1 ms. It's a pity that there is no hint about where the most part of the time is spent. Probably a part of that time is spent during the authentication. If that's the case, it would be great if that time could be visible somehow.

You are right that the node id it's already available in the shard id. At the same time, are the shards in the same host searched in parallel or sequentially? Imagine that the previous query now runs over multiple hosts, with multiple shards each.
It would be very useful to have the output already broken down by the node, and then with the time_in_nanos already available at the host level

{ 'profile': 
  {'node': {  
       'TMy2BD_dS5SWS8hru90Ccw' :{ 
              'time_in_nanos': XXX, 
              'shards' : {
                     '[opensearch_dashboards_sample_data_logs][0]' : {
                         'time_in_nanos': YYY, ....

}}}}}

Coming back to the actions, I would say:

  • Documentation indeed would help a lot!
  • It would be nice if the profile could have multiple settings. I see the value in the shard info that it provides at the moment. Imagine for instance adding a tag like profile_level with possible values like shard (default), node and node_shard. If it is set to node, the response would return the time spent by each node, including the coordinating node. 'shard' level would return the output as it is right now, and node_shard would return a combined summary.

Does that make sense?

For the record, I'm using:

  "version" : {
    "distribution" : "opensearch",
    "number" : "1.2.0",
    "build_type" : "tar",
    "build_hash" : "c459282fd67ddb17dcc545ec9bcdc805880bcbec",
    "build_date" : "2021-11-22T16:57:18.360386Z",
 ``

@Poojita-Raj
Copy link
Contributor

Opened an issue for adding documentation on profile api here - opensearch-project/documentation-website#3592.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement or improvement to existing feature or request Indexing & Search
Projects
None yet
Development

No branches or pull requests

5 participants