Describing profile timing #2294

psaiz · 2022-03-01T13:24:15Z

Is your feature request related to a problem? Please describe.
I'm having issues understanding the timings of the profile information. In particular, I'm not sure if the different times should be added, or if the times are in parallel. Moreover, adding all the timing, I can't get to the took field. I have this example, with two clusters with the same data, and different node types behind. I'm doing the same query in both clusters. Looking at the profile info, I get:

$ jq '{took, queries: ([.profile.shards[].searches[].query[].time_in_nanos]|add/1000000), aggregations:([.profile.shards[].aggregations[].time_in_nanos]|add/1000000), collector: ([.profile.shards[].searches[].collector[].time_in_nanos]|add/1000000)}' curl_cluster_A
{
  "took": 359764,
  "queries": 1311458.528074,
  "aggregations": 2135875.84427,
  "collector": 2555350.670384
}

$ jq '{took, queries: ([.profile.shards[].searches[].query[].time_in_nanos]|add/1000000), aggregations:([.profile.shards[].aggregations[].time_in_nanos]|add/1000000), collector: ([.profile.shards[].searches[].collector[].time_in_nanos]|add/1000000)}' curl_cluster_B
{
  "took": 56889,
  "queries": 1294280.546153,
  "aggregations": 1355762.923427,
  "collector": 4280663.656489

So, looking at the field took, we see that cluster B is almost eight time faster. But looking at the time that it took for the queries, aggregations and collectors, they seem to be similar. Am I missing some timings? (this might be related to #1764 or #1263 )

Describe the solution you'd like
It would be nice if the profile information came by host, and then by shard, and, within each host the wall time that it took (instead of only the cpu time, as it seems to be at the moment)

Describe alternatives you've considered

Additional context

The text was updated successfully, but these errors were encountered:

dblock · 2022-03-01T16:36:24Z

@sruti1312 you might know something about this

anasalkouz · 2022-03-01T21:50:02Z

@Poojita-Raj Could you please take a look?

Poojita-Raj · 2022-03-03T18:16:23Z

Hi @psaiz!

If you just set the top level "profile" parameter to true in a search request, the response you get will be much more detailed and is already divided on a shard by shard basis. Each shard id will be in the form [nodeID][indexName][shardID] which already gives you the host and shard information you desire.

For each shard, you receive information about query execution, cumulative rewrite time, lucene collector execution time and information about aggregation execution.

The total "took" time you see at the beginning does not take into account time spent in the search fetch phase, time that the requests spend in queues or time spent merging shard responses on the coordinating node (which is what you referenced in issues #1764 and #1263).

Timings are listed in wall-clock nanoseconds. Additionally, collector times are independent from the Query times and the api itself adds an overhead to the execution so you can't perform straightforward addition to get to the total. This is what leads to the discrepancy where when you add all the timing you're not able to get the took field.

I hope this clears any confusion you had regarding the timing output!

dblock · 2022-03-03T19:21:33Z

thanks @Poojita-Raj. I think at the very least we need to document this better? @psaiz what do you think should be the action items from this issue?

psaiz · 2022-03-07T16:11:42Z

Hi @Poojita-Raj and @dblock

First of all, thanks for your comments and explanations.
I'm not sure I follow the comment about the top level profile. I'm doing the query like:

{"size":0, "profile":true}

I hope this is the top level mentioned above. Doing this query, I get something like:

{
  "took" : 6,  ...
   "profile" : {
    "shards" : [
      {
        "id" : "[TMy2BD_dS5SWS8hru90Ccw][opensearch_dashboards_sample_data_logs][0]",
        "searches" : [
          {
            "query" : [
              {
                "time_in_nanos" : 3875,...
   ...           }
            ],
            "rewrite_time" : 3931,
            "collector" : [
              {
                "time_in_nanos" : 102474, ...
              }
            ]
          }
        ],
        "aggregations" : [ ]
      }
    ]
  }
}

So, for a query like this, on a single shard, that takes around 6 ms, the profiler give information about 0.1 ms. It's a pity that there is no hint about where the most part of the time is spent. Probably a part of that time is spent during the authentication. If that's the case, it would be great if that time could be visible somehow.

You are right that the node id it's already available in the shard id. At the same time, are the shards in the same host searched in parallel or sequentially? Imagine that the previous query now runs over multiple hosts, with multiple shards each.
It would be very useful to have the output already broken down by the node, and then with the time_in_nanos already available at the host level

{ 'profile': 
  {'node': {  
       'TMy2BD_dS5SWS8hru90Ccw' :{ 
              'time_in_nanos': XXX, 
              'shards' : {
                     '[opensearch_dashboards_sample_data_logs][0]' : {
                         'time_in_nanos': YYY, ....

}}}}}

Coming back to the actions, I would say:

Documentation indeed would help a lot!
It would be nice if the profile could have multiple settings. I see the value in the shard info that it provides at the moment. Imagine for instance adding a tag like profile_level with possible values like shard (default), node and node_shard. If it is set to node, the response would return the time spent by each node, including the coordinating node. 'shard' level would return the output as it is right now, and node_shard would return a combined summary.

Does that make sense?

For the record, I'm using:

  "version" : {
    "distribution" : "opensearch",
    "number" : "1.2.0",
    "build_type" : "tar",
    "build_hash" : "c459282fd67ddb17dcc545ec9bcdc805880bcbec",
    "build_date" : "2021-11-22T16:57:18.360386Z",
 ``

Poojita-Raj · 2023-03-28T22:20:41Z

Opened an issue for adding documentation on profile api here - opensearch-project/documentation-website#3592.

psaiz added enhancement Enhancement or improvement to existing feature or request untriaged labels Mar 1, 2022

CEHENKLE assigned anasalkouz Mar 1, 2022

CEHENKLE added v1.3.0 and removed v1.3.0 labels Mar 1, 2022

anasalkouz added the Indexing & Search label Mar 1, 2022

Poojita-Raj assigned Poojita-Raj and unassigned anasalkouz Mar 3, 2022

anasalkouz removed the untriaged label Mar 8, 2022

Poojita-Raj closed this as completed Mar 28, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Describing profile timing #2294

Describing profile timing #2294

psaiz commented Mar 1, 2022

dblock commented Mar 1, 2022

anasalkouz commented Mar 1, 2022

Poojita-Raj commented Mar 3, 2022

dblock commented Mar 3, 2022

psaiz commented Mar 7, 2022 •

edited

Loading

Poojita-Raj commented Mar 28, 2023

Describing profile timing #2294

Describing profile timing #2294

Comments

psaiz commented Mar 1, 2022

dblock commented Mar 1, 2022

anasalkouz commented Mar 1, 2022

Poojita-Raj commented Mar 3, 2022

dblock commented Mar 3, 2022

psaiz commented Mar 7, 2022 • edited Loading

Poojita-Raj commented Mar 28, 2023

psaiz commented Mar 7, 2022 •

edited

Loading