Describing profile timing #2294
Comments
@sruti1312 you might know something about this.
@Poojita-Raj Could you please take a look?
Hi @psaiz! If you set the top-level "profile" parameter to true in a search request, the response is much more detailed and is already divided on a shard-by-shard basis. Each shard ID has the form [nodeID][indexName][shardID], which already gives you the host and shard information you want. For each shard, you receive information about query execution, cumulative rewrite time, Lucene collector execution time, and aggregation execution.
The profile timings do not account for time spent in the search fetch phase, time that requests spend in queues, or time spent merging shard responses on the coordinating node (which is what you referenced in issues #1764 and #1263), so they will not add up to the total "took" time you see at the beginning of the response. Timings are listed in wall-clock nanoseconds. Additionally, collector times are independent of the query times, and the API itself adds overhead to the execution, so you can't use straightforward addition to arrive at the total. This is what leads to the discrepancy you describe, where adding up all the timings does not give you the "took" value. I hope this clears up any confusion about the timing output!
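As an illustration of the comment above, here is a minimal Python sketch that builds a request body with the top-level "profile" flag, splits a shard ID of the form [nodeID][indexName][shardID], and converts nanosecond timings to milliseconds. The sample response is made up for illustration (it is not output from a real cluster), the helper names are hypothetical, and the regular expression assumes that node IDs and index names contain no square brackets.

```python
import re

# Request body with the top-level "profile" flag set, as described above.
# The match_all query is only a placeholder.
search_body = {"profile": True, "query": {"match_all": {}}}

# Assumption: node IDs and index names do not themselves contain brackets.
SHARD_ID = re.compile(
    r"^\[(?P<node>[^\]]+)\]\[(?P<index>[^\]]+)\]\[(?P<shard>\d+)\]$"
)


def parse_shard_id(shard_id):
    """Split '[nodeID][indexName][shardID]' into (node, index, shard number)."""
    match = SHARD_ID.match(shard_id)
    if match is None:
        raise ValueError(f"unexpected shard id format: {shard_id!r}")
    return match["node"], match["index"], int(match["shard"])


def nanos_to_ms(nanos):
    """Profile timings are reported in nanoseconds; 'took' is in milliseconds."""
    return nanos / 1_000_000


# Made-up response fragment, for illustration only.
sample_response = {
    "took": 6,
    "profile": {
        "shards": [
            {
                "id": "[node-a][my-index][0]",
                "searches": [
                    {
                        "query": [
                            {"type": "MatchAllDocsQuery", "time_in_nanos": 100000}
                        ],
                        "rewrite_time": 2000,
                        "collector": [
                            {"name": "SimpleTopScoreDocCollector", "time_in_nanos": 40000}
                        ],
                    }
                ],
            }
        ]
    },
}

for shard in sample_response["profile"]["shards"]:
    node, index, shard_no = parse_shard_id(shard["id"])
    for search in shard["searches"]:
        query_ms = sum(nanos_to_ms(q["time_in_nanos"]) for q in search["query"])
        print(f"node={node} index={index} shard={shard_no} query={query_ms:.3f} ms")

print(f"took={sample_response['took']} ms (also covers queueing, fetch and merge time)")
```

Converting both figures to the same unit makes the gap between the profiled time and "took" easy to see, but the difference should be read as unprofiled time in general, not attributed to any single phase.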
Thanks @Poojita-Raj. I think at the very least we need to document this better. @psaiz, what do you think the action items from this issue should be?
Hi @Poojita-Raj and @dblock. First of all, thanks for your comments and explanations.
I hope this is the top level mentioned above. Running this query, I get something like:
So, for a query like this, on a single shard, that takes around 6 ms, the profiler gives information about only 0.1 ms. It's a pity that there is no hint about where most of the time is spent. Part of that time is probably spent on authentication. If that's the case, it would be great if that time could be made visible somehow. You are right that the node ID is already available in the shard ID. At the same time, are the shards on the same host searched in parallel or sequentially? Imagine that the previous query now runs over multiple hosts, with multiple shards each.
Coming back to the actions, I would say:
Does that make sense? For the record, I'm using:
Opened an issue for adding documentation on the Profile API here: opensearch-project/documentation-website#3592.
Is your feature request related to a problem? Please describe.
I'm having trouble understanding the timings in the profile information. In particular, I'm not sure whether the different times should be added together or whether they overlap in parallel. Moreover, adding up all the timings, I can't get to the "took" field. I have this example, with two clusters holding the same data but with different node types behind them. I'm running the same query on both clusters. Looking at the profile info, I get:
So, looking at the "took" field, we see that cluster B is almost eight times faster. But the times for the queries, aggregations and collectors seem to be similar. Am I missing some timings? (This might be related to #1764 or #1263.)
Describe the solution you'd like
It would be nice if the profile information came by host, and then by shard, and, within each host, included the wall time that it took (instead of only the CPU time, as seems to be the case at the moment).
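Until something like this exists, a rough client-side approximation of the "by host, then by shard" view can be built from the shard IDs in the current profile output. The sketch below is only that: the function name and the sample data are hypothetical, it reads the top-level "time_in_nanos" entries of each shard's query and collector sections, and it keeps the two figures separate because they are not meant to be added together. It also says nothing about whether shards on one node ran in parallel.

```python
from collections import defaultdict


def group_profile_by_node(profile):
    """Group per-shard profiled times by the node ID embedded in each shard ID.

    Assumes shard IDs look like '[nodeID][indexName][shardID]'.
    """
    by_node = defaultdict(dict)
    for shard in profile["shards"]:
        # The node ID is the text inside the first pair of brackets.
        node_id = shard["id"][1:].split("]", 1)[0]
        query_nanos = 0
        collector_nanos = 0
        for search in shard.get("searches", []):
            query_nanos += sum(q["time_in_nanos"] for q in search.get("query", []))
            collector_nanos += sum(
                c["time_in_nanos"] for c in search.get("collector", [])
            )
        by_node[node_id][shard["id"]] = {
            "query_nanos": query_nanos,
            "collector_nanos": collector_nanos,
        }
    return dict(by_node)


# Made-up profile section with two nodes, for illustration only.
sample_profile = {
    "shards": [
        {
            "id": "[node-a][my-index][0]",
            "searches": [
                {
                    "query": [{"time_in_nanos": 100000}],
                    "collector": [{"time_in_nanos": 40000}],
                }
            ],
        },
        {
            "id": "[node-a][my-index][1]",
            "searches": [
                {
                    "query": [{"time_in_nanos": 120000}],
                    "collector": [{"time_in_nanos": 50000}],
                }
            ],
        },
        {
            "id": "[node-b][my-index][2]",
            "searches": [
                {
                    "query": [{"time_in_nanos": 90000}],
                    "collector": [{"time_in_nanos": 30000}],
                }
            ],
        },
    ]
}

grouped = group_profile_by_node(sample_profile)
for node_id, shards in grouped.items():
    print(node_id)
    for shard_id, times in shards.items():
        print(f"  {shard_id}: {times}")
```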
Describe alternatives you've considered
Additional context