From 7122b97c599d388ba120b83ac8f8c1a3edb4798f Mon Sep 17 00:00:00 2001 From: Roy214 Date: Mon, 21 Aug 2023 16:55:22 -0500 Subject: [PATCH] TROUBLESHOOTING: Performance tuning in sssd This page will describe the performance related issues in sssd and how to troubleshoot those issues. --- src/contents.rst | 1 + src/troubleshooting/performance.rst | 50 +++++++++++++++++++++++++++++ 2 files changed, 51 insertions(+) create mode 100644 src/troubleshooting/performance.rst diff --git a/src/contents.rst b/src/contents.rst index be77cdf..b89f8ad 100644 --- a/src/contents.rst +++ b/src/contents.rst @@ -93,6 +93,7 @@ Table of Contents Basics Backend SSSD Errors + Performance Tuning in SSSD Log Analyzer Fleet Commander SUDO diff --git a/src/troubleshooting/performance.rst b/src/troubleshooting/performance.rst new file mode 100644 index 0000000..669bb05 --- /dev/null +++ b/src/troubleshooting/performance.rst @@ -0,0 +1,50 @@ +Performance tuning SSSD +####################### + +Slow id lookup +************** +This has been noticed id lookup become slow if the LDAP/AD user is a member of large groups say for example user is a member of 300+ groups. ``id`` is very heavy. ``id`` does a lot under its hood. +Behind the scenes, when the ``id $user`` command is executed it triggers the following: + +- Get user information - getpwnam() for the user + +- Get primary group information - getgrgid() for the primary group of the user + +- Get list of groups - getgrouplist() for the user + +- Get group information for each group returned from step 3 - getgrid() for all GIDs returned from getgrouplist() call. + +We need to identify out of the above 4 steps which step is actually slow. In order to collect detailed infromation we need to add ``debug_level = 9`` under the ``[$domain]`` section of the ``/etc/sssd/sssd.conf`` followed by a ``sssd`` restart. We often noticed step 4 is the step where sssd takes most of its time because the most data-intensive operation is downloading the groups including their members and by default this feature is enabled we can turn this off by setting ``ignore_group_members = true``. +Usually, we are interested in what groups a user is a member of (id aduser@ad_domain) as the initial step rather than what members do specific groups include (getent group adgroup@ad_domain). Setting the ignore_group_members option to True makes all groups appear as empty, thus downloading only information about the group objects themselves and not their members, providing a significant performance boost. Please note that id aduser@ad_domain would still return all the correct groups. + +- Pros: getgrnam/getgrgid calls are significantly faster. +- Cons: getgrnam/getgrgid calls only return the group information, not the members + +**WARNING! If the compat tree is used, DO NOT SET ignore_group_members = true because it breaks the compatibility logic.** + +If after disbaling the group_members call still the look is slow in that case we can get into the logs and verify how long the ``Initgroups`` call is taking this can be done by grepping the ``CR`` no. of that id lookup request. In this example here ``sssd_nss`` is taking ``1 sec`` to process the user group membership here we have only 39 groups associated with the user if the count is large say for example 300-400 and the ``ignore_group_members`` is not to set true then this is expected the id lookup will take some time with the cold cache. + +.. code-block:: sssd-log + + $ grep 'CR #3\:' /var/log/sssd/sssd_nss.log + (2023-06-08 12:21:31): [nss] [cache_req_set_plugin] (0x2000): CR #3: Setting "Initgroups by name" plugin + (2023-06-08 12:21:31): [nss] [cache_req_send] (0x0400): CR #3: New request 'Initgroups by name' + (2023-06-08 12:21:31): [nss] [cache_req_process_input] (0x0400): CR #3: Parsing input name [roy] + (2023-06-08 12:21:31): [nss] [cache_req_set_name] (0x0400): CR #3: Setting name [roy] + (2023-06-08 12:21:31): [nss] [cache_req_select_domains] (0x0400): CR #3: Performing a multi-domain search + (2023-06-08 12:21:31): [nss] [cache_req_search_domains] (0x0400): CR #3: Search will check the cache and check the data provider + (2023-06-08 12:21:31): [nss] [cache_req_set_domain] (0x0400): CR #3: Using domain [redhat.com] + (2023-06-08 12:21:31): [nss] [cache_req_prepare_domain_data] (0x0400): CR #3: Preparing input data for domain [redhat.com] rules + (2023-06-08 12:21:31): [nss] [cache_req_search_send] (0x0400): CR #3: Looking up roy@redhat.com + (2023-06-08 12:21:31): [nss] [cache_req_search_ncache] (0x0400): CR #3: Checking negative cache for [roy@redhat.com] + (2023-06-08 12:21:31): [nss] [cache_req_search_ncache] (0x0400): CR #3: [roy@redhat.com] is not present in negative cache + (2023-06-08 12:21:31): [nss] [cache_req_search_cache] (0x0400): CR #3: Looking up [roy@redhat.com] in cache + (2023-06-08 12:21:31): [nss] [cache_req_search_send] (0x0400): CR #3: Object found, but needs to be refreshed. + (2023-06-08 12:21:31): [nss] [cache_req_search_dp] (0x0400): CR #3: Looking up [roy@redhat.com] in data provider + (2023-06-08 12:21:32): [nss] [cache_req_search_cache] (0x0400): CR #3: Looking up [roy@redhat.com] in cache + (2023-06-08 12:21:32): [nss] [cache_req_search_ncache_filter] (0x0400): CR #3: This request type does not support filtering result by negative cache + (2023-06-08 12:21:32): [nss] [cache_req_search_done] (0x0400): CR #3: Returning updated object [roy@redhat.com] + (2023-06-08 12:21:32): [nss] [cache_req_create_and_add_result] (0x0400): CR #3: Found 39 entries in domain redhat.com <--------- + (2023-06-08 12:21:32): [nss] [cache_req_done] (0x0400): CR #3: Finished: Success + +The above troubleshooting can be done easily with ``sssctl analyze request list`` and ``sssctl analyze request show ``. For more details, please refer to :doc:`Log Analyzer `