[DPE-4115] Performance Profile Support #466

Merged · 79 commits · Oct 21, 2024

Conversation

phvalguima
Contributor

@phvalguima phvalguima commented Oct 1, 2024

This PR extends the current charm to support performance profiles, following spec DA031, and adds support for the following profiles:

  • testing:
    • focused on the integration tests and our CI -> 1G of RAM dedicated to the heap and no automation
  • staging:
    • HA capabilities must be available: for that, we will enforce an index template that encompasses all indices and sets replica: 1-all
    • Extends the heap to: max(1G, 10% of RAM)
    • indices.memory.index_buffer_size is extended to 25%
    • Adds three component templates, described later
  • production:
    • Same features as staging, but the heap is instead set to: max(1G, 50% of RAM)

The options above are set based on the following documents:
https://opensearch.org/docs/latest/tuning-your-cluster/performance/
https://opensearch.org/docs/latest/search-plugins/knn/performance-tuning/

The user can switch between the three profiles above; depending on the selected value, the templates are created or destroyed.
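For illustration, the heap rules of the three profiles above can be sketched as follows (hypothetical helper, not the charm's actual code; values in kB):

```python
def heap_size_kb(profile: str, total_ram_kb: int) -> int:
    """Sketch of the per-profile heap rules described above, in kB.

    testing:    fixed 1G
    staging:    max(1G, 10% of RAM)
    production: max(1G, 50% of RAM)
    """
    one_gb_kb = 1024 * 1024
    if profile == "testing":
        return one_gb_kb
    if profile == "staging":
        return max(one_gb_kb, int(0.10 * total_ram_kb))
    if profile == "production":
        return max(one_gb_kb, int(0.50 * total_ram_kb))
    raise ValueError(f"unknown profile: {profile}")
```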


UPDATE

One important question about this PR is whether index templates with '*' will apply to system indices. The first part of the answer is: index templates are only applied at index creation, as shown here. We can delete index templates after indices have been created from them.

Manual configuration (e.g. setting 0-all) will always take precedence.

There is an exception for hidden (not necessarily system) indices: "catch-all" templates are not applied to hidden indices.

@phvalguima phvalguima requested review from Mehdi-Bendriss and skourta and removed request for skourta October 1, 2024 17:23
Contributor

@Mehdi-Bendriss Mehdi-Bendriss left a comment


Thanks @phvalguima.

I have a few points:

1. Replication factor:

We should never set the replication factor to 1-all - this is unnatural for non-system indices and may cause the disks of all units to overflow quickly.

Replication factors are set by the user.

If we want to be safer, we should look for indices whose number_of_replicas < 1 and set it to 1. Not more.
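The suggested safer behavior could be sketched as follows (hypothetical helper; the input shape assumes settings as returned per index, e.g. from GET /_all/_settings):

```python
def indices_needing_replicas(settings_by_index: dict) -> list:
    """Return indices whose number_of_replicas is below 1.

    Sketch of the suggestion above: only raise the replication
    factor to 1 when it is currently 0 - never more - so that
    user-chosen replication factors are left untouched.
    """
    to_fix = []
    for name, settings in settings_by_index.items():
        replicas = int(settings.get("number_of_replicas", 1))
        if replicas < 1:
            to_fix.append(name)
    return to_fix
```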

2. Codec used:

Why did you choose zstd_no_dict instead of zstd, or instead of qat_lz4 (on supported systems), or others? Which one should we choose? How?

It may also be needed to set a compression algorithm with each - which one to choose? How?

3. Heap size:

The heap size should, in production, be set to 50% of RAM but lower than 32 GB, as per the official OpenSearch recommendations: here and here.

4. Units conversion:

The main complexity of this PR revolves around unit conversions; two things to note:

  1. /proc/meminfo always returns values in kB.
    1. Any reason not to use psutil? This should save you the parsing of /proc/meminfo.
  2. The JVM Xms and Xmx properties accept g|G, m|M, or k|K, meaning the smallest unit is k.

With this in mind, it makes sense to normalize the values to the smallest unit supported by both, which is the kB, and work with it exclusively (when reading, calculating, or writing to file).
Something along the lines of:

import re

def jvm_size_in_kb(size: str) -> int:
    """Normalize the size values set in jvm.options from supported units to kB."""
    match = re.match(r"(\d+)([gmk])$", size.lower())
    if not match:
        raise ValueError(f"unsupported JVM size value: {size!r}")

    value = int(match.group(1))
    jvm_formatted_unit = match.group(2)

    factor = 1
    if jvm_formatted_unit == "m":
        factor = 1024
    elif jvm_formatted_unit == "g":
        factor = 1024 * 1024

    return value * factor

With this, I believe we do not need the ByteUnit and JavaByteSize classes.

Similar for the percentage method, which can simply be calculated as int(0.25 * val) ==> again, the loss from rounding down is minimal because we are dealing with kilobytes.
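Taken together, the normalization and the percentage step can be combined as in this self-contained sketch (the helper mirrors the one proposed in this review, not the charm's final code):

```python
import re

def jvm_size_in_kb(size: str) -> int:
    """Normalize a jvm.options size (e.g. '1g', '512m', '4096k') to kB."""
    match = re.match(r"(\d+)([gmk])$", size.lower())
    if not match:
        raise ValueError(f"unsupported JVM size value: {size!r}")
    factor = {"k": 1, "m": 1024, "g": 1024 * 1024}[match.group(2)]
    return int(match.group(1)) * factor

# e.g. a 25% index buffer relative to a 4G heap, rounded down to whole kB
buffer_kb = int(0.25 * jvm_size_in_kb("4g"))
```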

@phvalguima
Contributor Author

phvalguima commented Oct 6, 2024

Hi @Mehdi-Bendriss, I will go over your points one by one.


  1. Replication factor:
    We should never set the replication factor to 1-all - this is unnatural for non system indices and may cause the disk of all units to overflow quickly.
    ...
    If we want to be safer, we should look for indices whose number_of_replicas < 1 and set it to 1. Not more.

Indeed, I corrected that right after our conversation earlier this week, in this commit.

Although I agree with removing -all, thinking of real-world deployments, we always go with at least 3 nodes. Other operators, such as the SQL DBs, are always deployed in the field with 1 main + 2 in-sync replicas. We remove the -all, but we could offer a similar experience and set it to 2: since this is the HA type of deployment, at least 3 nodes will be present.

Replication factors are set by the user.

True, but our specification, DA031 - Profile config option, states that for both staging and production "we will be providing a highly-available and scalable service to be used in production". Now, that is a bit open to interpretation in my view: "highly-available" in terms of OpenSearch's own services (i.e. only the service indices should be set up for HA by our charm) OR in terms of using OpenSearch as well (i.e. even indices created by the user).

That is why I am trying to put together a minimal "index template" that at least assures a user that indices will be replicated, unless the user explicitly states otherwise.
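For illustration, such a minimal catch-all template body could look like the sketch below (the priority value and exact settings are assumptions for this example, not the charm's actual template):

```python
# Hypothetical catch-all index template: guarantees at least one replica
# while leaving users free to override it, either per index or with a
# higher-priority template of their own.
CATCH_ALL_TEMPLATE = {
    "index_patterns": ["*"],
    "priority": 0,  # lowest priority, so any user-defined template wins
    "template": {
        "settings": {
            "number_of_replicas": 1,
        }
    },
}
```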


  2. Codec used:

TBH, I am not entirely set on any of the values we will discuss below. That is one of the reasons I added them as component templates, so a user can build their own index templates on top.

I'd rather benchmark these values for comparison first. Maybe getting that landed in this PR is too much. TL;DR I think we can play it safer as follows:

  1. break the "component template" setup out into a separate branch, for now
  2. run performance tests until we are okay with which values to suggest
  3. if we are okay with the results, report them and first update our documentation (a user can then follow it and create their own indices / index templates)
  4. eventually, come back to the branch from step (1) and merge it.

WDYT?

I will start with a later question:

which one to choose?

The codec selection came from these results.

Why did you choose zstd_no_dict instead of zstd

In the case of zstd_no_dict, from the results above, we would sacrifice 5% of compression for +7% (net) p90 latency and +7% (net) throughput when compared with zstd.

or instead qat_lz4 (in supported systems) or others?

Happy to set QAT once we have the logic to detect and enable it. We are not there yet.

How?

First, we should keep in mind that some types of indices cannot work with all the codecs (e.g. the vector indices).

As you also noticed, I am not fully onboard with these results, as there are other parameters to be set. In this case, the "How?" will have to be answered the same way we did the AVX testing: by running performance tests and comparing results.

It may also be needed to set a compression algorithm with each? which one to choose? how?

Yes, I share this concern. What I think we should do here is run these component templates against benchmarks and document the results.


  3. Heap size:
    The heap size should, in production, be set to 50% of RAM but lower than 32 GB, as per the official OpenSearch recommendations: here and here.

Thanks, that is a really good point. I will set a hard limit of 32G. Indeed, having the GC go over hundreds of gigs does not sound good :)
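With that cap in place, the production heap rule could be sketched as (hypothetical helper name; values in kB):

```python
def production_heap_kb(total_ram_kb: int) -> int:
    """max(1G, 50% of RAM), hard-capped at 32G, all in kB (sketch)."""
    one_gb_kb = 1024 * 1024
    return min(max(one_gb_kb, int(0.5 * total_ram_kb)), 32 * one_gb_kb)
```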


  4. Units conversion:

So, some thoughts here: (1) indeed, sticking with kB, as long as the JVM can accept the "32G" value in kB format, is a good idea and would simplify a lot; and (2) I was discussing this with the Big Data team, who could benefit from this logic as well. The idea is to eventually move it to data_platform_helpers. From what I gathered, they currently set the JVM heap to a hard-coded number.

The main complexity of this PR revolves around unit conversions; two things to note:

  1. /proc/meminfo always returns values in kB.

That is true as the kernel code shows.

  1. Any reason not to use psutil? This should save you the parsing of /proc/meminfo.

Yes, two reasons: (1) the parsing goes from L589 to L598... I prefer 10 LoC processing a file whose format is set in stone by the kernel over adding a new dependency (which is not shipped with stock Ubuntu anyway); and (2) it gives access to hugepages info as well, which we can potentially explore later, and it would be "just there".
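The parsing being defended here is indeed short; a minimal self-contained sketch (hypothetical helper, the charm's actual parser may differ):

```python
def mem_total_kb(meminfo: str) -> int:
    """Extract MemTotal from /proc/meminfo content.

    The kernel always reports these values in kB, so no unit
    handling is needed, e.g. "MemTotal:       16384000 kB".
    """
    for line in meminfo.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found in meminfo")

# usage sketch:
# with open("/proc/meminfo") as f:
#     total_kb = mem_total_kb(f.read())
```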

The JVM Xms and Xmx properties accept g|G, m|M, or k|K, meaning the smallest unit is k.
With this in mind, it makes sense to normalize the values to the smallest unit supported by both, which is the kB, and work with it exclusively (when reading, calculating, or writing to file).

Yes, I was converting down to bytes and back, but it indeed makes sense to stop at kB instead and handle everything in kB.

Contributor

@Mehdi-Bendriss Mehdi-Bendriss left a comment


Thanks Pedro!

Comment on lines +71 to +75
if not self.peers_data.get(Scope.UNIT, PERFORMANCE_PROFILE):
    return None
return OpenSearchPerfProfile.from_dict(
    {"typ": self.peers_data.get(Scope.UNIT, PERFORMANCE_PROFILE)}
)

nit for conciseness:

Suggested change:

if not (profile := self.peers_data.get(Scope.UNIT, PERFORMANCE_PROFILE)):
    return None
return OpenSearchPerfProfile.from_dict({"typ": profile})

Contributor

@reneradoi reneradoi left a comment


Looks good! Only one minor thing that could potentially be changed, depending on your preference.

return

perf_profile_needs_restart = False
Contributor


This can be removed; it will be overwritten with True or False in line 738.

@phvalguima phvalguima merged commit cd1c034 into 2/edge Oct 21, 2024
35 of 40 checks passed
@phvalguima phvalguima deleted the DPE-4115-performance-profiles branch October 21, 2024 16:23