From b13be7adffd6b24f48f9ac3fe8108f0c8065de84 Mon Sep 17 00:00:00 2001 From: Giuliano Rodrigues Lima <91879848+grlimacan@users.noreply.github.com> Date: Wed, 11 Dec 2024 16:08:26 +0200 Subject: [PATCH 1/3] chore: Remove Optimize OpenSearch warnings --- .../backup-restore/optimize-backup.md | 42 +++++++++---------- .../advanced-features/import-guide.md | 8 +++- .../shared-elasticsearch-cluster.md | 4 -- .../system-configuration-platform-7.md | 6 ++- .../configuration/system-configuration.md | 8 ---- .../optimize-deployment/install-and-start.md | 4 -- .../advanced-features/import-guide.md | 8 +++- 7 files changed, 37 insertions(+), 43 deletions(-) diff --git a/docs/self-managed/operational-guides/backup-restore/optimize-backup.md b/docs/self-managed/operational-guides/backup-restore/optimize-backup.md index fd86d993ec0..2215873da0d 100644 --- a/docs/self-managed/operational-guides/backup-restore/optimize-backup.md +++ b/docs/self-managed/operational-guides/backup-restore/optimize-backup.md @@ -11,20 +11,20 @@ This release introduces breaking changes, including the utilized URL. For example, `curl 'http://localhost:8092/actuator/backups'` rather than the previously used `backup`. ::: -Optimize stores its data over multiple indices in Elasticsearch. To ensure data integrity across indices, a backup of Optimize data consists of two Elasticsearch snapshots, each containing a different set of Optimize indices. Each backup is identified by a positive integer backup ID. For example, a backup with ID `123456` consists of the following Elasticsearch snapshots: +Optimize stores its data over multiple indices in the database. To ensure data integrity across indices, a backup of Optimize data consists of two ElasticSearch/OpenSearch snapshots, each containing a different set of Optimize indices. Each backup is identified by a positive integer backup ID. For example, a backup with ID `123456` consists of the following snapshots: ``` camunda_optimize_123456_3.9.0_part_1_of_2 camunda_optimize_123456_3.9.0_part_2_of_2 ``` -Optimize provides an API to trigger a backup and retrieve information about a given backup's state. During backup creation Optimize can continue running. The backed up data can later be restored using the standard Elasticsearch snapshot restore API. +Optimize provides an API to trigger a backup and retrieve information about a given backup's state. During backup creation Optimize can continue running. The backed up data can later be restored using the standard ElasticSearch/OpenSearch snapshot restore API. ## Prerequisites The following prerequisites must be set up before using the backup API: -1. A snapshot repository of your choice must be registered with Elasticsearch. +1. A snapshot repository of your choice must be registered with ElasticSearch/OpenSearch. 2. The repository name must be specified using the `CAMUNDA_OPTIMIZE_BACKUP_REPOSITORY_NAME` environment variable, or by adding it to your Optimize [`environment-config.yaml`]($optimize$/self-managed/optimize-deployment/configuration/system-configuration/): ```yaml @@ -48,13 +48,13 @@ POST actuator/backups ### Response -| Code | Description | -| ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------- | -| 202 Accepted | Backup process was successfully initiated. To determine whether backup process was completed refer to the GET API. | -| 400 Bad Request | Indicates issues with the request, for example when the `backupId` contains invalid characters. | -| 409 Conflict | Indicates that a backup with the same `backupId` already exists. | -| 500 Server Error | All other errors, e.g. issues communicating with Elasticsearch for snapshot creation. Refer to the returned error message for more details. | -| 502 Bad Gateway | Optimize has encountered issues while trying to connect to Elasticsearch. | +| Code | Description | +| ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------ | +| 202 Accepted | Backup process was successfully initiated. To determine whether backup process was completed refer to the GET API. | +| 400 Bad Request | Indicates issues with the request, for example when the `backupId` contains invalid characters. | +| 409 Conflict | Indicates that a backup with the same `backupId` already exists. | +| 500 Server Error | All other errors, e.g. issues communicating with the database for snapshot creation. Refer to the returned error message for more details. | +| 502 Bad Gateway | Optimize has encountered issues while trying to connect to the database. | ### Example request @@ -96,8 +96,8 @@ GET actuator/backup | 200 OK | Backup state could be determined and is returned in the response body (see example below). | | 400 Bad Request | There is an issue with the request, for example the repository name specified in the Optimize configuration does not exist. Refer to returned error message for details. | | 404 Not Found | If a backup ID was specified, no backup with that ID exists. | -| 500 Server Error | All other errors, e.g. issues communicating with Elasticsearch for snapshot state retrieval. Refer to the returned error message for more details. | -| 502 Bad Gateway | Optimize has encountered issues while trying to connect to Elasticsearch. | +| 500 Server Error | All other errors, e.g. issues communicating with the database for snapshot state retrieval. Refer to the returned error message for more details. | +| 502 Bad Gateway | Optimize has encountered issues while trying to connect to the database. | ### Example request @@ -135,8 +135,8 @@ Possible states of the backup: - `COMPLETE`: The backup can be used for restoring data. - `IN_PROGRESS`: The backup process for this backup ID is still in progress. -- `FAILED`: Something went wrong when creating this backup. To find out the exact problem, use the [Elasticsearch get snapshot status API](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/get-snapshot-status-api.html) for each of the snapshots included in the given backup. -- `INCOMPATIBLE`: The backup is incompatible with the current Elasticsearch version. +- `FAILED`: Something went wrong when creating this backup. To find out the exact problem, use the [Elasticsearch](https://www.elastic.co/guide/en/elasticsearch/reference/current/get-snapshot-status-api.html) / [OpenSearch](https://opensearch.org/docs/latest/api-reference/snapshots/get-snapshot-status/) get snapshot status API for each of the snapshots included in the given backup. +- `INCOMPATIBLE`: The backup is incompatible with the current ElasticSearch/OpenSearch version. - `INCOMPLETE`: The backup is incomplete (this could occur when the backup process was interrupted or individual snapshots were deleted). ## Delete backup API @@ -154,10 +154,10 @@ DELETE actuator/backups/{backupId} | Code | Description | | ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -| 204 No Content | The delete request for the associated snapshots was submitted to Elasticsearch successfully. | +| 204 No Content | The delete request for the associated snapshots was submitted to the database successfully. | | 400 Bad Request | There is an issue with the request, for example the repository name specified in the Optimize configuration does not exist. Refer to returned error message for details. | | 500 Server Error | An error occurred, for example the snapshot repository does not exist. Refer to the returned error message for details. | -| 502 Bad Gateway | Optimize has encountered issues while trying to connect to Elasticsearch. | +| 502 Bad Gateway | Optimize has encountered issues while trying to connect to ElasticSearch/OpenSearch. | ### Example request @@ -167,22 +167,22 @@ curl --request DELETE 'http://localhost:8092/actuator/backups/123456' ## Restore backup -There is no Optimize API to perform the backup restore. Instead, the standard [Elasticsearch restore snapshot API](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/restore-snapshot-api.html) can be used. Note that the Optimize versions of your backup snapshots must match the currently running version of Optimize. You can identify the version at which the backup was taken by the version tag included in respective snapshot names; for example, a snapshot with the name`camunda_optimize_123456_3.9.0_part_1_of_2` was taken of Optimize version `3.9.0`. +There is no Optimize API to perform the backup restore. Instead, the standard [Elasticsearch](https://www.elastic.co/guide/en/elasticsearch/reference/current/restore-snapshot-api.html) / [OpenSearch](https://opensearch.org/docs/latest/api-reference/snapshots/restore-snapshot) restore snapshot API can be used. Note that the Optimize versions of your backup snapshots must match the currently running version of Optimize. You can identify the version at which the backup was taken by the version tag included in respective snapshot names; for example, a snapshot with the name`camunda_optimize_123456_3.9.0_part_1_of_2` was taken of Optimize version `3.9.0`. :::note Optimize must NOT be running while a backup is being restored. ::: -To restore an existing backup, all the snapshots this backup contains (as listed in the response of the [create backup API request](#example-response)) must be restored using the Elasticsearch API. +To restore an existing backup, all the snapshots this backup contains (as listed in the response of the [create backup API request](#example-response)) must be restored using the restore API. To restore a given backup, the following steps must be performed: 1. Stop Optimize. -2. Ensure no Optimize indices are present in Elasticsearch (or the restore process will fail). -3. Iterate over all Elasticsearch snapshots included in the desired backup and restore them using the Elasticsearch restore snapshot API. +2. Ensure no Optimize indices are present in the database (or the restore process will fail). +3. Iterate over all ElasticSearch/OpenSearch snapshots included in the desired backup and restore them using the restore snapshot API mentioned above. 4. Start Optimize. -Example Elasticsearch request: +Example request: ```shell curl --request POST `http://localhost:9200/_snapshot/repository_name/camunda_optimize_123456_3.9.0_part_1_of_2/_restore?wait_for_completion=true` diff --git a/optimize/self-managed/optimize-deployment/advanced-features/import-guide.md b/optimize/self-managed/optimize-deployment/advanced-features/import-guide.md index f23cda4b3b0..47ae55163dd 100644 --- a/optimize/self-managed/optimize-deployment/advanced-features/import-guide.md +++ b/optimize/self-managed/optimize-deployment/advanced-features/import-guide.md @@ -191,15 +191,19 @@ Therefore, the Optimize `ProcessInstance` is an aggregation of the engine's hist :::note Optimize uses [nested documents](https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html), the above mentioned data is an example of documents that are nested within Optimize's `ProcessInstance` index. -Elasticsearch applies restrictions regarding how many objects can be nested within one document. If your data includes too many nested documents, you may experience import failures. To avoid this, you can temporarily increase the nested object limit in Optimize's [index configuration](./../configuration/system-configuration.md#index-settings). Note that this might cause memory errors. +Elasticsearch and OpenSearch apply restrictions regarding how many objects can be nested within one document. If your data includes too many nested documents, you may experience import failures. To avoid this, you can temporarily increase the nested object limit in Optimize's [index configuration](./../configuration/system-configuration.md#index-settings). Note that this might cause memory errors. ::: Import executions per engine entity are actually independent from another. Each follows a [producer-consumer-pattern](https://dzone.com/articles/producer-consumer-pattern), where the type specific `ImportService` is the single producer and a dedicated single `ImportJobExecutor` is the consumer of its import jobs, decoupled by a queue. So, both are executed in different threads. To adjust the processing speed of the executor, the queue size and the number of threads that process the import jobs can be configured: +:::note +Although the parameters below include `ElasticSearch` in their name, they apply to both ElasticSearch and OpenSearch installations. For backward compatibility reasons, the parameters have not been renamed. +::: + ```yaml import: # Number of threads being used to process the import jobs per data type that are writing - # data to elasticsearch. + # data to the database. elasticsearchJobExecutorThreadCount: 1 # Adjust the queue size of the import jobs per data type that store data to elasticsearch. # A too large value might cause memory problems. diff --git a/optimize/self-managed/optimize-deployment/configuration/shared-elasticsearch-cluster.md b/optimize/self-managed/optimize-deployment/configuration/shared-elasticsearch-cluster.md index 70253979927..bb01105678e 100644 --- a/optimize/self-managed/optimize-deployment/configuration/shared-elasticsearch-cluster.md +++ b/optimize/self-managed/optimize-deployment/configuration/shared-elasticsearch-cluster.md @@ -18,10 +18,6 @@ The following illustration demonstrates this use case with two Optimize instance Changing the value of `*.settings.index.prefix` after an instance was already running results in new indexes being created with the new prefix value. There is no support in migrating data between indexes based on different prefixes. ::: -:::note -Not all Optimize features are supported when using OpenSearch as a database. For a full list of the features that are currently supported, please refer to the [Camunda 7](https://github.com/camunda/issues/issues/705) and [Camunda 8](https://github.com/camunda/issues/issues/635) OpenSearch features. -::: - \* Elasticsearch index prefix settings path: `es.settings.index.prefix`
\* OpenSearch index prefix settings path: `opensearch.settings.index.prefix` ![Shared Elasticsearch Cluster Setup](img/shared-elasticsearch-cluster.png) diff --git a/optimize/self-managed/optimize-deployment/configuration/system-configuration-platform-7.md b/optimize/self-managed/optimize-deployment/configuration/system-configuration-platform-7.md index e5491a96aea..30c9a31ac0f 100644 --- a/optimize/self-managed/optimize-deployment/configuration/system-configuration-platform-7.md +++ b/optimize/self-managed/optimize-deployment/configuration/system-configuration-platform-7.md @@ -64,8 +64,8 @@ REST API endpoint locations, timeouts, etc. | import.data.user-task-worker.metadata.maxPageSize | 10000 | The max page size when multiple users or groups are iterated during the metadata refresh. | | import.data.user-task-worker.metadata.maxEntryLimit | 100000 | The entry limit of the cache that holds the metadata, if you need more entries you can increase that limit. When increasing the limit, keep in mind to account for that by increasing the JVM heap memory as well. Please refer to the "Adjust Optimize heap size" documentation. | | import.skipDataAfterNestedDocLimitReached | false | Some data can no longer be imported to a given document if its number of nested documents has reached the configured limit. Enable this setting to skip this data during import if the nested document limit has been reached. | -| import.elasticsearchJobExecutorThreadCount | 1 | Number of threads being used to process the import jobs per data type that are writing data to elasticsearch. | -| import.elasticsearchJobExecutorQueueSize | 5 | Adjust the queue size of the import jobs per data type that store data to elasticsearch. If the value is too large it might cause memory problems. | +| import.elasticsearchJobExecutorThreadCount\* | 1 | Number of threads being used to process the import jobs per data type that are writing data to elasticsearch. | +| import.elasticsearchJobExecutorQueueSize\* | 5 | Adjust the queue size of the import jobs per data type that store data to elasticsearch. If the value is too large it might cause memory problems. | | import.handler.backoff.interval | 5000 | Interval in milliseconds which is used for the backoff time calculation. | | import.handler.backoff.max | 15 | Once all pages are consumed, the import scheduler component will start scheduling fetching tasks in increasing periods of time, controlled by "backoff" counter. | | import.handler.backoff.isEnabled | true | Tells if the backoff is enabled of not. | @@ -77,3 +77,5 @@ REST API endpoint locations, timeouts, etc. | import.identitySync.cronTrigger | `0 */2 * * *` | Cron expression for when the identity sync should run, defaults to every second hour. You can either use the default Cron (5 fields) or the Spring Cron (6 fields) expression format here.

For details on the format please refer to: | | import.identitySync.maxPageSize | 10000 | The max page size when multiple users or groups are iterated during the import. | | import.identitySync.maxEntryLimit | 100000 | The entry limit of the user/group search cache. When increasing the limit, keep in mind to account for this by increasing the JVM heap memory as well. Please refer to the "Adjust Optimize heap size" documentation on how to configure the heap size. | + +\* Although this parameter includes `ElasticSearch` in its name, it applies to both ElasticSearch and OpenSearch installations. For backward compatibility reasons, the parameter has not been renamed. diff --git a/optimize/self-managed/optimize-deployment/configuration/system-configuration.md b/optimize/self-managed/optimize-deployment/configuration/system-configuration.md index 22b7c67c0e9..97342e02234 100644 --- a/optimize/self-managed/optimize-deployment/configuration/system-configuration.md +++ b/optimize/self-managed/optimize-deployment/configuration/system-configuration.md @@ -186,10 +186,6 @@ Define a secured connection to be able to communicate with a secured Elasticsear These settings are only relevant when operating Optimize with OpenSearch. -:::note -Not all Optimize features are supported when using OpenSearch as a database. For a full list of the features that are currently supported, please refer to the [Camunda 7](https://github.com/camunda/issues/issues/705) and [Camunda 8](https://github.com/camunda/issues/issues/635) OpenSearch features. -::: - #### Connection settings This section details everything related to building the connection to OpenSearch. @@ -236,10 +232,6 @@ Define a secured connection to be able to communicate with a secured OpenSearch | -------------------------------- | ------------- | ------------------------------------------------------------------------ | | opensearch.backup.repositoryName | "" | The name of the snapshot repository to be used to back up Optimize data. | -:::note -The backup functionality is not yet supported for OpenSearch. -::: - ### Email Settings for the email server to send email notifications, e.g. when an alert is triggered. diff --git a/optimize/self-managed/optimize-deployment/install-and-start.md b/optimize/self-managed/optimize-deployment/install-and-start.md index 2c99ebde0ef..a514fe0b8bf 100644 --- a/optimize/self-managed/optimize-deployment/install-and-start.md +++ b/optimize/self-managed/optimize-deployment/install-and-start.md @@ -89,10 +89,6 @@ After that, [configure the database connection](./configuration/getting-started. #### Getting started with the Optimize Docker image -:::note -Not all Optimize features are supported when using OpenSearch as a database. For a full list of the features that are currently supported, please refer to the [Camunda 7](https://github.com/camunda/issues/issues/705) and [Camunda 8](https://github.com/camunda/issues/issues/635) OpenSearch features. -::: - ##### Full local setup To start the Optimize Docker image and connect to an already locally running Camunda 7 as well as Elasticsearch instance you could run the following command: diff --git a/optimize_versioned_docs/version-3.14.0/self-managed/optimize-deployment/advanced-features/import-guide.md b/optimize_versioned_docs/version-3.14.0/self-managed/optimize-deployment/advanced-features/import-guide.md index f23cda4b3b0..47ae55163dd 100644 --- a/optimize_versioned_docs/version-3.14.0/self-managed/optimize-deployment/advanced-features/import-guide.md +++ b/optimize_versioned_docs/version-3.14.0/self-managed/optimize-deployment/advanced-features/import-guide.md @@ -191,15 +191,19 @@ Therefore, the Optimize `ProcessInstance` is an aggregation of the engine's hist :::note Optimize uses [nested documents](https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html), the above mentioned data is an example of documents that are nested within Optimize's `ProcessInstance` index. -Elasticsearch applies restrictions regarding how many objects can be nested within one document. If your data includes too many nested documents, you may experience import failures. To avoid this, you can temporarily increase the nested object limit in Optimize's [index configuration](./../configuration/system-configuration.md#index-settings). Note that this might cause memory errors. +Elasticsearch and OpenSearch apply restrictions regarding how many objects can be nested within one document. If your data includes too many nested documents, you may experience import failures. To avoid this, you can temporarily increase the nested object limit in Optimize's [index configuration](./../configuration/system-configuration.md#index-settings). Note that this might cause memory errors. ::: Import executions per engine entity are actually independent from another. Each follows a [producer-consumer-pattern](https://dzone.com/articles/producer-consumer-pattern), where the type specific `ImportService` is the single producer and a dedicated single `ImportJobExecutor` is the consumer of its import jobs, decoupled by a queue. So, both are executed in different threads. To adjust the processing speed of the executor, the queue size and the number of threads that process the import jobs can be configured: +:::note +Although the parameters below include `ElasticSearch` in their name, they apply to both ElasticSearch and OpenSearch installations. For backward compatibility reasons, the parameters have not been renamed. +::: + ```yaml import: # Number of threads being used to process the import jobs per data type that are writing - # data to elasticsearch. + # data to the database. elasticsearchJobExecutorThreadCount: 1 # Adjust the queue size of the import jobs per data type that store data to elasticsearch. # A too large value might cause memory problems. From 7aaea622b52ddc3ccbc5d58d2425cfe3f90b7bff Mon Sep 17 00:00:00 2001 From: Giuliano Rodrigues Lima <91879848+grlimacan@users.noreply.github.com> Date: Wed, 11 Dec 2024 17:08:21 +0200 Subject: [PATCH 2/3] chore: Implementing suggestions from review --- .../advanced-features/import-guide.md | 20 ++++++------- .../system-configuration-platform-7.md | 6 ++-- .../advanced-features/import-guide.md | 28 +++++++++---------- 3 files changed, 27 insertions(+), 27 deletions(-) diff --git a/optimize/self-managed/optimize-deployment/advanced-features/import-guide.md b/optimize/self-managed/optimize-deployment/advanced-features/import-guide.md index 47ae55163dd..451ba66436d 100644 --- a/optimize/self-managed/optimize-deployment/advanced-features/import-guide.md +++ b/optimize/self-managed/optimize-deployment/advanced-features/import-guide.md @@ -14,17 +14,17 @@ In general, the import assumes the following setup: - A Camunda engine from which Optimize imports the data. - The Optimize backend, where the data is transformed into an appropriate format for efficient data analysis. -- [Elasticsearch](https://www.elastic.co/guide/index.html), which is the database Optimize persists all formatted data to. +- [Elasticsearch (ES)](https://www.elastic.co/guide/index.html) or [OpenSearch (OS)](https://opensearch.org/), which serves as the database that Optimize uses to persist all of its formatted data. The following depicts the setup and how the components communicate with each other: ![Optimize Import Structure](img/Optimize-Structure.png) -Optimize queries the engine data using a dedicated Optimize REST-API within the engine, transforms the data, and stores it in its own Elasticsearch database such that it can be quickly and easily queried by Optimize when evaluating reports or performing analyses. The reason for having a dedicated REST endpoint for Optimize is performance: the default REST-API adds a lot of complexity to retrieve the data from the engine database, which can result in low performance for large data sets. +Optimize queries the engine data using a dedicated Optimize REST-API within the engine, transforms the data, and stores it in its own database such that it can be quickly and easily queried by Optimize when evaluating reports or performing analyses. The reason for having a dedicated REST endpoint for Optimize is performance: the default REST-API adds a lot of complexity to retrieve the data from the engine database, which can result in low performance for large data sets. Note the following limitations regarding the data in Optimize's database: -- The data is only a near real-time representation of the engine database. This means Elasticsearch may not contain the data of the most recent time frame, e.g. the last two minutes, but all the previous data should be synchronized. +- The data is only a near real-time representation of the engine database. This means the database may not contain the data of the most recent time frame, e.g. the last two minutes, but all the previous data should be synchronized. - Optimize only imports the data it needs for its analysis. The rest is omitted and won't be available for further investigation. Currently, Optimize imports: - The history of the activity instances - The history of the process instances @@ -47,7 +47,7 @@ This section gives an overview of how fast Optimize imports certain data sets. T It is very likely that these metrics change for different data sets because the speed of the import depends on how the data is distributed. -The import is also affected by how the involved components are set up. For instance, if you deploy the Camunda engine on a different machine than Optimize and Elasticsearch to provide both applications with more computation resources, the process is likely to speed up. If the Camunda engine and Optimize are physically far away from each other, the network latency might slow down the import. +The import is also affected by how the involved components are set up. For instance, if you deploy the Camunda engine on a different machine than Optimize and Elasticsearch/OpenSearch to provide both applications with more computation resources, the process is likely to speed up. If the Camunda engine and Optimize are physically far away from each other, the network latency might slow down the import. ### Setup @@ -135,7 +135,7 @@ During execution, the following steps are performed: 2. Map entities and add an import job 3. [Execute the import](#execute-the-import). 1. Poll a job - 2. Persist the new entities to Elasticsearch + 2. Persist the new entities to the database ### Start an import round @@ -175,21 +175,21 @@ First, the `ImportScheduler` retrieves the newest index, which identifies the la #### Map entities and add an import job -All fetched entities are mapped to a representation that allows Optimize to query the data very quickly. Subsequently, an import job is created and added to the queue to persist the data in Elasticsearch. +All fetched entities are mapped to a representation that allows Optimize to query the data very quickly. Subsequently, an import job is created and added to the queue to persist the data in the database. ### Execute the import Full aggregation of the data is performed by a dedicated `ImportJobExecutor` for each entity type, which waits for `ImportJob` instances to be added to the execution queue. As soon as a job is in the queue, the executor: - Polls the job with the new Optimize entities -- Persists the new entities to Elasticsearch +- Persists the new entities to the database The data from the engine and Optimize do not have a one-to-one relationship, i.e., one entity type in Optimize may consist of data aggregated from different data types of the engine. For example, the historic process instance is first mapped to an Optimize `ProcessInstance`. However, for the heatmap analysis it is also necessary for `ProcessInstance` to contain all activities that were executed in the process instance. -Therefore, the Optimize `ProcessInstance` is an aggregation of the engine's historic process instance and other related data: historic activity instance data, user task data, and variable data are all [nested documents](https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html) within Optimize's `ProcessInstance` representation. +Therefore, the Optimize `ProcessInstance` is an aggregation of the engine's historic process instance and other related data: historic activity instance data, user task data, and variable data are all nested documents ([ES](https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html) / [OS](https://opensearch.org/docs/latest/field-types/supported-field-types/nested/)) within Optimize's `ProcessInstance` representation. :::note -Optimize uses [nested documents](https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html), the above mentioned data is an example of documents that are nested within Optimize's `ProcessInstance` index. +Optimize uses nested documents ([ES](https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html) / [OS](https://opensearch.org/docs/latest/field-types/supported-field-types/nested/)), the above mentioned data is an example of documents that are nested within Optimize's `ProcessInstance` index. Elasticsearch and OpenSearch apply restrictions regarding how many objects can be nested within one document. If your data includes too many nested documents, you may experience import failures. To avoid this, you can temporarily increase the nested object limit in Optimize's [index configuration](./../configuration/system-configuration.md#index-settings). Note that this might cause memory errors. ::: @@ -205,7 +205,7 @@ import: # Number of threads being used to process the import jobs per data type that are writing # data to the database. elasticsearchJobExecutorThreadCount: 1 - # Adjust the queue size of the import jobs per data type that store data to elasticsearch. + # Adjust the queue size of the import jobs per data type that store data to the database. # A too large value might cause memory problems. elasticsearchJobExecutorQueueSize: 5 ``` diff --git a/optimize/self-managed/optimize-deployment/configuration/system-configuration-platform-7.md b/optimize/self-managed/optimize-deployment/configuration/system-configuration-platform-7.md index 30c9a31ac0f..dd3eddf077f 100644 --- a/optimize/self-managed/optimize-deployment/configuration/system-configuration-platform-7.md +++ b/optimize/self-managed/optimize-deployment/configuration/system-configuration-platform-7.md @@ -64,13 +64,13 @@ REST API endpoint locations, timeouts, etc. | import.data.user-task-worker.metadata.maxPageSize | 10000 | The max page size when multiple users or groups are iterated during the metadata refresh. | | import.data.user-task-worker.metadata.maxEntryLimit | 100000 | The entry limit of the cache that holds the metadata, if you need more entries you can increase that limit. When increasing the limit, keep in mind to account for that by increasing the JVM heap memory as well. Please refer to the "Adjust Optimize heap size" documentation. | | import.skipDataAfterNestedDocLimitReached | false | Some data can no longer be imported to a given document if its number of nested documents has reached the configured limit. Enable this setting to skip this data during import if the nested document limit has been reached. | -| import.elasticsearchJobExecutorThreadCount\* | 1 | Number of threads being used to process the import jobs per data type that are writing data to elasticsearch. | -| import.elasticsearchJobExecutorQueueSize\* | 5 | Adjust the queue size of the import jobs per data type that store data to elasticsearch. If the value is too large it might cause memory problems. | +| import.elasticsearchJobExecutorThreadCount\* | 1 | Number of threads being used to process the import jobs per data type that are writing data to the database. | +| import.elasticsearchJobExecutorQueueSize\* | 5 | Adjust the queue size of the import jobs per data type that store data to the database. If the value is too large it might cause memory problems. | | import.handler.backoff.interval | 5000 | Interval in milliseconds which is used for the backoff time calculation. | | import.handler.backoff.max | 15 | Once all pages are consumed, the import scheduler component will start scheduling fetching tasks in increasing periods of time, controlled by "backoff" counter. | | import.handler.backoff.isEnabled | true | Tells if the backoff is enabled of not. | | import.indexType | import-index | The name of the import index type. | -| import.importIndexStorageIntervalInSec | 10 | States how often the import index should be stored to Elasticsearch. | +| import.importIndexStorageIntervalInSec | 10 | States how often the import index should be stored to the database. | | import.currentTimeBackoffMilliseconds | 300000 | This is the time interval the import backs off from the current tip of the time during the ongoing import cycle. This ensures that potentially missed concurrent writes in the engine are reread going back by the amount of this time interval. | | import.identitySync.includeUserMetaData | true | Whether to include metaData (firstName, lastName, email) when synchronizing users. If disabled only user IDs will be shown on user search and in collection permissions. | | import.identitySync.collectionRoleCleanupEnabled | false | Whether collection role cleanup should be performed. If enabled, users that no longer exist in the identity provider will be automatically removed from collection permissions. | diff --git a/optimize_versioned_docs/version-3.14.0/self-managed/optimize-deployment/advanced-features/import-guide.md b/optimize_versioned_docs/version-3.14.0/self-managed/optimize-deployment/advanced-features/import-guide.md index 47ae55163dd..0523b3f35af 100644 --- a/optimize_versioned_docs/version-3.14.0/self-managed/optimize-deployment/advanced-features/import-guide.md +++ b/optimize_versioned_docs/version-3.14.0/self-managed/optimize-deployment/advanced-features/import-guide.md @@ -14,17 +14,17 @@ In general, the import assumes the following setup: - A Camunda engine from which Optimize imports the data. - The Optimize backend, where the data is transformed into an appropriate format for efficient data analysis. -- [Elasticsearch](https://www.elastic.co/guide/index.html), which is the database Optimize persists all formatted data to. +- [Elasticsearch (ES)](https://www.elastic.co/guide/index.html) or [OpenSearch (OS)](https://opensearch.org/), which serves as the database that Optimize uses to persist all of its formatted data. The following depicts the setup and how the components communicate with each other: ![Optimize Import Structure](img/Optimize-Structure.png) -Optimize queries the engine data using a dedicated Optimize REST-API within the engine, transforms the data, and stores it in its own Elasticsearch database such that it can be quickly and easily queried by Optimize when evaluating reports or performing analyses. The reason for having a dedicated REST endpoint for Optimize is performance: the default REST-API adds a lot of complexity to retrieve the data from the engine database, which can result in low performance for large data sets. +Optimize queries the engine data using a dedicated Optimize REST-API within the engine, transforms the data, and stores it in its own database such that it can be quickly and easily queried by Optimize when evaluating reports or performing analyses. The reason for having a dedicated REST endpoint for Optimize is performance: the default REST-API adds a lot of complexity to retrieve the data from the engine database, which can result in low performance for large data sets. Note the following limitations regarding the data in Optimize's database: -- The data is only a near real-time representation of the engine database. This means Elasticsearch may not contain the data of the most recent time frame, e.g. the last two minutes, but all the previous data should be synchronized. +- The data is only a near real-time representation of the engine database. This means the database may not contain the data of the most recent time frame, e.g. the last two minutes, but all the previous data should be synchronized. - Optimize only imports the data it needs for its analysis. The rest is omitted and won't be available for further investigation. Currently, Optimize imports: - The history of the activity instances - The history of the process instances @@ -47,7 +47,7 @@ This section gives an overview of how fast Optimize imports certain data sets. T It is very likely that these metrics change for different data sets because the speed of the import depends on how the data is distributed. -The import is also affected by how the involved components are set up. For instance, if you deploy the Camunda engine on a different machine than Optimize and Elasticsearch to provide both applications with more computation resources, the process is likely to speed up. If the Camunda engine and Optimize are physically far away from each other, the network latency might slow down the import. +The import is also affected by how the involved components are set up. For instance, if you deploy the Camunda engine on a different machine than Optimize and Elasticsearch/OpenSearch to provide both applications with more computation resources, the process is likely to speed up. If the Camunda engine and Optimize are physically far away from each other, the network latency might slow down the import. ### Setup @@ -131,11 +131,11 @@ During execution, the following steps are performed: 1. [Start an import round](#start-an-import-round). 2. [Prepare the import](#prepare-the-import). - 1. Poll a new page - 2. Map entities and add an import job -3. [Execute the import](#execute-the-import). - 1. Poll a job - 2. Persist the new entities to Elasticsearch +3. Poll a new page +4. Map entities and add an import job +5. [Execute the import](#execute-the-import). +6. Poll a job +7. Persist the new entities to the database ### Start an import round @@ -175,21 +175,21 @@ First, the `ImportScheduler` retrieves the newest index, which identifies the la #### Map entities and add an import job -All fetched entities are mapped to a representation that allows Optimize to query the data very quickly. Subsequently, an import job is created and added to the queue to persist the data in Elasticsearch. +All fetched entities are mapped to a representation that allows Optimize to query the data very quickly. Subsequently, an import job is created and added to the queue to persist the data in the database. ### Execute the import Full aggregation of the data is performed by a dedicated `ImportJobExecutor` for each entity type, which waits for `ImportJob` instances to be added to the execution queue. As soon as a job is in the queue, the executor: - Polls the job with the new Optimize entities -- Persists the new entities to Elasticsearch +- Persists the new entities to the database The data from the engine and Optimize do not have a one-to-one relationship, i.e., one entity type in Optimize may consist of data aggregated from different data types of the engine. For example, the historic process instance is first mapped to an Optimize `ProcessInstance`. However, for the heatmap analysis it is also necessary for `ProcessInstance` to contain all activities that were executed in the process instance. -Therefore, the Optimize `ProcessInstance` is an aggregation of the engine's historic process instance and other related data: historic activity instance data, user task data, and variable data are all [nested documents](https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html) within Optimize's `ProcessInstance` representation. +Therefore, the Optimize `ProcessInstance` is an aggregation of the engine's historic process instance and other related data: historic activity instance data, user task data, and variable data are all nested documents ([ES](https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html) / [OS](https://opensearch.org/docs/latest/field-types/supported-field-types/nested/)) within Optimize's `ProcessInstance` representation. :::note -Optimize uses [nested documents](https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html), the above mentioned data is an example of documents that are nested within Optimize's `ProcessInstance` index. +Optimize uses nested documents ([ES](https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html) / [OS](https://opensearch.org/docs/latest/field-types/supported-field-types/nested/)), the above mentioned data is an example of documents that are nested within Optimize's `ProcessInstance` index. Elasticsearch and OpenSearch apply restrictions regarding how many objects can be nested within one document. If your data includes too many nested documents, you may experience import failures. To avoid this, you can temporarily increase the nested object limit in Optimize's [index configuration](./../configuration/system-configuration.md#index-settings). Note that this might cause memory errors. ::: @@ -205,7 +205,7 @@ import: # Number of threads being used to process the import jobs per data type that are writing # data to the database. elasticsearchJobExecutorThreadCount: 1 - # Adjust the queue size of the import jobs per data type that store data to elasticsearch. + # Adjust the queue size of the import jobs per data type that store data to the database. # A too large value might cause memory problems. elasticsearchJobExecutorQueueSize: 5 ``` From c9fe9017aa3653a2d51953e8ac0d6101d0cb4593 Mon Sep 17 00:00:00 2001 From: Giuliano Rodrigues Lima <91879848+grlimacan@users.noreply.github.com> Date: Wed, 11 Dec 2024 17:30:45 +0200 Subject: [PATCH 3/3] chore: Implementing suggestions from review --- .../advanced-features/import-guide.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/optimize_versioned_docs/version-3.14.0/self-managed/optimize-deployment/advanced-features/import-guide.md b/optimize_versioned_docs/version-3.14.0/self-managed/optimize-deployment/advanced-features/import-guide.md index 0523b3f35af..451ba66436d 100644 --- a/optimize_versioned_docs/version-3.14.0/self-managed/optimize-deployment/advanced-features/import-guide.md +++ b/optimize_versioned_docs/version-3.14.0/self-managed/optimize-deployment/advanced-features/import-guide.md @@ -131,11 +131,11 @@ During execution, the following steps are performed: 1. [Start an import round](#start-an-import-round). 2. [Prepare the import](#prepare-the-import). -3. Poll a new page -4. Map entities and add an import job -5. [Execute the import](#execute-the-import). -6. Poll a job -7. Persist the new entities to the database + 1. Poll a new page + 2. Map entities and add an import job +3. [Execute the import](#execute-the-import). + 1. Poll a job + 2. Persist the new entities to the database ### Start an import round