Skip to content

Commit

Permalink
Merge pull request #6151 from dimagi/ap/es/reindex-changelog
Browse files Browse the repository at this point in the history
Changelog for Instructions to Reindex Indexes
  • Loading branch information
AmitPhulera authored Oct 27, 2023
2 parents 7c352d1 + c144c6a commit 5f65e4b
Show file tree
Hide file tree
Showing 3 changed files with 188 additions and 0 deletions.
87 changes: 87 additions & 0 deletions changelog/0075-reindex-all-indexes-for-es-upgrade.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
title: Reindex All Indices For Elasticsearch Upgrade
key: reindex-all-indices-for-es-upgrade
date: 2023-10-25
optional_per_env: no
# (optional) Min version of HQ that MUST be deployed before this change can be rolled out (commit hash)
min_commcare_version: 045d45fa69b6dd83f870ed185c0d73adeef14350
max_commcare_version:
context: |
Reindex Elasticsearch Indices for upcoming ES Upgrade.
details: |
Currently CommCare HQ is running with Elasticsearch 2.4. Reindexing is required before upgrading Elasticsearch to version 5.6.16.
update_steps: |
1. Deploy latest version of CommCare HQ.
2. Ensure that there is enough free space available in your cluster. The command to estimate disk space required for reindexing is:
```sh
cchq <env> django-manage elastic_sync_multiplexed estimated_size_for_reindex
```
3. Check disk usage on each node
- If you have separate data nodes, check disk usage on data nodes
```sh
cchq <env> run-shell-command es_data "df -h /opt/data" -b
```
- If you don't have separate data nodes, check disk usage on ES nodes.
```sh
cchq <env> run-shell-command elasticsearch "df -h /opt/data" -b
```
This will return disk usage for each node. You can check if the cumulative available space across all nodes is greater than the total recommended space from the `estimated_size_for_reindex` output.
If you don't have enough space available in your elasticsearch cluster, it is recommended to add more storage capacity before reindexing. If you don't have the option to add more storage, you will need to reindex one index at a time. You should follow the process described in [Reindexing One Index at A Time](https://github.com/dimagi/commcare-hq/blob/56682492f20c60cdef0ccde6049b9945b3658493/corehq/apps/es/REINDEX_PROCESS.md#reindexing-one-index-at-a-time)
4. It is advised to run the following steps in a tmux session as they might take a long time and can be detached/re-attached as needed for monitoring progress.
- To start a tmux session in your django-manage machine, use the following steps, changing `<env>` to your environment name where needed:
```bash
cchq <env> tmux django_manage
sudo -iu cchq
cd /home/cchq/www/<env>/current
source python_env/bin/activate
```
5. The following steps should be performed serially for each canonical index name, replacing `${INDEX_CNAME}` with each of the following values `['apps', 'cases', 'case_search', 'domains', 'forms', 'groups', 'sms', 'users']`
1. Set the `${INDEX_CNAME}` environment variable. For example, the first run should be:
```bash
INDEX_CNAME='apps'
```
2. Start the reindex process
```bash
./manage.py elastic_sync_multiplexed start ${INDEX_CNAME}
```
Note down the Task Number that is displayed by the command. It should be a numeric ID and will be required to verify the reindex process.
3. Verify the reindex is completed by querying the logs and ensuring that the doc count matches between primary and secondary indices. The commands in this step should be run from the control machine.
1. Login to the control machine in a new shell.
```
cchq <env> ssh control
```
2. From control machine, run the following command to query the reindex logs.
```
cchq <env> run-shell-command elasticsearch "grep '<Task Number>.*ReindexResponse' /opt/data/elasticsearch*/logs/*.log"
```
This command will query the Elasticsearch nodes to find any log entries containing the `ReindexResponse` for the given Task Number. The log should look something like:
```
[2023-10-25 08:59:37,648][INFO] [tasks] 29216 finished with response ReindexResponse[took=1.8s,updated=0,created=1111,batches=2,versionConflicts=0,noops=0,retries=0,throttledUntil=0s,indexing_failures=[],search_failures=[]]
```
Ensure that `search_failures` and `indexing_failures` are empty lists.
3. Then verify doc counts match between primary and secondary indices using:
```
cchq <env> django-manage elastic_sync_multiplexed display_doc_counts <index_cname>
```
This command will display the document counts for both the primary and secondary indices for a given index. If the doc count matches between the two and there are no errors in the reindex logs, then reindexing is complete for that index.
Please note that the counts may not match perfectly for high frequency indices like case_search, cases, and forms. In such cases, ensure the difference in counts is small (within one hundred) and there are no errors in reindex logs.
If reindex fails for any index, please refer to the docs [here](https://github.com/dimagi/commcare-hq/blob/56682492f20c60cdef0ccde6049b9945b3658493/corehq/apps/es/REINDEX_PROCESS.md#common-issues-resolutions-during-reindex) for troubleshooting steps.
4. If doc counts match and there are no errors present in the reindex logs, the reindex for the current index is complete. You can continue reindexing for the next index by repeating steps 5.1-5.3 with `${INDEX_CNAME}` set to 'cases', 'case_search' and so on for all indices.
96 changes: 96 additions & 0 deletions docs/source/changelog/0075-reindex-all-indexes-for-es-upgrade.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,96 @@
<!--THIS FILE IS AUTOGENERATED: DO NOT EDIT-->
<!--See https://github.com/dimagi/commcare-cloud/blob/master/changelog/README.md for instructions-->
# 75. Reindex All Indices For Elasticsearch Upgrade

**Date:** 2023-10-25

**Optional per env:** _required on all environments_


## CommCare Version Dependency
The following version of CommCare must be deployed before rolling out this change:
[045d45fa](https://github.com/dimagi/commcare-hq/commit/045d45fa69b6dd83f870ed185c0d73adeef14350)


## Change Context
Reindex Elasticsearch Indices for upcoming ES Upgrade.

## Details
Currently CommCare HQ is running with Elasticsearch 2.4. Reindexing is required before upgrading Elasticsearch to version 5.6.16.

## Steps to update
1. Deploy latest version of CommCare HQ.
2. Ensure that there is enough free space available in your cluster. The command to estimate disk space required for reindexing is:

```sh
cchq <env> django-manage elastic_sync_multiplexed estimated_size_for_reindex
```
3. Check disk usage on each node
- If you have separate data nodes, check disk usage on data nodes
```sh
cchq <env> run-shell-command es_data "df -h /opt/data" -b
```
- If you don't have separate data nodes, check disk usage on ES nodes.
```sh
cchq <env> run-shell-command elasticsearch "df -h /opt/data" -b
```
This will return disk usage for each node. You can check if the cumulative available space across all nodes is greater than the total recommended space from the `estimated_size_for_reindex` output.
If you don't have enough space available in your elasticsearch cluster, it is recommended to add more storage capacity before reindexing. If you don't have the option to add more storage, you will need to reindex one index at a time. You should follow the process described in [Reindexing One Index at A Time](https://github.com/dimagi/commcare-hq/blob/56682492f20c60cdef0ccde6049b9945b3658493/corehq/apps/es/REINDEX_PROCESS.md#reindexing-one-index-at-a-time)
4. It is advised to run the following steps in a tmux session as they might take a long time and can be detached/re-attached as needed for monitoring progress.
- To start a tmux session in your django-manage machine, use the following steps, changing `<env>` to your environment name where needed:
```bash
cchq <env> tmux django_manage
sudo -iu cchq
cd /home/cchq/www/<env>/current
source python_env/bin/activate
```
5. The following steps should be performed serially for each canonical index name, replacing `${INDEX_CNAME}` with each of the following values `['apps', 'cases', 'case_search', 'domains', 'forms', 'groups', 'sms', 'users']`
1. Set the `${INDEX_CNAME}` environment variable. For example, the first run should be:
```bash
INDEX_CNAME='apps'
```
2. Start the reindex process
```bash
./manage.py elastic_sync_multiplexed start ${INDEX_CNAME}
```
Note down the Task Number that is displayed by the command. It should be a numeric ID and will be required to verify the reindex process.
3. Verify the reindex is completed by querying the logs and ensuring that the doc count matches between primary and secondary indices. The commands in this step should be run from the control machine.
1. Login to the control machine in a new shell.
```
cchq <env> ssh control
```
2. From control machine, run the following command to query the reindex logs.
```
cchq <env> run-shell-command elasticsearch "grep '<Task Number>.*ReindexResponse' /opt/data/elasticsearch*/logs/*.log"
```
This command will query the Elasticsearch nodes to find any log entries containing the `ReindexResponse` for the given Task Number. The log should look something like:
```
[2023-10-25 08:59:37,648][INFO] [tasks] 29216 finished with response ReindexResponse[took=1.8s,updated=0,created=1111,batches=2,versionConflicts=0,noops=0,retries=0,throttledUntil=0s,indexing_failures=[],search_failures=[]]
```
Ensure that `search_failures` and `indexing_failures` are empty lists.
3. Then verify doc counts match between primary and secondary indices using:
```
cchq <env> django-manage elastic_sync_multiplexed display_doc_counts <index_cname>
```
This command will display the document counts for both the primary and secondary indices for a given index. If the doc count matches between the two and there are no errors in the reindex logs, then reindexing is complete for that index.
Please note that the counts may not match perfectly for high frequency indices like case_search, cases, and forms. In such cases, ensure the difference in counts is small (within one hundred) and there are no errors in reindex logs.
If reindex fails for any index, please refer to the docs [here](https://github.com/dimagi/commcare-hq/blob/56682492f20c60cdef0ccde6049b9945b3658493/corehq/apps/es/REINDEX_PROCESS.md#common-issues-resolutions-during-reindex) for troubleshooting steps.
4. If doc counts match and there are no errors present in the reindex logs, the reindex for the current index is complete. You can continue reindexing for the next index by repeating steps 5.1-5.3 with `${INDEX_CNAME}` set to 'cases', 'case_search' and so on for all indices.
5 changes: 5 additions & 0 deletions docs/source/changelog/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,11 @@ need to be applied on your environment to keep it up to date.

### Changelog

#### **2023-10-25** [Reindex All Indices For Elasticsearch Upgrade](0075-reindex-all-indexes-for-es-upgrade.md)
Reindex Elasticsearch Indices for upcoming ES Upgrade.


---
#### **2023-09-12** [update-to-python-3.9.18](0074-update-to-python-3.9.18.md)
Installs python 3.9.18 and build a new virutalenv for CommCare HQ

Expand Down

0 comments on commit 5f65e4b

Please sign in to comment.