-
-#### Create temporary Camunda 8 installation in the failover mode in the surviving region
-
-}
-desired={}
+current={}
+desired={}
/>
@@ -177,32 +147,25 @@ Due to the Zeebe data replication, no data has been lost.
#### Desired state
-You are creating a temporary Camunda 8 deployment within the same region, but different namespace, to recover the Zeebe cluster functionality. Using a different namespace allows for easier distinguishing between the normal Zeebe deployment and Zeebe failover deployment.
-
-The newly deployed Zeebe brokers will be running in the failover mode. This will restore the quorum and the Zeebe data processing. Additionally, the new failover brokers are configured to export the data to the surviving Elasticsearch instance and to the newly deployed failover Elasticsearch instance.
+You have removed the lost brokers from the Zeebe cluster. This allows processing to continue after the next step and ensures that the new brokers created during the failback procedure only join the cluster with your explicit intervention.
#### How to get there
-In the case **Region 1** was lost: in the previously cloned repository [c8-multi-region](https://github.com/camunda/c8-multi-region), navigate to the folder [aws/dual-region/kubernetes/region0](https://github.com/camunda/c8-multi-region/blob/main/aws/dual-region/kubernetes/region0/). This contains the example Helm values yaml `camunda-values-failover.yml` containing the required overlay for the **failover** mode.
+You will port-forward the `Zeebe Gateway` in the surviving region to your local machine to interact with it.
-In the case when your **Region 0** was lost, instead go to the folder [aws/dual-region/kubernetes/region1](https://github.com/camunda/c8-multi-region/blob/main/aws/dual-region/kubernetes/region1/) for the `camunda-values-failover.yml` file.
+The following alternatives to port-forwarding are possible:
-The chosen `camunda-values-failover.yml` requires adjustments before installing the Helm chart and the same has to be done for the base `camunda-values.yml` in `aws/dual-region/kubernetes`.
+- If the Zeebe Gateway is exposed externally, you can skip port-forwarding and use its URL directly.
+- You can [`exec`](https://kubernetes.io/docs/reference/kubectl/generated/kubectl_exec/) into an existing pod (such as Elasticsearch) and `curl` from there.
+- You can temporarily [`run`](https://kubernetes.io/docs/reference/kubectl/generated/kubectl_run/) a throwaway pod (for example, Ubuntu) in the cluster and `curl` from there.
-- `ZEEBE_BROKER_CLUSTER_INITIALCONTACTPOINTS`
-- `ZEEBE_BROKER_EXPORTERS_ELASTICSEARCHREGION0_ARGS_URL`
-- `ZEEBE_BROKER_EXPORTERS_ELASTICSEARCHREGION1_ARGS_URL`
+In our example, we use port-forwarding to the local host, but any of the alternatives above works as well. A minimal sketch of the temporary pod approach follows below.
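+
+The following is only an illustration of the temporary pod alternative, assuming the lightweight `curlimages/curl` image is available and that the Zeebe Gateway service follows the `$HELM_RELEASE_NAME-zeebe-gateway` naming used throughout this guide:
+
+```bash
+# Run a throwaway curl pod in the surviving namespace and query the Zeebe Gateway
+# management endpoint via its in-cluster service name. The pod is removed afterwards.
+kubectl --context $CLUSTER_SURVIVING run curl-tmp --rm -i --restart=Never \
+  --image=curlimages/curl -n $CAMUNDA_NAMESPACE_SURVIVING --command -- \
+  curl -s "http://$HELM_RELEASE_NAME-zeebe-gateway:9600/actuator/cluster"
+```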
-1. The bash script [generate_zeebe_helm_values.sh](https://github.com/camunda/c8-multi-region/blob/main/aws/dual-region/scripts/generate_zeebe_helm_values.sh) in the repository folder `aws/dual-region/scripts/` helps generate those values. You only have to copy and replace them within the previously mentioned Helm values files. It will use the exported environment variables of the environment prerequisites for namespaces and regions. Additionally, you have to pass in whether your region 0 or 1 was lost.
+1. Use the [zbctl client](../../../apis-tools/cli-client/index.md) to retrieve the list of remaining brokers:
```bash
-./generate_zeebe_helm_values.sh failover
-
-# It will ask you to provide the following values
-# Enter the region that was lost, values can either be 0 or 1:
-## In our case we lost region 1, therefore input 1
-# Enter Zeebe cluster size (total number of Zeebe brokers in both Kubernetes clusters):
-## for a dual-region setup we recommend 8. Resulting in 4 brokers per region.
+kubectl --context $CLUSTER_SURVIVING port-forward services/$HELM_RELEASE_NAME-zeebe-gateway 26500:26500 -n $CAMUNDA_NAMESPACE_SURVIVING
+zbctl status --insecure --address localhost:26500
```
@@ -210,49 +173,56 @@ The chosen `camunda-values-failover.yml` requires adjustments before installing
```bash
-Please use the following to change the existing environment variable ZEEBE_BROKER_CLUSTER_INITIALCONTACTPOINTS in the failover Camunda Helm chart values file 'region0/camunda-values-failover.yml' and in the base Camunda Helm chart values file 'camunda-values.yml'. It's part of the 'zeebe.env' path.
-
-- name: ZEEBE_BROKER_CLUSTER_INITIALCONTACTPOINTS
- value: camunda-zeebe-0.camunda-zeebe.camunda-london.svc.cluster.local:26502,camunda-zeebe-0.camunda-zeebe.camunda-paris.svc.cluster.local:26502,camunda-zeebe-1.camunda-zeebe.camunda-london.svc.cluster.local:26502,camunda-zeebe-1.camunda-zeebe.camunda-paris.svc.cluster.local:26502,camunda-zeebe-2.camunda-zeebe.camunda-london.svc.cluster.local:26502,camunda-zeebe-2.camunda-zeebe.camunda-paris.svc.cluster.local:26502,camunda-zeebe-3.camunda-zeebe.camunda-london.svc.cluster.local:26502,camunda-zeebe-3.camunda-zeebe.camunda-paris.svc.cluster.local:26502
-
-Please use the following to change the existing environment variable ZEEBE_BROKER_EXPORTERS_ELASTICSEARCHREGION0_ARGS_URL in the failover Camunda Helm chart values file 'region0/camunda-values-failover.yml' and in the base Camunda Helm chart values file 'camunda-values.yml'. It's part of the 'zeebe.env' path.
-
-- name: ZEEBE_BROKER_EXPORTERS_ELASTICSEARCHREGION0_ARGS_URL
- value: http://camunda-elasticsearch-master-hl.camunda-london.svc.cluster.local:9200
-
-Please use the following to change the existing environment variable ZEEBE_BROKER_EXPORTERS_ELASTICSEARCHREGION1_ARGS_URL in the failover Camunda Helm chart values file 'region0/camunda-values-failover.yml' and in the base Camunda Helm chart values file 'camunda-values.yml'. It's part of the 'zeebe.env' path.
-
-- name: ZEEBE_BROKER_EXPORTERS_ELASTICSEARCHREGION1_ARGS_URL
- value: http://camunda-elasticsearch-master-hl.camunda-london-failover.svc.cluster.local:9200
+Cluster size: 8
+Partitions count: 8
+Replication factor: 4
+Gateway version: 8.6.0
+Brokers:
+ Broker 0 - camunda-zeebe-0.camunda-zeebe.camunda-london.svc:26501
+ Version: 8.6.0
+ Partition 1 : Leader, Healthy
+ Partition 6 : Follower, Healthy
+ Partition 7 : Follower, Healthy
+ Partition 8 : Follower, Healthy
+ Broker 2 - camunda-zeebe-1.camunda-zeebe.camunda-london.svc:26501
+ Version: 8.6.0
+ Partition 1 : Follower, Healthy
+ Partition 2 : Follower, Healthy
+ Partition 3 : Follower, Healthy
+ Partition 8 : Leader, Healthy
+ Broker 4 - camunda-zeebe-2.camunda-zeebe.camunda-london.svc:26501
+ Version: 8.6.0
+ Partition 2 : Follower, Healthy
+ Partition 3 : Leader, Healthy
+ Partition 4 : Follower, Healthy
+ Partition 5 : Follower, Healthy
+ Broker 6 - camunda-zeebe-3.camunda-zeebe.camunda-london.svc:26501
+ Version: 8.6.0
+ Partition 4 : Follower, Healthy
+ Partition 5 : Follower, Healthy
+ Partition 6 : Follower, Healthy
+ Partition 7 : Leader, Healthy
```
-2. As the script suggests, replace the environment variables within `camunda-values-failover.yml`.
-3. Repeat the adjustments for the base Helm values file `camunda-values.yml` in `aws/dual-region/kubernetes` with the same output for the mentioned environment variables.
-4. From the terminal context of `aws/dual-region/kubernetes`, execute the following:
+2. Port-forward the service of the Zeebe Gateway for the [management REST API](../../zeebe-deployment/configuration/gateway.md#managementserver):
```bash
-helm install $HELM_RELEASE_NAME camunda/camunda-platform \
- --version $HELM_CHART_VERSION \
- --kube-context $CLUSTER_SURVIVING \
- --namespace $CAMUNDA_NAMESPACE_FAILOVER \
- -f camunda-values.yml \
- -f $REGION_SURVIVING/camunda-values-failover.yml
+kubectl --context $CLUSTER_SURVIVING port-forward services/$HELM_RELEASE_NAME-zeebe-gateway 9600:9600 -n $CAMUNDA_NAMESPACE_SURVIVING
```
-#### Verification
-
-The following command will show the deployed pods of the failover namespace.
-
-Only the minimal amount of brokers required to restore the quorum will be deployed in the failover installation. For example, if `clusterSize` is eight, two Zeebe brokers will be deployed in the failover installation instead of the normal four. This is expected.
+3. Based on the [Cluster Scaling APIs](../../zeebe-deployment/operations/cluster-scaling.md), send a request to the Zeebe Gateway to redistribute the load to the remaining brokers, thereby removing the lost brokers.
+   In our example, we have lost region 1 and with it the odd-numbered brokers. This means the load has to be redistributed to the remaining even-numbered brokers.
```bash
-kubectl --context $CLUSTER_SURVIVING get pods -n $CAMUNDA_NAMESPACE_FAILOVER
+curl -XPOST 'http://localhost:9600/actuator/cluster/brokers?force=true' -H 'Content-Type: application/json' -d '["0", "2", "4", "6"]'
```
-Port-forwarding the Zeebe Gateway via `kubectl` and printing the topology should reveal that the **failover** brokers have joined the cluster.
+#### Verification
+
+Port-forwarding the Zeebe Gateway via `kubectl` and printing the topology should reveal that the cluster size has decreased to 4, partitions have been redistributed over the remaining brokers, and new leaders have been elected.
```bash
kubectl --context $CLUSTER_SURVIVING port-forward services/$HELM_RELEASE_NAME-zeebe-gateway 26500:26500 -n $CAMUNDA_NAMESPACE_SURVIVING
@@ -264,43 +234,31 @@ zbctl status --insecure --address localhost:26500
```bash
-Cluster size: 8
+Cluster size: 4
Partitions count: 8
-Replication factor: 4
-Gateway version: 8.5.0
+Replication factor: 2
+Gateway version: 8.6.0
Brokers:
Broker 0 - camunda-zeebe-0.camunda-zeebe.camunda-london.svc:26501
- Version: 8.5.0
+ Version: 8.6.0
Partition 1 : Leader, Healthy
- Partition 6 : Follower, Healthy
- Partition 7 : Follower, Healthy
- Partition 8 : Follower, Healthy
- Broker 1 - camunda-zeebe-0.camunda-zeebe.camunda-london-failover.svc:26501
- Version: 8.5.0
- Partition 1 : Follower, Healthy
- Partition 2 : Leader, Healthy
+ Partition 6 : Leader, Healthy
Partition 7 : Follower, Healthy
Partition 8 : Follower, Healthy
Broker 2 - camunda-zeebe-1.camunda-zeebe.camunda-london.svc:26501
- Version: 8.5.0
+ Version: 8.6.0
Partition 1 : Follower, Healthy
- Partition 2 : Follower, Healthy
+ Partition 2 : Leader, Healthy
Partition 3 : Follower, Healthy
Partition 8 : Leader, Healthy
Broker 4 - camunda-zeebe-2.camunda-zeebe.camunda-london.svc:26501
- Version: 8.5.0
+ Version: 8.6.0
Partition 2 : Follower, Healthy
- Partition 3 : Follower, Healthy
- Partition 4 : Follower, Healthy
- Partition 5 : Follower, Healthy
- Broker 5 - camunda-zeebe-1.camunda-zeebe.camunda-london-failover.svc:26501
- Version: 8.5.0
Partition 3 : Leader, Healthy
Partition 4 : Follower, Healthy
Partition 5 : Follower, Healthy
- Partition 6 : Leader, Healthy
Broker 6 - camunda-zeebe-3.camunda-zeebe.camunda-london.svc:26501
- Version: 8.5.0
+ Version: 8.6.0
Partition 4 : Leader, Healthy
Partition 5 : Leader, Healthy
Partition 6 : Follower, Healthy
@@ -310,16 +268,39 @@ Brokers:
+You can also use the Zeebe Gateway's REST API to ensure the scaling progress has been completed. For better readability of the output, it is recommended to use [jq](https://jqlang.github.io/jq/).
+
+```bash
+kubectl --context $CLUSTER_SURVIVING port-forward services/$HELM_RELEASE_NAME-zeebe-gateway 9600:9600 -n $CAMUNDA_NAMESPACE_SURVIVING
+curl -XGET 'http://localhost:9600/actuator/cluster' | jq .lastChange
+```
+
+
+ Example output
+
+
+```bash
+{
+ "id": 2,
+ "status": "COMPLETED",
+ "startedAt": "2024-08-23T11:33:08.355681311Z",
+ "completedAt": "2024-08-23T11:33:09.170531963Z"
+}
+```
+
+
+
+
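+
+If you prefer to wait for the change instead of re-running the command manually, a small polling sketch like the following can be used, assuming `jq` is installed and the management port is still forwarded to `localhost:9600`. The same check can be reused whenever a later step asks you to confirm that the last change is `COMPLETED`:
+
+```bash
+# Poll the Cluster API until the most recent topology change reports COMPLETED.
+while [ "$(curl -s 'http://localhost:9600/actuator/cluster' | jq -r '.lastChange.status')" != "COMPLETED" ]; do
+  echo "Waiting for the topology change to complete..."
+  sleep 5
+done
+```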
-
+
-#### Configure Zeebe to export data to temporary Elasticsearch deployment
+#### Configure Zeebe to disable the Elastic exporter to the lost region
}
-desired={}
+current={}
+desired={}
/>
@@ -328,15 +309,9 @@ desired={
}
Zeebe is not yet able to continue exporting data since the Zeebe brokers in the surviving region are configured to point to the Elasticsearch instance of the lost region.
-:::info
-
-Simply disabling the exporter would not be helpful here, since the sequence numbers in the exported data are not persistent when an exporter configuration is removed from Zeebe settings and added back later. The correct sequence numbers are required by Operate and Tasklist to import Elasticsearch data correctly.
-
-:::
-
#### Desired state
-You have reconfigured the existing Camunda deployment in `CAMUNDA_NAMESPACE_SURVIVING` to point Zeebe to the export data to the temporary Elasticsearch instance that was previously created in **Step 2**.
+You have disabled the Elasticsearch exporter to the failed region in the Zeebe cluster.
The Zeebe cluster is then unblocked and can export data to Elasticsearch again.
@@ -344,41 +319,42 @@ Completing this step will restore regular interaction with Camunda 8 for your us
#### How to get there
-In **Step 2** you have already adjusted the base Helm values file `camunda-values.yml` in `aws/dual-region/kubernetes` with the same changes as for the failover deployment for the environment variables.
+1. Port-forward the service of the Zeebe Gateway for the [management REST API](../../zeebe-deployment/configuration/gateway.md#managementserver):
-- `ZEEBE_BROKER_CLUSTER_INITIALCONTACTPOINTS`
-- `ZEEBE_BROKER_EXPORTERS_ELASTICSEARCHREGION0_ARGS_URL`
-- `ZEEBE_BROKER_EXPORTERS_ELASTICSEARCHREGION1_ARGS_URL`
+```bash
+kubectl --context $CLUSTER_SURVIVING port-forward services/$HELM_RELEASE_NAME-zeebe-gateway 9600:9600 -n $CAMUNDA_NAMESPACE_SURVIVING
+```
-From the `aws/dual-region/kubernetes` directory, do a Helm upgrade to update the configuration of the Zeebe deployment in `CAMUNDA_NAMESPACE_SURVIVING` to point to the failover Elasticsearch instance:
+2. List all exporters to find the corresponding ID. Alternatively, you can check your `camunda-values.yml` file, which lists the exporters, as they had to be configured explicitly:
```bash
-helm upgrade $HELM_RELEASE_NAME camunda/camunda-platform \
- --version $HELM_CHART_VERSION \
- --kube-context $CLUSTER_SURVIVING \
- --namespace $CAMUNDA_NAMESPACE_SURVIVING \
- -f camunda-values.yml \
- -f $REGION_SURVIVING/camunda-values.yml
+curl -XGET 'http://localhost:9600/actuator/exporters'
```
-#### Verification
-
-The following command will show the deployed pods of the surviving namespace. You should see that the Zeebe brokers have just restarted or are still restarting due to the configuration upgrade.
+
+ Example output
+
```bash
-kubectl --context $CLUSTER_SURVIVING get pods -n $CAMUNDA_NAMESPACE_SURVIVING
+[{"exporterId":"elasticsearchregion0","status":"ENABLED"},{"exporterId":"elasticsearchregion1","status":"ENABLED"}]
```
-Furthermore, the following command will watch the [StatefulSets](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/) update of the Zeebe brokers and wait until it's done.
+
+
+
+3. Based on the Exporter API, send a request to the Zeebe Gateway to disable the Elasticsearch exporter for the lost region:
```bash
-kubectl --context $CLUSTER_SURVIVING rollout status --watch statefulset/$HELM_RELEASE_NAME-zeebe -n $CAMUNDA_NAMESPACE_SURVIVING
+curl -XPOST 'http://localhost:9600/actuator/exporters/elasticsearchregion1/disable'
```
-Alternatively, you can check that the Elasticsearch value was updated in the [StatefulSets](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/) configuration of the Zeebe brokers and are reflecting the previous output of the script `generate_zeebe_helm_values.sh` in **Step 2**.
+#### Verification
+
+Port-forwarding the Zeebe Gateway via `kubectl` for the REST API and listing all exporters will reveal their current status.
```bash
-kubectl --context $CLUSTER_SURVIVING get statefulsets $HELM_RELEASE_NAME-zeebe -oyaml -n $CAMUNDA_NAMESPACE_SURVIVING | grep -A1 'ZEEBE_BROKER_EXPORTERS_ELASTICSEARCHREGION[0-1]_ARGS_URL'
+kubectl --context $CLUSTER_SURVIVING port-forward services/$HELM_RELEASE_NAME-zeebe-gateway 9600:9600 -n $CAMUNDA_NAMESPACE_SURVIVING
+curl -XGET 'http://localhost:9600/actuator/exporters'
```
@@ -386,21 +362,16 @@ kubectl --context $CLUSTER_SURVIVING get statefulsets $HELM_RELEASE_NAME-zeebe -
```bash
- - name: ZEEBE_BROKER_EXPORTERS_ELASTICSEARCHREGION0_ARGS_URL
- value: http://camunda-elasticsearch-master-hl.camunda-london.svc.cluster.local:9200
---
- - name: ZEEBE_BROKER_EXPORTERS_ELASTICSEARCHREGION1_ARGS_URL
- value: http://camunda-elasticsearch-master-hl.camunda-london-failover.svc.cluster.local:9200
+[{"exporterId":"elasticsearchregion0","status":"ENABLED"},{"exporterId":"elasticsearchregion1","status":"DISABLED"}]
```
-Lastly, port-forwarding the Zeebe Gateway via `kubectl` and printing the topology should reveal that all brokers have joined the Zeebe cluster again.
+Via the already port-forwarded Zeebe Gateway, you can also check the status of the change by using the Cluster API.
```bash
-kubectl --context $CLUSTER_SURVIVING port-forward services/$HELM_RELEASE_NAME-zeebe-gateway 26500:26500 -n $CAMUNDA_NAMESPACE_SURVIVING
-zbctl status --insecure --address localhost:26500
+curl -XGET 'http://localhost:9600/actuator/cluster' | jq .lastChange
```
@@ -408,47 +379,12 @@ zbctl status --insecure --address localhost:26500
```bash
-Cluster size: 8
-Partitions count: 8
-Replication factor: 4
-Gateway version: 8.5.0
-Brokers:
- Broker 0 - camunda-zeebe-0.camunda-zeebe.camunda-london.svc:26501
- Version: 8.5.0
- Partition 1 : Leader, Healthy
- Partition 6 : Follower, Healthy
- Partition 7 : Follower, Healthy
- Partition 8 : Follower, Healthy
- Broker 1 - camunda-zeebe-0.camunda-zeebe.camunda-london-failover.svc:26501
- Version: 8.5.0
- Partition 1 : Follower, Healthy
- Partition 2 : Leader, Healthy
- Partition 7 : Follower, Healthy
- Partition 8 : Follower, Healthy
- Broker 2 - camunda-zeebe-1.camunda-zeebe.camunda-london.svc:26501
- Version: 8.5.0
- Partition 1 : Follower, Healthy
- Partition 2 : Follower, Healthy
- Partition 3 : Follower, Healthy
- Partition 8 : Leader, Healthy
- Broker 4 - camunda-zeebe-2.camunda-zeebe.camunda-london.svc:26501
- Version: 8.5.0
- Partition 2 : Follower, Healthy
- Partition 3 : Follower, Healthy
- Partition 4 : Follower, Healthy
- Partition 5 : Follower, Healthy
- Broker 5 - camunda-zeebe-1.camunda-zeebe.camunda-london-failover.svc:26501
- Version: 8.5.0
- Partition 3 : Leader, Healthy
- Partition 4 : Follower, Healthy
- Partition 5 : Follower, Healthy
- Partition 6 : Leader, Healthy
- Broker 6 - camunda-zeebe-3.camunda-zeebe.camunda-london.svc:26501
- Version: 8.5.0
- Partition 4 : Leader, Healthy
- Partition 5 : Leader, Healthy
- Partition 6 : Follower, Healthy
- Partition 7 : Leader, Healthy
+{
+ "id": 4,
+ "status": "COMPLETED",
+ "startedAt": "2024-08-23T11:36:14.127510679Z",
+ "completedAt": "2024-08-23T11:36:14.379980715Z"
+}
```
@@ -463,40 +399,31 @@ Brokers:
-#### Deploy Camunda 8 in the failback mode in the newly created region
+#### Deploy Camunda 8 in the newly created region
}
-desired={}
+current={}
+desired={}
/>
#### Current state
-You have temporary Zeebe brokers deployed in failover mode together with a temporary Elasticsearch within the same surviving region.
+You have a standalone region with a working Camunda 8 setup, including Zeebe, Operate, Tasklist, and Elasticsearch.
#### Desired state
-You want to restore the dual-region functionality again and deploy Zeebe in failback mode to the newly restored region.
-
-Failback mode means new `clusterSize/2` brokers will be installed in the restored region:
-
-- `clusterSize/4` brokers are running in the normal mode, participating processing and restoring the data.
-- `clusterSize/4` brokers are temporarily running in the sleeping mode. They will run in the normal mode later once the failover setup is removed.
-
-An Elasticsearch will also be deployed in the restored region, but not used yet, before the data is restored into it from the backup from the surviving Elasticsearch cluster.
+You want to restore the dual-region functionality and deploy Camunda 8, consisting of Zeebe and Elasticsearch, to the newly restored region. Operate and Tasklist need to stay disabled to prevent interference with the database backup and restore.
#### How to get there
-The changes previously done in the base Helm values file `camunda-values.yml` in `aws/dual-region/kubernetes` should still be present from **Failover - Step 2**.
+From your initial dual-region deployment, your base Helm values file `camunda-values.yml` in `aws/dual-region/kubernetes` should still be present.
-In particular, the values `ZEEBE_BROKER_EXPORTERS_ELASTICSEARCHREGION0_ARGS_URL` and `ZEEBE_BROKER_EXPORTERS_ELASTICSEARCHREGION1_ARGS_URL` should solely point at the surviving region.
+In particular, the values `ZEEBE_BROKER_EXPORTERS_ELASTICSEARCHREGION0_ARGS_URL` and `ZEEBE_BROKER_EXPORTERS_ELASTICSEARCHREGION1_ARGS_URL` should point to their respective regions. The placeholder in `ZEEBE_BROKER_CLUSTER_INITIALCONTACTPOINTS` should contain the Zeebe endpoints of both regions, as generated by the script `aws/dual-region/scripts/generate_zeebe_helm_values.sh`.
In addition, the following Helm command will disable Operate and Tasklist since those will only be enabled at the end of the full region restore. It's required to keep them disabled in the newly created region due to their Elasticsearch importers.
-Lastly, the `installationType` is set to `failBack` to switch the behavior of Zeebe and prepare for this procedure.
-
1. From the terminal context of `aws/dual-region/kubernetes` execute:
```bash
@@ -506,7 +433,6 @@ helm install $HELM_RELEASE_NAME camunda/camunda-platform \
--namespace $CAMUNDA_NAMESPACE_RECREATED \
-f camunda-values.yml \
-f $REGION_RECREATED/camunda-values.yml \
- --set global.multiregion.installationType=failBack \
--set operate.enabled=false \
--set tasklist.enabled=false
```
@@ -515,22 +441,72 @@ helm install $HELM_RELEASE_NAME camunda/camunda-platform \
The following command will show the deployed pods of the newly created region.
-Depending on your chosen `clusterSize`, you should see that the **failback** deployment contains some Zeebe instances being ready and others unready. Those unready instances are sleeping indefinitely and is the expected behavior.
-This behavior stems from the **failback** mode since we still have the temporary **failover**, which acts as a replacement for the lost region.
+Depending on your chosen `clusterSize`, you should see that half of the total number of Zeebe brokers are deployed in the newly created region.
-For example, in the case of `clusterSize: 8`, you find two active Zeebe brokers and two unready brokers in the newly created region.
+For example, in the case of `clusterSize: 8`, you find four Zeebe brokers in the newly created region.
+
+:::warning
+It is expected that the Zeebe broker pods don't become ready, as they're not yet part of a Zeebe cluster and are therefore not considered healthy by the Kubernetes readiness probe.
+:::
```bash
kubectl --context $CLUSTER_RECREATED get pods -n $CAMUNDA_NAMESPACE_RECREATED
```
-Port-forwarding the Zeebe Gateway via `kubectl` and printing the topology should reveal that the **failback** brokers have joined the cluster.
+Port-forwarding the Zeebe Gateway via `kubectl` and printing the topology should reveal that the new Zeebe brokers are recognized but not yet full members of the Zeebe cluster.
```bash
kubectl --context $CLUSTER_SURVIVING port-forward services/$HELM_RELEASE_NAME-zeebe-gateway 26500:26500 -n $CAMUNDA_NAMESPACE_SURVIVING
zbctl status --insecure --address localhost:26500
```
+
+ Example Output
+
+
+```bash
+Cluster size: 4
+Partitions count: 8
+Replication factor: 2
+Gateway version: 8.6.0
+Brokers:
+ Broker 0 - camunda-zeebe-0.camunda-zeebe.camunda-london.svc:26501
+ Version: 8.6.0
+ Partition 1 : Leader, Healthy
+ Partition 6 : Follower, Healthy
+ Partition 7 : Follower, Healthy
+ Partition 8 : Leader, Healthy
+ Broker 1 - camunda-zeebe-0.camunda-zeebe.camunda-paris.svc:26501
+ Version: 8.6.0
+ Broker 2 - camunda-zeebe-1.camunda-zeebe.camunda-london.svc:26501
+ Version: 8.6.0
+ Partition 1 : Follower, Healthy
+ Partition 2 : Leader, Healthy
+ Partition 3 : Leader, Healthy
+ Partition 8 : Follower, Healthy
+ Broker 3 - camunda-zeebe-1.camunda-zeebe.camunda-paris.svc:26501
+ Version: 8.6.0
+ Broker 4 - camunda-zeebe-2.camunda-zeebe.camunda-london.svc:26501
+ Version: 8.6.0
+ Partition 2 : Follower, Healthy
+ Partition 3 : Follower, Healthy
+ Partition 4 : Leader, Healthy
+ Partition 5 : Leader, Healthy
+ Broker 5 - camunda-zeebe-2.camunda-zeebe.camunda-paris.svc:26501
+ Version: 8.6.0
+ Broker 6 - camunda-zeebe-3.camunda-zeebe.camunda-london.svc:26501
+ Version: 8.6.0
+ Partition 4 : Follower, Healthy
+ Partition 5 : Follower, Healthy
+ Partition 6 : Leader, Healthy
+ Partition 7 : Leader, Healthy
+  Broker 7 - camunda-zeebe-3.camunda-zeebe.camunda-paris.svc:26501
+ Version: 8.6.0
+```
+
+
+
+
@@ -538,32 +514,22 @@ zbctl status --insecure --address localhost:26500
#### Pause Zeebe exporters to Elasticsearch, pause Operate and Tasklist
}
-desired={}
+current={}
+desired={}
/>
#### Current state
-You currently have the following setups:
+You currently have the following setup:
-- Functioning Zeebe cluster (in multi-region mode):
- - Camunda 8 installation in the failover mode in the surviving region
- - Camunda 8 installation in the failback mode in the recreated region
+- Functioning Zeebe cluster (within a single region):
+  - Working Camunda 8 installation in the surviving region
+  - Non-participating Camunda 8 installation in the recreated region
#### Desired state
-:::warning
-
-This step is very important to minimize the risk of losing any data when restoring the backup in the recreated region.
-
-There remains a small chance of losing some data in Elasticsearch (and in turn, in Operate and Tasklist too). This is because Zeebe might have exported some records to the failover Elasticsearch in the surviving region, but not to the main Elasticsearch in the surviving region, before the exporters have been paused.
-
-This means those records will not be included in the surviving region's Elasticsearch backup when the recreated region's Elasticsearch is restored from the backup, leading to the new region missing those records (as Zeebe does not re-export them).
-
-:::
-
You are preparing everything for the newly created region to take over again to restore the functioning dual-region setup.
For this, stop the Zeebe exporters from exporting any new data to Elasticsearch so you can create an Elasticsearch backup.
@@ -585,7 +551,7 @@ kubectl --context $CLUSTER_SURVIVING scale -n $CAMUNDA_NAMESPACE_SURVIVING deplo
kubectl --context $CLUSTER_SURVIVING scale -n $CAMUNDA_NAMESPACE_SURVIVING deployments/$HELM_RELEASE_NAME-tasklist --replicas 0
```
-2. Disable the Zeebe Elasticsearch exporters in Zeebe via kubectl:
+2. Disable the Zeebe Elasticsearch exporters via kubectl, using the [exporting API](./../../zeebe-deployment/operations/management-api.md#exporting-api):
```bash
kubectl --context $CLUSTER_SURVIVING port-forward services/$HELM_RELEASE_NAME-zeebe-gateway 9600:9600 -n $CAMUNDA_NAMESPACE_SURVIVING
@@ -605,7 +571,7 @@ kubectl --context $CLUSTER_SURVIVING get deployments $HELM_RELEASE_NAME-operate
# camunda-tasklist 0/0 0 0 23m
```
-For the Zeebe Elasticsearch exporters, there's currently no API available to confirm this. Only the response code of `204` indicates a successful disabling.
+For the Zeebe Elasticsearch exporters, there's currently no API available to confirm this. Only the response code of `204` indicates a successful disabling. This is a synchronous operation.
@@ -614,8 +580,8 @@ For the Zeebe Elasticsearch exporters, there's currently no API available to con
#### Create and restore Elasticsearch backup
}
-desired={}
+current={}
+desired={}
/>
@@ -626,7 +592,7 @@ The Camunda components are currently not reachable by end-users and will not pro
#### Desired state
-You are creating a backup of the main Elasticsearch instance in the surviving region and restore it in the recreated region. This Elasticsearch backup contains all the data and may take some time to be finished. The failover Elasticsearch instance only contains a subset of the data from after the region loss and is not sufficient to restore this in the new region.
+You are creating a backup of the main Elasticsearch instance in the surviving region and restoring it in the recreated region. This Elasticsearch backup contains all the data and may take some time to finish.
#### How to get there
@@ -648,7 +614,7 @@ export S3_BUCKET_NAME=$(terraform output -raw s3_bucket_name)
```bash
ELASTIC_POD=$(kubectl --context $CLUSTER_SURVIVING get pod --selector=app\.kubernetes\.io/name=elasticsearch -o jsonpath='{.items[0].metadata.name}' -n $CAMUNDA_NAMESPACE_SURVIVING)
-kubectl --context $CLUSTER_SURVIVING exec -n $CAMUNDA_NAMESPACE_SURVIVING -it $ELASTIC_POD -c elasticsearch -- curl -XPUT "http://localhost:9200/_snapshot/camunda_backup" -H "Content-Type: application/json" -d'
+kubectl --context $CLUSTER_SURVIVING exec -n $CAMUNDA_NAMESPACE_SURVIVING -it $ELASTIC_POD -c elasticsearch -- curl -XPUT 'http://localhost:9200/_snapshot/camunda_backup' -H 'Content-Type: application/json' -d'
{
"type": "s3",
"settings": {
@@ -664,13 +630,13 @@ kubectl --context $CLUSTER_SURVIVING exec -n $CAMUNDA_NAMESPACE_SURVIVING -it $E
```bash
# The backup will be called failback
-kubectl --context $CLUSTER_SURVIVING exec -n $CAMUNDA_NAMESPACE_SURVIVING -it $ELASTIC_POD -c elasticsearch -- curl -XPUT "http://localhost:9200/_snapshot/camunda_backup/failback?wait_for_completion=true"
+kubectl --context $CLUSTER_SURVIVING exec -n $CAMUNDA_NAMESPACE_SURVIVING -it $ELASTIC_POD -c elasticsearch -- curl -XPUT 'http://localhost:9200/_snapshot/camunda_backup/failback?wait_for_completion=true'
```
4. Verify the backup has been completed successfully by checking all backups and ensuring the `state` is `SUCCESS`:
```bash
-kubectl --context $CLUSTER_SURVIVING exec -n $CAMUNDA_NAMESPACE_SURVIVING -it $ELASTIC_POD -c elasticsearch -- curl -XGET "http://localhost:9200/_snapshot/camunda_backup/_all"
+kubectl --context $CLUSTER_SURVIVING exec -n $CAMUNDA_NAMESPACE_SURVIVING -it $ELASTIC_POD -c elasticsearch -- curl -XGET 'http://localhost:9200/_snapshot/camunda_backup/_all'
```
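+
+To narrow the output down to the fields this check cares about, you can pipe the response through [jq](https://jqlang.github.io/jq/) on your local machine; the same filter can be applied to the later snapshot status checks. This is an optional convenience, assuming `jq` is installed locally:
+
+```bash
+# Print only the snapshot names and their states from the Elasticsearch snapshot listing.
+kubectl --context $CLUSTER_SURVIVING exec -n $CAMUNDA_NAMESPACE_SURVIVING $ELASTIC_POD -c elasticsearch -- \
+  curl -s -XGET 'http://localhost:9200/_snapshot/camunda_backup/_all' | jq '.snapshots[] | {snapshot, state}'
+```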
@@ -759,7 +725,7 @@ kubectl --context $CLUSTER_SURVIVING exec -n $CAMUNDA_NAMESPACE_SURVIVING -it $E
```bash
ELASTIC_POD=$(kubectl --context $CLUSTER_RECREATED get pod --selector=app\.kubernetes\.io/name=elasticsearch -o jsonpath='{.items[0].metadata.name}' -n $CAMUNDA_NAMESPACE_RECREATED)
-kubectl --context $CLUSTER_RECREATED exec -n $CAMUNDA_NAMESPACE_RECREATED -it $ELASTIC_POD -c elasticsearch -- curl -XPUT "http://localhost:9200/_snapshot/camunda_backup" -H "Content-Type: application/json" -d'
+kubectl --context $CLUSTER_RECREATED exec -n $CAMUNDA_NAMESPACE_RECREATED -it $ELASTIC_POD -c elasticsearch -- curl -XPUT 'http://localhost:9200/_snapshot/camunda_backup' -H 'Content-Type: application/json' -d'
{
"type": "s3",
"settings": {
@@ -774,7 +740,7 @@ kubectl --context $CLUSTER_RECREATED exec -n $CAMUNDA_NAMESPACE_RECREATED -it $E
6. Verify that the backup can be found in the shared S3 bucket:
```bash
-kubectl --context $CLUSTER_RECREATED exec -n $CAMUNDA_NAMESPACE_RECREATED -it $ELASTIC_POD -c elasticsearch -- curl -XGET "http://localhost:9200/_snapshot/camunda_backup/_all"
+kubectl --context $CLUSTER_RECREATED exec -n $CAMUNDA_NAMESPACE_RECREATED -it $ELASTIC_POD -c elasticsearch -- curl -XGET 'http://localhost:9200/_snapshot/camunda_backup/_all'
```
The example output above should be the same since it's the same backup.
@@ -782,13 +748,13 @@ The example output above should be the same since it's the same backup.
7. Restore Elasticsearch backup in the new region namespace `CAMUNDA_NAMESPACE_RECREATED`. Depending on the amount of data, this operation will take a while to complete.
```bash
-kubectl --context $CLUSTER_RECREATED exec -n $CAMUNDA_NAMESPACE_RECREATED -it $ELASTIC_POD -c elasticsearch -- curl -XPOST "http://localhost:9200/_snapshot/camunda_backup/failback/_restore?wait_for_completion=true"
+kubectl --context $CLUSTER_RECREATED exec -n $CAMUNDA_NAMESPACE_RECREATED -it $ELASTIC_POD -c elasticsearch -- curl -XPOST 'http://localhost:9200/_snapshot/camunda_backup/failback/_restore?wait_for_completion=true'
```
8. Verify that the restore has been completed successfully in the new region:
```bash
-kubectl --context $CLUSTER_RECREATED exec -n $CAMUNDA_NAMESPACE_RECREATED -it $ELASTIC_POD -c elasticsearch -- curl -XGET "http://localhost:9200/_snapshot/camunda_backup/failback/_status"
+kubectl --context $CLUSTER_RECREATED exec -n $CAMUNDA_NAMESPACE_RECREATED -it $ELASTIC_POD -c elasticsearch -- curl -XGET 'http://localhost:9200/_snapshot/camunda_backup/failback/_status'
```
@@ -842,11 +808,11 @@ The important part being the `state: "SUCCESS"` and that `done` and `total` are
-#### Configure Zeebe exporters to use Elasticsearch in the recreated region
+#### Start Operate and Tasklist again
}
-desired={}
+current={}
+desired={}
/>
@@ -859,54 +825,13 @@ The Camunda components remain unreachable by end-users as you proceed to restore
#### Desired state
-You are repointing all Zeebe brokers from the temporary Elasticsearch instance to the Elasticsearch in the recreated region.
-
-The Elasticsearch exporters will remain paused during this step.
+You can enable Operate and Tasklist again in both the surviving and the recreated region. This allows users to interact with Camunda 8 again.
#### How to get there
-Your `camunda-values-failover.yml` and base `camunda-values.yml` require adjustments again to reconfigure all installations to the Elasticsearch instance in the new region:
-
-- `ZEEBE_BROKER_EXPORTERS_ELASTICSEARCHREGION0_ARGS_URL`
-- `ZEEBE_BROKER_EXPORTERS_ELASTICSEARCHREGION1_ARGS_URL`
+The base Helm values file `camunda-values.yml` in `aws/dual-region/kubernetes` already contains the adjustments for Elasticsearch and the Zeebe initial contact points, meaning you only need to upgrade the Helm release to enable and deploy Operate and Tasklist.
-1. The bash script [generate_zeebe_helm_values.sh](https://github.com/camunda/c8-multi-region/blob/main/aws/dual-region/scripts/generate_zeebe_helm_values.sh) in the repository folder `aws/dual-region/scripts/` helps generate those values again. You only have to copy and replace them within the previously mentioned Helm values files. It will use the exported environment variables of the environment prerequisites for namespaces and regions.
-
-```bash
-./generate_zeebe_helm_values.sh failback
-
-# It will ask you to provide the following values
-# Enter Zeebe cluster size (total number of Zeebe brokers in both Kubernetes clusters):
-## for a dual-region setup we recommend eight, resulting in four brokers per region.
-```
-
-
- Example output
-
-
-```bash
-Please use the following to change the existing environment variable ZEEBE_BROKER_CLUSTER_INITIALCONTACTPOINTS in the failover Camunda Helm chart values file 'region0/camunda-values-failover.yml' and in the base Camunda Helm chart values file 'camunda-values.yml'. It's part of the 'zeebe.env' path.
-
-- name: ZEEBE_BROKER_CLUSTER_INITIALCONTACTPOINTS
- value: camunda-zeebe-0.camunda-zeebe.camunda-london.svc.cluster.local:26502,camunda-zeebe-0.camunda-zeebe.camunda-paris.svc.cluster.local:26502,camunda-zeebe-1.camunda-zeebe.camunda-london.svc.cluster.local:26502,camunda-zeebe-1.camunda-zeebe.camunda-paris.svc.cluster.local:26502,camunda-zeebe-2.camunda-zeebe.camunda-london.svc.cluster.local:26502,camunda-zeebe-2.camunda-zeebe.camunda-paris.svc.cluster.local:26502,camunda-zeebe-3.camunda-zeebe.camunda-london.svc.cluster.local:26502,camunda-zeebe-3.camunda-zeebe.camunda-paris.svc.cluster.local:26502
-
-Please use the following to change the existing environment variable ZEEBE_BROKER_EXPORTERS_ELASTICSEARCHREGION0_ARGS_URL in the failover Camunda Helm chart values file 'region0/camunda-values-failover.yml' and in the base Camunda Helm chart values file 'camunda-values.yml'. It's part of the 'zeebe.env' path.
-
-- name: ZEEBE_BROKER_EXPORTERS_ELASTICSEARCHREGION0_ARGS_URL
- value: http://camunda-elasticsearch-master-hl.camunda-london.svc.cluster.local:9200
-
-Please use the following to change the existing environment variable ZEEBE_BROKER_EXPORTERS_ELASTICSEARCHREGION1_ARGS_URL in the failover Camunda Helm chart values file 'region0/camunda-values-failover.yml' and in the base Camunda Helm chart values file 'camunda-values.yml'. It's part of the 'zeebe.env' path.
-
-- name: ZEEBE_BROKER_EXPORTERS_ELASTICSEARCHREGION1_ARGS_URL
- value: http://camunda-elasticsearch-master-hl.camunda-paris.svc.cluster.local:9200
-```
-
-
-
-
-2. As the script suggests, replace the environment variables within `camunda-values-failover.yml`.
-3. Repeat the adjustments for the base Helm values file `camunda-values.yml` in `aws/dual-region/kubernetes` with the same output for the mentioned environment variables.
-4. Upgrade the normal Camunda environment in `CAMUNDA_NAMESPACE_SURVIVING` and `REGION_SURVIVING` to point to the new Elasticsearch:
+1. Upgrade the normal Camunda environment in `CAMUNDA_NAMESPACE_SURVIVING` and `REGION_SURVIVING` to deploy Operate and Tasklist:
```bash
helm upgrade $HELM_RELEASE_NAME camunda/camunda-platform \
@@ -914,60 +839,90 @@ helm upgrade $HELM_RELEASE_NAME camunda/camunda-platform \
--kube-context $CLUSTER_SURVIVING \
--namespace $CAMUNDA_NAMESPACE_SURVIVING \
-f camunda-values.yml \
- -f $REGION_SURVIVING/camunda-values.yml \
- --set operate.enabled=false \
- --set tasklist.enabled=false
+ -f $REGION_SURVIVING/camunda-values.yml
```
-5. Upgrade the failover Camunda environment in `CAMUNDA_NAMESPACE_FAILOVER` and `REGION_SURVIVING` to point to the new Elasticsearch:
+2. Upgrade the new region environment in `CAMUNDA_NAMESPACE_RECREATED` and `REGION_RECREATED` to deploy Operate and Tasklist:
```bash
helm upgrade $HELM_RELEASE_NAME camunda/camunda-platform \
--version $HELM_CHART_VERSION \
- --kube-context $CLUSTER_SURVIVING \
- --namespace $CAMUNDA_NAMESPACE_FAILOVER \
+ --kube-context $CLUSTER_RECREATED \
+ --namespace $CAMUNDA_NAMESPACE_RECREATED \
-f camunda-values.yml \
- -f $REGION_SURVIVING/camunda-values-failover.yml
+ -f $REGION_RECREATED/camunda-values.yml
```
-6. Upgrade the new region environment in `CAMUNDA_NAMESPACE_RECREATED` and `REGION_RECREATED` to point to the new Elasticsearch:
+#### Verification
+
+For Operate and Tasklist, you can confirm that the deployments have rolled out successfully by listing them and checking that they show `1/1` ready. The same command can be applied to `CLUSTER_RECREATED` and `CAMUNDA_NAMESPACE_RECREATED`:
```bash
-helm upgrade $HELM_RELEASE_NAME camunda/camunda-platform \
- --version $HELM_CHART_VERSION \
- --kube-context $CLUSTER_RECREATED \
- --namespace $CAMUNDA_NAMESPACE_RECREATED \
- -f camunda-values.yml \
- -f $REGION_RECREATED/camunda-values.yml \
- --set global.multiregion.installationType=failBack \
- --set operate.enabled=false \
- --set tasklist.enabled=false
+kubectl --context $CLUSTER_SURVIVING get deployments -n $CAMUNDA_NAMESPACE_SURVIVING
+# NAME READY UP-TO-DATE AVAILABLE AGE
+# camunda-operate 1/1 1 1 3h24m
+# camunda-tasklist 1/1 1 1 3h24m
+# camunda-zeebe-gateway 1/1 1 1 3h24m
```
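+
+Alternatively, you can wait for the rollouts to finish instead of re-checking the deployment list. This sketch assumes the deployment names follow the `$HELM_RELEASE_NAME-operate` and `$HELM_RELEASE_NAME-tasklist` pattern shown above:
+
+```bash
+# Block until the Operate and Tasklist deployments have finished rolling out.
+kubectl --context $CLUSTER_SURVIVING rollout status deployment/$HELM_RELEASE_NAME-operate -n $CAMUNDA_NAMESPACE_SURVIVING
+kubectl --context $CLUSTER_SURVIVING rollout status deployment/$HELM_RELEASE_NAME-tasklist -n $CAMUNDA_NAMESPACE_SURVIVING
+```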
-7. Delete all the Zeebe broker pods in the recreated region, as those are blocking a successful rollout of the config change due to the failback mode. The resulting recreated Zeebe brokers pods are expected to be again half of them being functional and half of them running in the sleeping mode due to the failback mode.
+
+
+
+
+#### Initialize new Zeebe exporter to recreated region
+
+}
+desired={}
+/>
+
+
+
+#### Current state
+
+Camunda 8 is reachable by end-users but not yet exporting any data.
+
+#### Desired state
+
+You are initializing a new exporter to the recreated region. This will ensure that both Elasticsearch instances are populated, resulting in data redundancy.
+
+Separating this step from resuming the exporters is essential as the initialization is an asynchronous procedure, and you must ensure it's finished before resuming the exporters.
+
+#### How to get there
+
+1. Initialize the new exporter for the recreated region by sending an API request via the Zeebe Gateway:
```bash
-kubectl --context $CLUSTER_RECREATED --namespace $CAMUNDA_NAMESPACE_RECREATED delete pods --selector=app\.kubernetes\.io/component=zeebe-broker
+kubectl --context $CLUSTER_SURVIVING port-forward services/$HELM_RELEASE_NAME-zeebe-gateway 9600:9600 -n $CAMUNDA_NAMESPACE_SURVIVING
+curl -XPOST 'http://localhost:9600/actuator/exporters/elasticsearchregion1/enable' -H 'Content-Type: application/json' -d '{"initializeFrom" : "elasticsearchregion0"}'
```
#### Verification
-The following command will show the deployed pods of the namespaces. You should see that the Zeebe brokers are restarting. Adjusting the command for the other cluster and namespaces should reveal the same.
+Port-forwarding the Zeebe Gateway via `kubectl` for the REST API and listing all exporters will reveal their current status.
```bash
-kubectl --context $CLUSTER_SURVIVING get pods -n $CAMUNDA_NAMESPACE_SURVIVING
+kubectl --context $CLUSTER_SURVIVING port-forward services/$HELM_RELEASE_NAME-zeebe-gateway 9600:9600 -n $CAMUNDA_NAMESPACE_SURVIVING
+curl -XGET 'http://localhost:9600/actuator/exporters'
```
-Furthermore, the following command will watch the [StatefulSets](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/) update of the Zeebe brokers and wait until it's done. Adjusting the command for the other cluster and namespaces should have the same effect.
+
+ Example output
+
```bash
-kubectl --context $CLUSTER_SURVIVING rollout status --watch statefulset/$HELM_RELEASE_NAME-zeebe -n $CAMUNDA_NAMESPACE_SURVIVING
+[{"exporterId":"elasticsearchregion0","status":"ENABLED"},{"exporterId":"elasticsearchregion1","status":"ENABLED"}]
```
-Alternatively, you can check that the Elasticsearch value was updated in the [StatefulSets](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/) configuration of the Zeebe brokers and are reflecting the previous output of the script `generate_zeebe_helm_values.sh` in **Step 1**.
+
+
+
+Via the already port-forwarded Zeebe Gateway, you can also check the status of the change by using the Cluster API.
+
+**Ensure it says "COMPLETED" before proceeding with the next step.**
```bash
-kubectl --context $CLUSTER_SURVIVING get statefulsets $HELM_RELEASE_NAME-zeebe -oyaml -n $CAMUNDA_NAMESPACE_SURVIVING | grep -A1 'ZEEBE_BROKER_EXPORTERS_ELASTICSEARCHREGION[0-1]_ARGS_URL'
+curl -XGET 'http://localhost:9600/actuator/cluster' | jq .lastChange
```
@@ -975,21 +930,22 @@ kubectl --context $CLUSTER_SURVIVING get statefulsets $HELM_RELEASE_NAME-zeebe -
```bash
- - name: ZEEBE_BROKER_EXPORTERS_ELASTICSEARCHREGION0_ARGS_URL
- value: http://camunda-elasticsearch-master-hl.camunda-london.svc.cluster.local:9200
---
- - name: ZEEBE_BROKER_EXPORTERS_ELASTICSEARCHREGION1_ARGS_URL
- value: http://camunda-elasticsearch-master-hl.camunda-london-failover.svc.cluster.local:9200
+{
+ "id": 6,
+ "status": "COMPLETED",
+ "startedAt": "2024-08-23T12:54:07.968549269Z",
+ "completedAt": "2024-08-23T12:54:09.282558853Z"
+}
```
-
-
+
+
-#### Reactivate Zeebe exporters, Operate, and Tasklist
+#### Reactivate Zeebe exporter
}
@@ -1000,38 +956,17 @@ desired={}
#### Current state
-Camunda 8 is pointing at the Elasticsearch instances in both regions again and not the temporary instance. It still remains unreachable to the end-users and no processes are advanced.
+Camunda 8 is reachable by end-users but not yet exporting any data.
+
+Elasticsearch exporters are enabled for both regions, and you have confirmed that the enable operation has finished.
#### Desired state
-You are reactivating the exporters and enabling Operate and Tasklist again within the two regions. This will allow users to interact with Camunda 8 again.
+You are reactivating the existing exporters. This will allow Zeebe to export data to Elasticsearch again.
#### How to get there
-1. Upgrade the normal Camunda environment in `CAMUNDA_NAMESPACE_SURVIVING` and `REGION_SURVIVING` to deploy Operate and Tasklist:
-
-```bash
-helm upgrade $HELM_RELEASE_NAME camunda/camunda-platform \
- --version $HELM_CHART_VERSION \
- --kube-context $CLUSTER_SURVIVING \
- --namespace $CAMUNDA_NAMESPACE_SURVIVING \
- -f camunda-values.yml \
- -f $REGION_SURVIVING/camunda-values.yml
-```
-
-2. Upgrade the new region environment in `CAMUNDA_NAMESPACE_RECREATED` and `REGION_RECREATED` to deploy Operate and Tasklist:
-
-```bash
-helm upgrade $HELM_RELEASE_NAME camunda/camunda-platform \
- --version $HELM_CHART_VERSION \
- --kube-context $CLUSTER_RECREATED \
- --namespace $CAMUNDA_NAMESPACE_RECREATED \
- -f camunda-values.yml \
- -f $REGION_RECREATED/camunda-values.yml \
- --set global.multiregion.installationType=failBack
-```
-
-3. Reactivate the exporters by sending the API activation request via the Zeebe Gateway:
+1. Reactivate the exporters by sending the [exporting API](./../../zeebe-deployment/operations/management-api.md#exporting-api) activation request via the Zeebe Gateway:
```bash
kubectl --context $CLUSTER_SURVIVING port-forward services/$HELM_RELEASE_NAME-zeebe-gateway 9600:9600 -n $CAMUNDA_NAMESPACE_SURVIVING
@@ -1042,23 +977,13 @@ curl -i localhost:9600/actuator/exporting/resume -XPOST
#### Verification
-For Operate and Tasklist, you can confirm that the deployments have successfully deployed by listing those and indicating `1/1` ready. The same command can be applied for the `CLUSTER_RECREATED` and `CAMUNDA_NAMESPACE_RECREATED`:
-
-```bash
-kubectl --context $CLUSTER_SURVIVING get deployments -n $CAMUNDA_NAMESPACE_SURVIVING
-# NAME READY UP-TO-DATE AVAILABLE AGE
-# camunda-operate 1/1 1 1 3h24m
-# camunda-tasklist 1/1 1 1 3h24m
-# camunda-zeebe-gateway 1/1 1 1 3h24m
-```
-
-For the Zeebe Elasticsearch exporters, there's currently no API available to confirm this. Only the response code of `204` indicates a successful resumption.
+For reactivating the exporters, there's currently no API available to confirm this. Only the response code of `204` indicates a successful resumption. This is a synchronous operation.
-
+
-#### Remove temporary failover installation
+#### Add new brokers to the Zeebe cluster
}
@@ -1069,94 +994,122 @@ desired={}
#### Current state
-Camunda 8 is healthy and running in two regions again. You have redeployed Operate and Tasklist and enabled the Elasticsearch exporters again. This will allow users to interact with Camunda 8 again.
+Camunda 8 is running in two regions but not yet utilizing all Zeebe brokers. You have redeployed Operate and Tasklist and enabled the Elasticsearch exporters again. This will allow users to interact with Camunda 8 again.
#### Desired state
-You can remove the temporary failover solution since it is not required anymore and would hinder disablement of the failback mode within the new region.
+You have a functioning Camunda 8 setup spanning two regions, with Zeebe utilizing all brokers in both. This fully restores the benefits of the dual-region setup.
#### How to get there
-1. Uninstall the failover installation via Helm:
+1. From the base Helm values file `camunda-values.yml` in `aws/dual-region/kubernetes`, look up the `clusterSize` and `replicationFactor`, as you need both to re-add the brokers to the Zeebe cluster (a quick lookup sketch follows the request below).
+2. Port-forward the Zeebe Gateway via `kubectl` for the REST API and send a Cluster API call to add the new brokers to the Zeebe cluster, using the size and replication information from the previous step.
+   For example, in our case the `clusterSize` is 8 and the `replicationFactor` is 4, meaning all broker IDs from 0 to 7 have to be listed and the correct `replicationFactor` set in the query.
```bash
-helm uninstall $HELM_RELEASE_NAME --kube-context $CLUSTER_SURVIVING --namespace $CAMUNDA_NAMESPACE_FAILOVER
+kubectl --context $CLUSTER_SURVIVING port-forward services/$HELM_RELEASE_NAME-zeebe-gateway 9600:9600 -n $CAMUNDA_NAMESPACE_SURVIVING
+curl -XPOST 'http://localhost:9600/actuator/cluster/brokers?replicationFactor=4' -H 'Content-Type: application/json' -d '["0", "1", "2", "3", "4", "5", "6", "7"]'
```
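+
+As a quick way to look up the two values mentioned in step 1, you can grep the base values file, assuming `clusterSize` and `replicationFactor` are set explicitly in `camunda-values.yml`:
+
+```bash
+# Show the configured cluster size and replication factor from the base Helm values file.
+grep -E 'clusterSize|replicationFactor' camunda-values.yml
+```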
-2. Delete the leftover persistent volume claims of the Camunda 8 components:
-
-```bash
-kubectl --context $CLUSTER_SURVIVING delete pvc --all -n $CAMUNDA_NAMESPACE_FAILOVER
-```
+:::note
+This step can take a while, depending on the size of the cluster, the amount of data, and the current load.
+:::
#### Verification
-The following will show the pods within the namespace. You deleted the Helm installation in the failover namespace, which should result in no pods or in deletion state.
+Port-forwarding the Zeebe Gateway via `kubectl` for the REST API and checking the Cluster API endpoint will show the status of the last change.
```bash
-kubectl --context $CLUSTER_SURVIVING get pods -n $CAMUNDA_NAMESPACE_FAILOVER
+kubectl --context $CLUSTER_SURVIVING port-forward services/$HELM_RELEASE_NAME-zeebe-gateway 9600:9600 -n $CAMUNDA_NAMESPACE_SURVIVING
+curl -XGET 'http://localhost:9600/actuator/cluster' | jq .lastChange
```
-Port-forwarding the Zeebe Gateway via `kubectl` and printing the topology should reveal that the failover brokers are missing:
+
+ Example output
+
```bash
-kubectl --context $CLUSTER_SURVIVING port-forward services/$HELM_RELEASE_NAME-zeebe-gateway 26500:26500 -n $CAMUNDA_NAMESPACE_SURVIVING
-zbctl status --insecure --address localhost:26500
+{
+ "id": 6,
+ "status": "COMPLETED",
+ "startedAt": "2024-08-23T12:54:07.968549269Z",
+ "completedAt": "2024-08-23T12:54:09.282558853Z"
+}
```
-
-
-
-
-#### Switch Zeebe brokers in the recreated region to normal mode
-
-}
-desired={}
-/>
-
-
-
-#### Current state
-
-You have almost fully restored the dual-region setup. Two Camunda deployments exist in two different regions.
-
-The failback mode is still enabled in the restored region.
-
-#### Desired state
-
-You restore the new region to its normal functionality by removing the failback mode and forcefully removing the sleeping Zeebe pods. They would otherwise hinder the rollout since they will never be ready.
-
-With this done, Zeebe is fully functional again and you are prepared in case of another region loss.
-
-#### How to get there
+
+
-1. Upgrade the new region environment in `CAMUNDA_NAMESPACE_RECREATED` and `REGION_RECREATED` by removing the failback mode:
+Port-forwarding the Zeebe Gateway via `kubectl` and printing the topology should reveal that all brokers have joined the Zeebe cluster again.
-```bash
-helm upgrade $HELM_RELEASE_NAME camunda/camunda-platform \
- --version $HELM_CHART_VERSION \
- --kube-context $CLUSTER_RECREATED \
- --namespace $CAMUNDA_NAMESPACE_RECREATED \
- -f camunda-values.yml \
- -f $REGION_RECREATED/camunda-values.yml
```
-
-2. Delete the sleeping pods in the new region, as those are blocking a successful rollout due to the failback mode:
-
-```bash
-kubectl --context $CLUSTER_RECREATED --namespace $CAMUNDA_NAMESPACE_RECREATED delete pods --selector=app\.kubernetes\.io/component=zeebe-broker
+kubectl --context $CLUSTER_SURVIVING port-forward services/$HELM_RELEASE_NAME-zeebe-gateway 26500:26500 -n $CAMUNDA_NAMESPACE_SURVIVING
+zbctl status --insecure --address localhost:26500
```
-#### Verification
-
-Port-forwarding the Zeebe Gateway via `kubectl` and printing the topology should reveal that all brokers have joined the Zeebe cluster again.
+
+ Example Output
+
```bash
-kubectl --context $CLUSTER_SURVIVING port-forward services/$HELM_RELEASE_NAME-zeebe-gateway 26500:26500 -n $CAMUNDA_NAMESPACE_SURVIVING
-zbctl status --insecure --address localhost:26500
+Cluster size: 8
+Partitions count: 8
+Replication factor: 4
+Gateway version: 8.6.0
+Brokers:
+  Broker 0 - camunda-zeebe-0.camunda-zeebe.camunda-london.svc:26501
+    Version: 8.6.0
+    Partition 1 : Leader, Healthy
+    Partition 6 : Follower, Healthy
+    Partition 7 : Follower, Healthy
+    Partition 8 : Leader, Healthy
+  Broker 1 - camunda-zeebe-0.camunda-zeebe.camunda-paris.svc:26501
+    Version: 8.6.0
+    Partition 1 : Follower, Healthy
+    Partition 2 : Follower, Healthy
+    Partition 7 : Follower, Healthy
+    Partition 8 : Follower, Healthy
+  Broker 2 - camunda-zeebe-1.camunda-zeebe.camunda-london.svc:26501
+    Version: 8.6.0
+    Partition 1 : Follower, Healthy
+    Partition 2 : Follower, Healthy
+    Partition 3 : Follower, Healthy
+    Partition 8 : Follower, Healthy
+  Broker 3 - camunda-zeebe-1.camunda-zeebe.camunda-paris.svc:26501
+    Version: 8.6.0
+    Partition 1 : Follower, Healthy
+    Partition 2 : Follower, Healthy
+    Partition 3 : Follower, Healthy
+    Partition 4 : Follower, Healthy
+  Broker 4 - camunda-zeebe-2.camunda-zeebe.camunda-london.svc:26501
+    Version: 8.6.0
+    Partition 2 : Leader, Healthy
+    Partition 3 : Leader, Healthy
+    Partition 4 : Leader, Healthy
+    Partition 5 : Follower, Healthy
+  Broker 5 - camunda-zeebe-2.camunda-zeebe.camunda-paris.svc:26501
+    Version: 8.6.0
+    Partition 3 : Follower, Healthy
+    Partition 4 : Follower, Healthy
+    Partition 5 : Follower, Healthy
+    Partition 6 : Follower, Healthy
+  Broker 6 - camunda-zeebe-3.camunda-zeebe.camunda-london.svc:26501
+    Version: 8.6.0
+    Partition 4 : Follower, Healthy
+    Partition 5 : Leader, Healthy
+    Partition 6 : Leader, Healthy
+    Partition 7 : Leader, Healthy
+  Broker 7 - camunda-zeebe-3.camunda-zeebe.camunda-paris.svc:26501
+    Version: 8.6.0
+    Partition 5 : Follower, Healthy
+    Partition 6 : Follower, Healthy
+    Partition 7 : Follower, Healthy
+    Partition 8 : Follower, Healthy
```
+
+
+