From 7059dbafffddfd285f9b94e4e0a03ea5e89a94bb Mon Sep 17 00:00:00 2001
From: Chiman Jain
Date: Tue, 26 Nov 2024 11:46:39 +0530
Subject: [PATCH 1/2] add troubleshooting guide for synciq error

---
 content/docs/replication/troubleshooting.md | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/content/docs/replication/troubleshooting.md b/content/docs/replication/troubleshooting.md
index 325e9459c5..11d01ec249 100644
--- a/content/docs/replication/troubleshooting.md
+++ b/content/docs/replication/troubleshooting.md
@@ -7,15 +7,16 @@ description: >
 ---
 
 | Symptoms | Prevention, Resolution or Workaround |
-| --- | --- |
+| --- | --- |
 | Persistent volumes don't get created on the target cluster. | Run `kubectl describe` on one of the pods of replication controller and see if event says `Config update won't be applied because of invalid configmap/secrets. Please fix the invalid configuration`. If it does, then ensure you correctly populated replication ConfigMap. You can check the current status by running `kubectl describe cm -n dell-replication-controller dell-replication-controller-config`. If ConfigMap is empty, please edit it yourself or use `repctl cluster inject` command. |
-| Persistent volumes don't get created on the target cluster. You don't see any events on the replication-controller pod. | Check logs of replication controller by running `kubectl logs -n dell-replication-controller dell-replication-controller-manager-<pod-suffix>`. If you see `clusterId - <clusterId> not found` errors then be sure to check if you specified the same clusterIDs in both your ConfigMap and replication enabled StorageClass. |
-| You apply replication action by manually editing ReplicationGroup resource field `spec.action` and don't see any change of ReplicationGroup state after a while. | Check events of the replication-controller pod, if it says `Cannot proceed with action <action>. [unsupported action]` then check spelling of your action and consult the [Replication Actions](../replication-actions) page. Alternatively, you can use `repctl` instead of manually editing ReplicationGroup resources. |
-| You execute failover action using `repctl failover` command and see `failover: error executing failover to source site`. | This means you tried to failover to a cluster that is already marked source. If you still want to execute failover for RG, just choose another cluster. |
-| You've created PersistentVolumeClaim using replication enabled StorageClass but don't see any RGs created in the source cluster. | Check annotations of created PersistentVolumeClaim. If it doesn't have `annotations` that start with `replication.storage.dell.com` then please wait for a couple of minutes for them to be added and RG to be created. |
+| Persistent volumes don't get created on the target cluster. You don't see any events on the replication-controller pod. | Check the logs of the replication controller by running `kubectl logs -n dell-replication-controller dell-replication-controller-manager-<pod-suffix>`. If you see `clusterId - <clusterId> not found` errors, check that you specified the same clusterIDs in both your ConfigMap and your replication-enabled StorageClass. |
+| You apply a replication action by manually editing the ReplicationGroup resource field `spec.action` and don't see any change of ReplicationGroup state after a while. | Check the events of the replication-controller pod; if it says `Cannot proceed with action <action>. [unsupported action]`, check the spelling of your action and consult the [Replication Actions](../replication-actions) page. Alternatively, you can use `repctl` instead of manually editing ReplicationGroup resources. |
+| You execute a failover action using the `repctl failover` command and see `failover: error executing failover to source site`. | This means you tried to fail over to a cluster that is already marked as the source. If you still want to execute a failover for the RG, choose another cluster. |
+| You've created a PersistentVolumeClaim using a replication-enabled StorageClass but don't see any RGs created in the source cluster. | Check the annotations of the created PersistentVolumeClaim. If it doesn't have annotations that start with `replication.storage.dell.com`, wait a couple of minutes for them to be added and for the RG to be created. |
 | When installing common replication controller using helm you see an error that states `invalid ownership metadata` and `missing key "app.kubernetes.io/managed-by": must be set to "Helm"` | This means that you haven't fully deleted the previous release, you can fix it by either deleting entire manifest by using `kubectl delete -f deploy/controller.yaml` or manually deleting conflicting resources (ClusterRoles, ClusterRoleBinding, etc.) |
 | PV and/or PVCs are not being created at the source/target cluster. If you check the controller's logs you can see `no such host` errors| Make sure cluster-1's API is pingable from cluster-2 and vice versa. If one of your clusters is OpenShift located in a private network and needs records in /etc/hosts, `exec` into controller pod and modify `/etc/hosts` manually. |
 | After upgrading to Replication v1.4.0, if `kubectl get rg` returns an error `Unable to list "replication.storage.dell.com/v1alpha1, Resource=dellcsireplicationgroups"`| This means `kubectl` still doesn't recognize the new version of CRD `dellcsireplicationgroups.replication.storage.dell.com` after upgrade. Running the command `kubectl get DellCSIReplicationGroup.v1.replication.storage.dell.com/<rg-name> -o yaml` will resolve the issue. |
 | To add or delete PVs in the existing SYNC Replication Group in PowerStore, you may encounter the error `The operation is restricted as sync replication session for resource is not paused` | To resolve this, you need to pause the replication group, add the PV, and then resume the replication group (RG). The commands for the pause and resume operations are: `repctl --rg <rg-id> exec -a suspend` `repctl --rg <rg-id> exec -a resume` |
-| To delete the last volume from the existing SYNC Replication Group in Powerstore, you may encounter the error 'failed to remove volume from volume group: The operation cannot be completed on metro or replicated volume group because volume group will become empty after last members are removed' | To resolve this, unassign the protection policy from the corresponding volume group on the PowerStore Manager UI. After that, you can successfully delete the last volume in that SYNC Replication Group.|
-| When running CSI-PowerMax with Replication in a multi-cluster configuration, the driver on the target cluster fails and the following error is seen in logs: `error="CSI reverseproxy service host or port not found, CSI reverseproxy not installed properly"` | The reverseproxy service needs to be created manually on the target cluster. Follow [the instructions here](../../deployment/csmoperator/modules/replication#configuration-steps) to create it.|
+| To delete the last volume from the existing SYNC Replication Group in PowerStore, you may encounter the error `failed to remove volume from volume group: The operation cannot be completed on metro or replicated volume group because volume group will become empty after last members are removed` | To resolve this, unassign the protection policy from the corresponding volume group in the PowerStore Manager UI. After that, you can successfully delete the last volume in that SYNC Replication Group. |
+| When running CSI-PowerMax with Replication in a multi-cluster configuration, the driver on the target cluster fails and the following error is seen in the logs: `error="CSI reverseproxy service host or port not found, CSI reverseproxy not installed properly"` | The reverseproxy service needs to be created manually on the target cluster. Follow [the instructions here](../../deployment/csmoperator/modules/replication#configuration-steps) to create it. |
+| When replication with encryption is enabled for CSI-PowerScale and you see the error `SyncIQ policy failed to establish an encrypted connection`, ReplicationGroups and PVCs won't be created on the target cluster. | The `encryption required` flag in the SyncIQ settings is set to `yes` by default in OneFS 9.0+. To rectify this error, please refer to the following article: |
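
The new PowerScale row above still ends at a dangling article reference. As an interim pointer, one way to clear the error is to inspect and, if acceptable, relax the SyncIQ encryption requirement from the OneFS shell on both clusters (or configure SyncIQ certificates so encryption can succeed). This is a sketch assuming the OneFS 9.x `isi sync settings` CLI, not the content of the missing article; verify the exact flag name against your OneFS version.

```bash
# Check whether SyncIQ currently requires encrypted connections
isi sync settings view

# Allow unencrypted SyncIQ connections (run on both source and target
# clusters; weigh the security impact, or set up SyncIQ certificates instead)
isi sync settings modify --encryption-required=false
```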
From 88b7a186aeb26c3512988f1e73bc5f0c95a25a26 Mon Sep 17 00:00:00 2001
From: shefali-malhotra
Date: Tue, 26 Nov 2024 02:31:19 -0500
Subject: [PATCH 2/2] bug fix for KRV-30617

Signed-off-by: shefali-malhotra
---
 content/docs/deployment/csmoperator/_index.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/content/docs/deployment/csmoperator/_index.md b/content/docs/deployment/csmoperator/_index.md
index b7c90a723a..c4542c84c6 100644
--- a/content/docs/deployment/csmoperator/_index.md
+++ b/content/docs/deployment/csmoperator/_index.md
@@ -350,6 +350,8 @@ The `Update approval` (**`InstallPlan`** in OLM terms) strategy plays a role whi
 
 >NOTE: The recommended version of OLM for Upstream Kubernetes is **`v0.25.0`**.
 
+>NOTE: The recommended `Update approval` strategy is **`Manual`** to prevent the installation of non-qualified versions of the operator.
+
 #### Using Installation Script
 
 1. Clone and checkout the required csm-operator version using
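
To apply the `Manual` update approval recommended in the note above, set `installPlanApproval` on the operator's Subscription. A minimal sketch, assuming the Subscription is named `dell-csm-operator` in a `dell-csm-operator` namespace (both names are assumptions; adjust for your cluster):

```bash
# Switch an existing operator Subscription to manual InstallPlan approval
kubectl patch subscription dell-csm-operator \
  -n dell-csm-operator \
  --type merge \
  -p '{"spec":{"installPlanApproval":"Manual"}}'

# With Manual approval, upgrades create InstallPlans that wait to be approved
kubectl get installplans -n dell-csm-operator
```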