Skip to content

Commit

Permalink
add troubleshooting guide for synciq error
Browse files Browse the repository at this point in the history
  • Loading branch information
chimanjain committed Nov 26, 2024
1 parent cb5c799 commit 7120811
Showing 1 changed file with 7 additions and 6 deletions.
13 changes: 7 additions & 6 deletions content/docs/replication/troubleshooting.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,15 +7,16 @@ description: >
---

| Symptoms | Prevention, Resolution or Workaround |
| --- | --- |
| --- | --- |
| Persistent volumes don't get created on the target cluster. | Run `kubectl describe` on one of the pods of replication controller and see if event says `Config update won't be applied because of invalid configmap/secrets. Please fix the invalid configuration`. If it does, then ensure you correctly populated replication ConfigMap. You can check the current status by running `kubectl describe cm -n dell-replication-controller dell-replication-controller-config`. If ConfigMap is empty, please edit it yourself or use `repctl cluster inject` command. |
| Persistent volumes don't get created on the target cluster. You don't see any events on the replication-controller pod. | Check logs of replication controller by running `kubectl logs -n dell-replication-controller dell-replication-controller-manager-<generated-symbols>`. If you see `clusterId - <clusterID> not found` errors then be sure to check if you specified the same clusterIDs in both your ConfigMap and replication enabled StorageClass. |
| You apply replication action by manually editing ReplicationGroup resource field `spec.action` and don't see any change of ReplicationGroup state after a while. | Check events of the replication-controller pod, if it says `Cannot proceed with action <your-action>. [unsupported action]` then check spelling of your action and consult the [Replication Actions](../replication-actions) page. Alternatively, you can use `repctl` instead of manually editing ReplicationGroup resources. |
| You execute failover action using `repctl failover` command and see `failover: error executing failover to source site`. | This means you tried to failover to a cluster that is already marked source. If you still want to execute failover for RG, just choose another cluster. |
| You've created PersistentVolumeClaim using replication enabled StorageClass but don't see any RGs created in the source cluster. | Check annotations of created PersistentVolumeClaim. If it doesn't have `annotations` that start with `replication.storage.dell.com` then please wait for a couple of minutes for them to be added and RG to be created. |
| Persistent volumes don't get created on the target cluster. You don't see any events on the replication-controller pod. | Check logs of replication controller by running `kubectl logs -n dell-replication-controller dell-replication-controller-manager-<generated-symbols>`. If you see `clusterId - <clusterID> not found` errors then be sure to check if you specified the same clusterIDs in both your ConfigMap and replication enabled StorageClass. |
| You apply replication action by manually editing ReplicationGroup resource field `spec.action` and don't see any change of ReplicationGroup state after a while. | Check events of the replication-controller pod, if it says `Cannot proceed with action <your-action>. [unsupported action]` then check spelling of your action and consult the [Replication Actions](../replication-actions) page. Alternatively, you can use `repctl` instead of manually editing ReplicationGroup resources. |
| You execute failover action using `repctl failover` command and see `failover: error executing failover to source site`. | This means you tried to failover to a cluster that is already marked source. If you still want to execute failover for RG, just choose another cluster. |
| You've created PersistentVolumeClaim using replication enabled StorageClass but don't see any RGs created in the source cluster. | Check annotations of created PersistentVolumeClaim. If it doesn't have `annotations` that start with `replication.storage.dell.com` then please wait for a couple of minutes for them to be added and RG to be created. |
| When installing common replication controller using helm you see an error that states `invalid ownership metadata` and `missing key "app.kubernetes.io/managed-by": must be set to "Helm"` | This means that you haven't fully deleted the previous release, you can fix it by either deleting entire manifest by using `kubectl delete -f deploy/controller.yaml` or manually deleting conflicting resources (ClusterRoles, ClusterRoleBinding, etc.) |
| PV and/or PVCs are not being created at the source/target cluster. If you check the controller's logs you can see `no such host` errors| Make sure cluster-1's API is pingable from cluster-2 and vice versa. If one of your clusters is OpenShift located in a private network and needs records in /etc/hosts, `exec` into controller pod and modify `/etc/hosts` manually. |
| After upgrading to Replication v1.4.0, if `kubectl get rg` returns an error `Unable to list "replication.storage.dell.com/v1alpha1, Resource=dellcsireplicationgroups"`| This means `kubectl` still doesn't recognize the new version of CRD `dellcsireplicationgroups.replication.storage.dell.com` after upgrade. Running the command `kubectl get DellCSIReplicationGroup.v1.replication.storage.dell.com/<rg-id> -o yaml` will resolve the issue. |
| To add or delete PV s in the existing SYNC Replication Group in PowerStore, you may encounter the error `The operation is restricted as sync replication session for resource <Replication Group Name> is not paused` | To resolve this, you need to pause the replication group, add the PV, and then resume the replication group (RG). The commands for the pause and resume operations are: `repctl --rg <rg-id> exec -a suspend` `repctl --rg <rg-id> exec -a resume` |
| To delete the last volume from the existing SYNC Replication Group in Powerstore, you may encounter the error 'failed to remove volume from volume group: The operation cannot be completed on metro or replicated volume group because volume group will become empty after last members are removed' | To resolve this, unassign the protection policy from the corresponding volume group on the PowerStore Manager UI. After that, you can successfully delete the last volume in that SYNC Replication Group.|
| To delete the last volume from the existing SYNC Replication Group in Powerstore, you may encounter the error 'failed to remove volume from volume group: The operation cannot be completed on metro or replicated volume group because volume group will become empty after last members are removed' | To resolve this, unassign the protection policy from the corresponding volume group on the PowerStore Manager UI. After that, you can successfully delete the last volume in that SYNC Replication Group.|
| When running CSI-PowerMax with Replication in a multi-cluster configuration, the driver on the target cluster fails and the following error is seen in logs: `error="CSI reverseproxy service host or port not found, CSI reverseproxy not installed properly"` | The reverseproxy service needs to be created manually on the target cluster. Follow [the instructions here](../../deployment/csmoperator/modules/replication#configuration-steps) to create it.|
| When getting the following error for CSi-Powerscale with Replication with encryption enabled: `SyncIQ policy failed to establish an encrypted connection`, the Replication groups and PVC's won't be created at target cluster. | The `encryption required` flag in the SyncIQ settings was set to "yes" by default in OneFS 9.0+. To rectify this error, please follow the following article: <https://www.dell.com/support/kbdoc/en-us/000215174/isilon-synciq-9-0-all-policies-fail-when-source-or-target-cluster-is-on-onefs-9-0-with-no-node-on-source-cluster-was-able-to-connect-to-target-cluster> |

0 comments on commit 7120811

Please sign in to comment.