Fixes issue when multiple VRGs conflict with the PVCs being protected #1535
base: main
Commits on Aug 30, 2024
Fixes https://issues.redhat.com/browse/OCSBZM-4691
Signed-off-by: Annaraya Narasagond <[email protected]>
Commits on Sep 5, 2024
Moving locks to util and some refactoring
Signed-off-by: Annaraya Narasagond <[email protected]>
Signed-off-by: Annaraya Narasagond <[email protected]>
Commits on Oct 8, 2024
Use PVC name and namespace for changing PVC conditions (RamenDR#1528)
Previously we used the VolumeReplication name and namespace for updating PVC conditions, because we had a one-to-one relationship between VolumeReplication and PVC. Now that we are going to use consistency groups this handling is no longer correct, because several PVCs can be related to one VolumeReplication. So we need to use the PVC name and namespace for updating PVC conditions.

Signed-off-by: Elena Gershkovich <[email protected]>
Signed-off-by: rakeshgm <[email protected]>
Using one item per line to make future changes easier. Signed-off-by: Nir Soffer <[email protected]>
Current python on Fedora 39 and macOS is 3.12, and the 3.13 development version has been available on github actions for a while. Add the versions so we can ensure compatibility with the current and next python version.

We keep python 3.9 for compatibility with systems using old python, like RHEL 9. At some point we will want to drop old versions and consume newer features in current python, but we don't have anything requiring this yet, so let's try to stay compatible.

Signed-off-by: Nir Soffer <[email protected]>
Upgrading capabilities in the CSVs
For ramen, the capabilities for the hub and cluster bundles currently state "Basic Install". Update it to "Seamless Upgrades".

Fixes: [Bug-2303823](https://bugzilla.redhat.com/show_bug.cgi?id=2303823)

Signed-off-by: Abhijeet Shakya <[email protected]>
Update open-cluster-management to use v0.13.0
Signed-off-by: Abhijeet Shakya <[email protected]>
Delete namespace manifestwork for applications
The changeset includes a new DeleteNamespaceManifestWork() func, which first checks if the mw.Spec has the delete option or if it already has a DeletionTimestamp, and accordingly proceeds to delete the namespace manifestwork. It also updates the namespace manifestwork with the deleteOption and propagationPolicy of type orphan whenever the createOrUpdateNamespaceManifest() func is called.

Fixes: [Bug 2059669](https://bugzilla.redhat.com/show_bug.cgi?id=2059669)

Signed-off-by: Abhijeet Shakya <[email protected]>
Update unit-tests to verify deletion of namespace manifestwork
Signed-off-by: Abhijeet Shakya <[email protected]>
tests: keep all suite cleanup functions in suite_test.go
Signed-off-by: Raghavendra Talur <[email protected]>
Signed-off-by: Elena Gershkovich <[email protected]>
Skip check for duplicate controller names
Newer controller-runtime version 0.19.0 adds a check for unique controller names. In our tests we register the same controller (the drcluster controller) several times - in suite_test, in drcluster_mmode_test, and in drcluster_drcconfig_tests. As a temporary solution I added a flag for skipping the unique controller name validation. Another solution can be adding a name as a parameter to the SetupWithManager function.

Signed-off-by: Elena Gershkovich <[email protected]>
Fix PlacementDecision Exclusion from Hub Backup Due to Missing Label
The issue where the "velero.io/exclude-from-backup" label was not applied originates from our deployment workflow. Initially, we deploy the workload, followed by enabling DR. During the first deployment, the Placement Operator creates the "PlacementDecision" with default labels. However, when "Ramen" tries to add the "velero.io/exclude-from-backup" label during DR setup, it skips because the "PlacementDecision" already exists. Consequently, "Velero" backs up the "PlacementDecision". And during hub recovery, it is restored without its status, leading to the unintended deletion of the workload. This situation only occurs when the current state wasn't updated before hub recovery was applied. The fix in this PR does not address the scenario where the workload is deployed, a hub backup is taken, DR is enabled, and then the hub is recovered before another backup is created. Fixes bug: 2308801 Signed-off-by: Benamar Mekhissi <[email protected]>
Ignore not found namespace in argocd test
Fixing this error seen in the CI:

    drenv.commands.Error: Command failed:
       command: ('addons/argocd/test', 'rdr-hub', 'rdr-dr1', 'rdr-dr2')
       exitcode: 1
       error:
    Traceback (most recent call last):
    ...
    drenv.commands.Error: Command failed:
       command: ('kubectl', 'delete', '--context', 'rdr-dr2', 'namespace', 'argocd-test', '--wait=false')
       exitcode: 1
       error:
    Error from server (NotFound): namespaces "argocd-test" not found

Signed-off-by: Nir Soffer <[email protected]>
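A minimal sketch of the fix, assuming the test simply tells kubectl to tolerate a missing namespace; the helper name is hypothetical:

    import subprocess

    def delete_namespace(context, name):
        # --ignore-not-found turns deleting an already-removed namespace
        # into a no-op instead of a NotFound error.
        subprocess.run(
            ["kubectl", "delete", "--context", context,
             "namespace", name, "--wait=false", "--ignore-not-found"],
            check=True,
        )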
'AllowVolumeExpansion' flag enables PV resizing
This commit allows PVs to be resized by enabling the 'AllowVolumeExpansion' flag in the StorageClass. When this flag is set to true, users can dynamically adjust the size of PVCs by specifying the desired capacity in the PVC specification. Signed-off-by: rakeshgm <[email protected]>
exclude PV and PVC in velero backups
Signed-off-by: rakeshgm <[email protected]>
Update setup-envtest to 0.19 version
We updated the controller SDK to the 0.19 version; move envtest to the same version for the required testing. This does not fix any test or such, it is just a hygiene change.

Signed-off-by: Shyamsundar Ranganathan <[email protected]>
Developers can enable it by copying the file from the hack directory instead of copying and pasting the code from docs/devel-quick-start.md and making the hook executable. Signed-off-by: Nir Soffer <[email protected]>
k8s.io/component-base should have been a direct dependency. Fixes the go mod error. Signed-off-by: Raghavendra Talur <[email protected]>
Update Ramen to use latest CRD from external-snapshotter
- Integrated the latest CRD from external-snapshotter.
- Updated dependencies and relevant configuration files.
- Changed VolumeSnapshotRefList in VolumeGroupSnapshotStatus to PVCVolumeSnapshotRefList, which allows mapping a VolumeSnapshot to the application PVC.

Signed-off-by: Benamar Mekhissi <[email protected]>
Ensure VGS name contains at most 63 characters
Signed-off-by: Benamar Mekhissi <[email protected]>
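The commit body does not show the implementation; the following is only an illustration of the common truncate-and-hash technique for Kubernetes' 63-character name limit, not ramen's actual (Go) code:

    import hashlib

    MAX_NAME = 63  # kubernetes object name / DNS label length limit

    def shorten(name):
        # Append a short digest so truncated names stay unique.
        if len(name) <= MAX_NAME:
            return name
        digest = hashlib.sha256(name.encode()).hexdigest()[:8]
        return name[:MAX_NAME - len(digest) - 1] + "-" + digest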
Ensure the Secondary VRG is updated when DRPC is updated
This is mainly to ensure that the VRG on the primary and secondary are in sync with regard to labels and annotations.

Signed-off-by: Benamar Mekhissi <[email protected]>
Prepare for volsync block device test (RamenDR#1559)
* Allow unprivileged pods to access block devices

  We need this for testing volsync with block devices on minikube clusters. For minikube clusters this is done via the environment file, which is nicer, but it requires configuring containerd after the cluster has started, which can cause failures in addon scripts. We need to upstream this change to minikube later.

* Log replication source status in yaml

  It is easier to read and works better for viewing the replication logs in the status.

* Improve volsync test teardown

  - Delete the replication source before unexporting the volsync service using it.
  - Log every teardown step to make debugging easier.

* Fail if deleting destination namespace gets stuck

  Replace the delete used for waiting on the deleted namespace with a wait with a timeout. If deletion gets stuck, the test will fail instead of blocking forever, breaking stress tests. When delete gets stuck we can inspect the resources in the test gather directory:

      % tree out/013.gather/dr2/namespaces/busybox
      out/013.gather/dr2/namespaces/busybox
      └── snapshot.storage.k8s.io
          └── volumesnapshots
              └── volsync-busybox-dst-dst-20240914203905.yaml

      % cat out/013.gather/dr2/namespaces/busybox/snapshot.storage.k8s.io/volumesnapshots/volsync-busybox-dst-dst-20240914203905.yaml
      apiVersion: snapshot.storage.k8s.io/v1
      kind: VolumeSnapshot
      metadata:
        creationTimestamp: "2024-09-14T20:39:05Z"
        deletionGracePeriodSeconds: 0
        deletionTimestamp: "2024-09-14T20:39:05Z"
        finalizers:
        - snapshot.storage.kubernetes.io/volumesnapshot-bound-protection
        generation: 2

  This looks like an external-snapshotter bug since volsync deleted the snapshot and removed its finalizers.

Signed-off-by: Nir Soffer <[email protected]>
To make it possible to debug platform and machine detection in GitHub and/or in a developer environment.

Signed-off-by: Nir Soffer <[email protected]>
All the public functions in the minikube module accept a profile, but this is actually a profile name. We want to pass a profile dict to start(). Use `name` for functions accepting a profile name.

Signed-off-by: Nir Soffer <[email protected]>
Move minikube helpers to minikube module
The minikube module is mostly a thin wrapper for the minikube command, and we have higher level helpers in __main__.py. Since we want to have multiple providers (e.g. lima, external), move all the helpers to the minikube module.

Signed-off-by: Nir Soffer <[email protected]>
Remove drenv setup from make-venv
This step is not part of creating the venv, and will be more complicated to do as part of creating the venv when adding providers. It must be run manually, as we do in the CI.

Signed-off-by: Nir Soffer <[email protected]>
The envfile can now have a "provider" property, defaulting to "$provider", which expands to the platform default provider. The first provider is minikube.

The setup and cleanup commands now require an env file, since they need to know which provider to set up.

Signed-off-by: Nir Soffer <[email protected]>
Make unused minikube function private
We want to minimize the provider interface to make it easier to create new providers. Signed-off-by: Nir Soffer <[email protected]>
Move suspend and resume to minikube provider
These commands are not very portable; they work only on Linux when using the minikube kvm2 driver.

Signed-off-by: Nir Soffer <[email protected]>
All functions used in drenv/__main__.py now pass the profile dict instead of the name. This is required for some functions like suspend and resume, since these operations are available only with certain profile drivers.

The load_files() function was renamed to configure(). The function also accepts the profile so the provider can configure the cluster based on the cluster configuration. This will be useful for configuring containerd later.

The setup_files() and cleanup_files() functions were renamed to setup() and cleanup(). They do not accept a profile since they are called once per provider.

Functions are grouped by type: provider scope, cluster scope, and private helpers.

Signed-off-by: Nir Soffer <[email protected]>
Move containerd configuration to minikube.configure()
This makes the start flow more generic, and allows every provider to do the right thing for the profile and cluster status.

Signed-off-by: Nir Soffer <[email protected]>
Move waiting for fresh status to minikube
This is a specific minikube workaround - when starting an existing cluster, kubernetes reports stale state for a while. The wait is not needed for an external cluster which we never restart. If this is needed for another provider we can extract a common helper later.

Signed-off-by: Nir Soffer <[email protected]>
Some commands (like drenv) log to stderr without writing anything to stdout. When we watch such commands we want to watch stderr instead of stdout. Since the command does not write anything to stdout, we can redirect the command's stderr to stdout.

When a command fails, we cannot report the error message since it was already yielded to the code watching the command. This is the issue with logging everything to stderr, but we don't control the commands we run.

This change adds an option to redirect stderr in commands.watch() and tests the behavior.

Signed-off-by: Nir Soffer <[email protected]>
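A minimal sketch of what such a redirect option could look like; drenv's real commands.watch() signature may differ:

    import subprocess

    def watch(*args, stderr=subprocess.DEVNULL):
        # Pass stderr=subprocess.STDOUT for commands that log only to
        # stderr (like drenv), so their output is yielded line by line.
        proc = subprocess.Popen(
            args, stdout=subprocess.PIPE, stderr=stderr, text=True)
        for line in proc.stdout:
            yield line.rstrip()
        if proc.wait() != 0:
            raise RuntimeError(f"Command {args} failed rc={proc.returncode}")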
Like limactl, drenv logs only to stderr, so when running it in tests with:

    commands.run("drenv", "start", ...)

we don't see anything in the test logs. If the test is blocking for a long time, we have no way to debug this. The helpful log lines are buffered in the command error buffer.

Use the new stderr= argument to watch and log the command output, and add helpers for running drenv in a consistent way.

Signed-off-by: Nir Soffer <[email protected]>
Use /readyz endpoint for ready check
We used `kubectl version` as a proxy for cluster readiness, checking for server info in the response. Replace this with a lower level check of the API server /readyz endpoint.

Signed-off-by: Nir Soffer <[email protected]>
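A sketch of the lower level check using kubectl directly; the actual drenv helper may be structured differently:

    import subprocess

    def cluster_ready(context):
        # The API server answers "ok" on /readyz once it is ready to serve.
        result = subprocess.run(
            ["kubectl", "get", "--raw", "/readyz", "--context", context],
            capture_output=True, text=True,
        )
        return result.returncode == 0 and result.stdout.strip() == "ok"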
Before this change we had only one provider, so this was an internal implementation detail. This change adds the second provider, allowing users to configure the provider in the environment file.

Replace the `external: true` option with `provider: external`. With this we can remove the special handling of external clusters with calls to the external provider, which does the right thing.

The external provider basically does nothing, since we do not manage this cluster. However in start() we ensure that the cluster exists and then wait until the cluster is ready. This helps to debug issues with external clusters and reduces log noise.

Signed-off-by: Nir Soffer <[email protected]>
Implement DRClusterConfig reconciler to create required ClusterClaims (RamenDR#1485)

* Add logger to DRClusterConfig reconciler

  Also, cleanup some scaffolding comments.

* Add initial reconcile for DRClusterConfig

  - Add finalizer to resource being reconciled
  - Remove on delete
  - Update reconciler to rate limit max exponential backoff to 5 minutes

* Add roles for various storage classes and cluster claims

* Add StorageClass listing and dummy functions for claim creation

  Building the scaffold for the overall functionality.

* Add ClusterClaims for detected StorageClasses

* Implement pruning of ClusterClaims

  For classes listed, those that no longer need a ClusterClaim are deleted. Added a StorageClass watcher as well, to reconcile on changes to StorageClasses.

* Implement ClassClaims for VRClass and VSClass

Signed-off-by: Shyamsundar Ranganathan <[email protected]>
Working with lima yaml revealed an issue with pyyaml: it does not preserve multiline strings when reading and writing yaml. This breaks blocks using the "|" or ">" yaml format.

Add a wrapper module configuring pyyaml to preserve multiline strings. The module provides the minimal interface we use. The module also enforces good defaults:

- Use only safe load functions
- Don't sort keys, to preserve the original object order

All existing users are using the new wrapper now.

Signed-off-by: Nir Soffer <[email protected]>
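A minimal sketch of such a wrapper, assuming pyyaml; the real drenv module may expose a different interface:

    import yaml

    def _str_presenter(dumper, data):
        # Emit multiline strings in literal block style ("|") so they
        # round-trip instead of being collapsed into quoted scalars.
        style = "|" if "\n" in data else None
        return dumper.represent_scalar("tag:yaml.org,2002:str", data, style=style)

    yaml.SafeDumper.add_representer(str, _str_presenter)

    def safe_load(stream):
        return yaml.safe_load(stream)

    def safe_dump(data, stream=None):
        # Don't sort keys, preserving the original object order.
        return yaml.safe_dump(data, stream, sort_keys=False)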
The module provides:

- merge(): merge one or more source kubeconfigs into a target kubeconfig.
- remove(): remove context, cluster, and user matching a name.

We will use this to import lima clusters' kubeconfigs after clusters are started, and remove the kubeconfigs when clusters are stopped or deleted, matching minikube behavior.

To use kubectl for merging configs, an env argument was added to kubectl.config(). This is required since the --kubeconfig parameter accepts only a single kubeconfig.

The module uses a lockfile compatible with `kubectl config` or other programs using the same client-go library. This ensures that concurrent use of `kubectl config` and multiple drenv threads will not conflict when modifying the default kubeconfig.

Signed-off-by: Nir Soffer <[email protected]>
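A sketch of how merge() can delegate to kubectl via the KUBECONFIG env variable (locking omitted; the real module also takes a lockfile):

    import os
    import subprocess

    def merge(target, *sources):
        # kubectl merges every file listed in KUBECONFIG when viewing the
        # config; --flatten inlines certificates so the result is
        # self-contained.
        env = dict(os.environ)
        env["KUBECONFIG"] = os.pathsep.join([target, *sources])
        merged = subprocess.run(
            ["kubectl", "config", "view", "--flatten"],
            env=env, check=True, capture_output=True, text=True,
        ).stdout
        with open(target, "w") as f:
            f.write(merged)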
Update kubectl-gather to 0.5.1
This release improves error handling, build, and testing.

Signed-off-by: Nir Soffer <[email protected]>
On macOS setup is much simpler since most of the tools are available via the brew package manager.

For creating vms we use the lima project. We need to build the upstream version to consume important fixes and enhancements; we should be able to use the next lima release around October.

For the shared network we use socket_vmnet. It must be installed as root, and the easiest way to do this is building from source.

Signed-off-by: Nir Soffer <[email protected]>
Add lima provider for Apple silicon
We use the "vz" VM type, using Apple virtualization framework, providing very good performance. A critical feature is Rosetta 2 integration, supporting running amd64 container images if an arm64 image is not available. This is required to run OCM since it does not provide arm64 images for all components. Lima provisioning is very easy to customize, based on cloud-init and yaml configuration, so unlike Minikube, we can easily configure everything we need when creating a cluster. However, provisioning is very slow since it does not support preloading content like Minikube. We can improve this later by creating our own template disk images with pre-installed and configured Kubernetes. For networking we use shared network via socket_vmnet. This is user based network providing host to VM and VM to VM access. Performance is poor (1 Gbps, 30 times slower than native networking) but it should be fast enough for our use case. Lima is the default provider for Apple silicon machines. On Linux minikube is faster and integrated better. The provider configuration is derived from lima-vm k8s.yaml example, using only the stable arm64 image and reformatted to fix yamllint warnings. Signed-off-by: Nir Soffer <[email protected]>
Add development environment for lima
This is a copy of regional-dr.yaml disabling components that do not work yet. When we fix all the issues, we can delete this environment and use the standard one.

- rook-ceph: requires kubernetes-csi/external-snapshotter
- submariner: need to label nodes
- volsync: needs submariner and cephfs
- argocd: argocd tool fails with PEM error

Signed-off-by: Nir Soffer <[email protected]>
Add a delay after starting a stopped cluster
Similar to minikube, starting a stopped cluster is more flaky. Even when k8s reports that everything is ready, some components are not ready, and running the start hooks can fail randomly. Example failure:

    Error from server (InternalError): Internal error occurred: failed calling webhook
    "managedclustermutators.admission.cluster.open-cluster-management.io": failed to call webhook:
    Post "https://cluster-manager-registration-webhook.open-cluster-management-hub.svc:9443/mutate-cluster-open-cluster-management-io-v1-managedcluster?timeout=10s":
    dial tcp 10.110.203.24:9443: connect: no route to host

Try to avoid this by adding a short delay after starting a stopped cluster, before we start to run the hooks. This change affects only developers that stop the environment and start it again.

In minikube we added the delay in configure(), but for lima it is better done in start(), since there we can tell if this is a start of a stopped cluster.

Signed-off-by: Nir Soffer <[email protected]>
limactl logs everything to stderr, and we watch stderr to consume the logs. Since we drop limactl logs when running at the normal log level, when limactl fails we don't have any info on the error, and the only way to debug a limactl error is to run in verbose mode.

With this change we extract the limactl log level and log errors as drenv errors, so the last log before the error provides some info on the error.

Tested by uninstalling socket_vmnet:

    sudo make uninstall.launchd

With this limactl fails to connect to the vmnet socket:

    % drenv start envs/vm.yaml
    2024-09-09 22:30:07,354 INFO   [vm] Starting environment
    2024-09-09 22:30:07,376 INFO   [cluster] Starting lima cluster
    2024-09-09 22:30:26,490 ERROR  [cluster] exiting, status={Running:false Degraded:false Exiting:true Errors:[] SSHLocalPort:0} (hint: see "/Users/nsoffer/.lima/cluster/ha.stderr.log")
    2024-09-09 22:30:26,492 ERROR  Command failed
    Traceback (most recent call last):
    ...
    drenv.commands.Error: Command failed:
       command: ('limactl', '--log-format=json', 'start', 'cluster')
       exitcode: 1
       error:

The lima error message is not very useful but this is what we have. This should be improved in lima. If we inspect the log file mentioned we can see the actual error:

    % tail -3 ~/.lima/cluster/ha.stderr.log
    {"level":"debug","msg":"Start tcp DNS listening on: 127.0.0.1:51618","time":"2024-09-09T22:30:26+03:00"}
    {"level":"info","msg":"new connection from  to ","time":"2024-09-09T22:30:26+03:00"}
    {"level":"fatal","msg":"dial unix /var/run/socket_vmnet: connect: connection refused","time":"2024-09-09T22:30:26+03:00"}

Signed-off-by: Nir Soffer <[email protected]>
Log invalid json logs in debug mode
Usually this is an early usage error, written before the logger is configured to json format, so we get a text log instead of a json message. Log this line as is at debug level to allow debugging the issue.

Example error when using an older lima version not supporting --log-format:

    2024-09-12 22:25:55,637 DEBUG [drenv-test-cluster] time="2024-09-12T22:25:55Z" level=fatal msg="unknown flag: --log-format"

Without this change, this error is dropped and we don't have a clue what went wrong.

Signed-off-by: Nir Soffer <[email protected]>
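A sketch of the fallback, assuming limactl output is consumed line by line; names are illustrative:

    import json
    import logging

    def handle_line(line, cluster):
        try:
            record = json.loads(line)
        except ValueError:
            # Early usage errors are written before limactl switches to
            # JSON logging; keep them visible instead of dropping them.
            logging.debug("[%s] %s", cluster, line.rstrip())
            return
        if record.get("level") in ("error", "fatal"):
            logging.error("[%s] %s", cluster, record.get("msg", ""))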
We access the cluster via the IP address on the shared network. Port forwarding cannot work for multiple clusters, since the same port from multiple clusters would be mapped to the same host port.

Signed-off-by: Nir Soffer <[email protected]>
Change API server to use the shared network
Without this the API server listens on the user network, which is not accessible from the host. Lima tries to mitigate this by changing the address to 127.0.0.1, but this does not work for multiple clusters. With this change we can access all clusters from the host.

Signed-off-by: Nir Soffer <[email protected]>
Configure kubelet to use the right IP address
Without this configuration the rook-ceph pods are listening on the user network (192.168.5.0/24) instead of the shared network (192.168.105.0/24), and rbd-mirror is broken. With this change we can run the rook environment. Thanks: Raghavendra Talur <[email protected]> Signed-off-by: Nir Soffer <[email protected]>
Configure kubelet to pull images in parallel
Previously configured as minikube --extra-config. Signed-off-by: Nir Soffer <[email protected]>
Configure kubelet feature gates
With minikube this is set in the profile, and configured via the --feature-gates flag. With lima we can configure this directly in KubeletConfiguration. Currently the feature gates are hard coded in the configuration for all clusters. We can configure them based on the profile later if needed.

Signed-off-by: Nir Soffer <[email protected]>
Currently we have:

    $ sysctl fs.inotify
    fs.inotify.max_queued_events = 16384
    fs.inotify.max_user_instances = 128
    fs.inotify.max_user_watches = 45827

And we see errors like this on managed clusters even with trivial busybox workloads:

    failed to create fsnotify watcher: too many open files

We use OpenShift worker defaults, already used for minikube [1].

[1] kubernetes/minikube#18832

Signed-off-by: Nir Soffer <[email protected]>
Update minio to latest release
We used a 6 month old release; time to upgrade.

Signed-off-by: Nir Soffer <[email protected]>
Use hostpath storage for minio
This makes it work in lima cluster without deploying a csi-hostpath driver. We can add such driver later if there is a real need. With this change we can run the minio environment. Signed-off-by: Nir Soffer <[email protected]>
Support commands reading from stdin
This allows using commands.run() and commands.watch() with an open file connected to the child process stdin. We will use this to load images into lima clusters.

Signed-off-by: Nir Soffer <[email protected]>
Support command work directory
Some commands like drenv must run in a specific location. Add a cwd argument allowing this when using commands.run() and commands.watch(). We will use this to run `drenv load` in `ramenctl deploy`, which may run in any directory.

Signed-off-by: Nir Soffer <[email protected]>
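A combined sketch of the stdin= argument from the previous commit and the cwd= argument from this one; drenv's actual commands.run() differs in details:

    import subprocess

    def run(*args, stdin=None, cwd=None):
        # stdin may be an open file connected to the child process;
        # cwd runs the command in a specific directory.
        result = subprocess.run(
            args, stdin=stdin, cwd=cwd, stdout=subprocess.PIPE, check=True)
        return result.stdout.decode()

    # Hypothetical usage: feed a tarball to a command reading stdin,
    # running in a specific directory.
    with open("/tmp/image.tar", "rb") as f:
        run("wc", "-c", stdin=f, cwd="/tmp")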
This command loads an image in tar format into all clusters. This will be used in ramenctl to load images into the clusters, and can also be used manually.

The environment may use one or more providers, and each one will use the right command to load the image. The external provider does not support loading images; pushing the ramen image to a registry will work.

Usage:

    % drenv load -h
    usage: drenv load [-h] [-v] [--name-prefix PREFIX] --image IMAGE filename

    positional arguments:
      filename              path to environment file

    options:
      -h, --help            show this help message and exit
      -v, --verbose         be more verbose
      --name-prefix PREFIX  prefix profile names
      --image IMAGE         image to load into the cluster in tar format

Example run:

    % drenv load --image /tmp/image.tar envs/regional-dr.yaml
    2024-09-10 22:33:30,896 INFO   [rdr] Loading image '/tmp/image.tar'
    2024-09-10 22:33:30,902 INFO   [dr1] Loading image
    2024-09-10 22:33:30,902 INFO   [dr2] Loading image
    2024-09-10 22:33:30,902 INFO   [hub] Loading image
    2024-09-10 22:33:33,314 INFO   [dr1] Image loaded in 2.41 seconds
    2024-09-10 22:33:33,407 INFO   [dr2] Image loaded in 2.50 seconds
    2024-09-10 22:33:33,628 INFO   [hub] Image loaded in 2.73 seconds
    2024-09-10 22:33:33,628 INFO   [rdr] Image loaded in 2.73 seconds

Signed-off-by: Nir Soffer <[email protected]>
With this, ramenctl can deploy ramen on any cluster type without knowing anything about the cluster provider. Example run:

    % ramenctl deploy --source-dir .. envs/regional-dr-lima.yaml
    2024-09-09 00:52:14,231 INFO   [ramenctl] Starting deploy
    2024-09-09 00:52:14,234 INFO   [ramenctl] Preparing resources
    2024-09-09 00:52:18,192 INFO   [ramenctl] Loading image 'quay.io/ramendr/ramen-operator:latest'
    2024-09-09 00:52:22,023 INFO   [ramenctl] Deploying ramen operator in cluster 'hub'
    2024-09-09 00:52:22,023 INFO   [ramenctl] Deploying ramen operator in cluster 'dr1'
    2024-09-09 00:52:22,025 INFO   [ramenctl] Deploying ramen operator in cluster 'dr2'
    2024-09-09 00:52:22,600 INFO   [ramenctl] Waiting until 'ramen-hub-operator' is rolled out in cluster 'hub'
    2024-09-09 00:52:22,687 INFO   [ramenctl] Waiting until 'ramen-dr-cluster-operator' is rolled out in cluster 'dr1'
    2024-09-09 00:52:22,697 INFO   [ramenctl] Waiting until 'ramen-dr-cluster-operator' is rolled out in cluster 'dr2'
    2024-09-09 00:52:29,893 INFO   [ramenctl] Finished deploy in 15.65 seconds

Signed-off-by: Nir Soffer <[email protected]>
Disable broker certificate check on macOS
There may be a better way, but for a testing setup we could not care less about certificate checks. We can try to improve this later if we think that drenv will be used on real clusters.

Thanks: Raghavendra Talur <[email protected]>
Signed-off-by: Nir Soffer <[email protected]>
Annotate nodes with submariner public ip
On lima clusters submariner uses the public IP of the host (the address assigned by your ISP) as the public IP of the clusters, and all clusters get the same IP:

    % subctl show connections --context dr1
    ✓ Showing Connections
    GATEWAY    CLUSTER   REMOTE IP        NAT   CABLE DRIVER   SUBNETS        STATUS      RTT avg.
    lima-dr2   dr2       93.172.220.134   yes   vxlan          242.1.0.0/16   connected

    % subctl show endpoints --context dr1
    ✓ Showing Endpoints
    CLUSTER   ENDPOINT IP    PUBLIC IP        CABLE DRIVER   TYPE
    dr1       192.168.5.15   93.172.220.134   vxlan          local
    dr2       192.168.5.15   93.172.220.134   vxlan          remote

With this change it uses the actual IP address of the cluster in the vmnet network:

    % subctl show connections --context dr1
    ✓ Showing Connections
    GATEWAY    CLUSTER   REMOTE IP        NAT   CABLE DRIVER   SUBNETS        STATUS      RTT avg.
    lima-dr2   dr2       192.168.105.10   yes   vxlan          242.1.0.0/16   connected

    % subctl show endpoints --context dr1
    ✓ Showing Endpoints
    CLUSTER   ENDPOINT IP    PUBLIC IP        CABLE DRIVER   TYPE
    dr1       192.168.5.15   192.168.105.11   vxlan          local
    dr2       192.168.5.15   192.168.105.10   vxlan          remote

Thanks: Raghavendra Talur <[email protected]>
Signed-off-by: Nir Soffer <[email protected]>
Promote vmnet shared network route
After provisioning a lima vm we have 2 default routes:

    % limactl shell dr1 ip route show default
    default via 192.168.5.2 dev eth0 proto dhcp src 192.168.5.15 metric 100
    default via 192.168.105.1 dev lima0 proto dhcp src 192.168.105.11 metric 100

192.168.5.0/24 is the special user network used by lima to bootstrap the VM. All vms have the same IP address (192.168.5.15), so this network cannot be used to access the vm from the host.

192.168.105.0/24 is the vmnet shared network, providing access from host to vm and from vm to vm. We want to use only this network.

Without this change submariner uses the special user network (192.168.5.0/24) for the endpoints, which cannot work for accessing the other clusters:

    % subctl show connections --context dr1
    ✓ Showing Connections
    GATEWAY    CLUSTER   REMOTE IP        NAT   CABLE DRIVER   SUBNETS        STATUS      RTT avg.
    lima-dr2   dr2       192.168.105.10   yes   vxlan          242.1.0.0/16   connected

    % subctl show endpoints --context dr1
    ✓ Showing Endpoints
    CLUSTER   ENDPOINT IP    PUBLIC IP        CABLE DRIVER   TYPE
    dr1       192.168.5.15   192.168.105.11   vxlan          local
    dr2       192.168.5.15   192.168.105.10   vxlan          remote

I tried to fix this issue by deleting the default route via 192.168.5.2. This works for deploying submariner, but this route is recreated later, and this breaks the submariner gateway and connectivity between the clusters.

Changing the order of the default routes seems to work, both for deploying submariner and for running tests on the running clusters. We do this by modifying the metric of the preferred route so it becomes first:

    % limactl shell dr1 ip route show default
    default via 192.168.105.1 dev lima0 proto dhcp src 192.168.105.11 metric 1
    default via 192.168.5.2 dev eth0 proto dhcp src 192.168.5.15 metric 100

With this change the endpoint listens on the public ip (in the vmnet network), allowing access to other clusters:

    % subctl show connections --context dr1
    ✓ Showing Connections
    GATEWAY    CLUSTER   REMOTE IP        NAT   CABLE DRIVER   SUBNETS        STATUS      RTT avg.
    lima-dr2   dr2       192.168.105.10   no    vxlan          242.1.0.0/16   connected

    % subctl show endpoints --context dr1
    ✓ Showing Endpoints
    CLUSTER   ENDPOINT IP      PUBLIC IP        CABLE DRIVER   TYPE
    dr1       192.168.105.11   192.168.105.11   vxlan          local
    dr2       192.168.105.10   192.168.105.10   vxlan          remote

Signed-off-by: Nir Soffer <[email protected]>
With 0.17.0 and 0.17.2 the globalnet pod fails with:

    2024-09-08T15:42:25.498Z FTL ../gateway_monitor.go:286 Globalnet Error starting the controllers error="error creating the Node controller: error retrieving local Node \"lima-dr1\": nodes \"lima-dr1\" is forbidden: User \"system:serviceaccount:submariner-operator:submariner-globalnet\" cannot get resource \"nodes\" in API group \"\" at the cluster scope"

This worked with minikube clusters, so maybe this is related to some difference in the way the cluster is deployed, but we want to upgrade to the latest submariner anyway to detect regressions early.

Signed-off-by: Nir Soffer <[email protected]>
Wait for all clusters before deploying submariner
When testing the small submariner environment, we may start deploying one cluster before the other cluster is ready. This fails randomly with lima clusters when submariner uses the wrong interface. This may happen if we install submariner before flannel is ready.

Signed-off-by: Nir Soffer <[email protected]>
Remove the nslookup step, since it is problematic:

- nslookup and curl use different DNS resolvers, so when nslookup succeeds it does not mean that curl will succeed.
- nslookup sometimes returns a zero exit code with a message that the lookup failed! Then we try to access the DNS name with curl with a short timeout (60 seconds) and fail.

Simply check only with curl, increasing the timeout to 300 seconds.

Signed-off-by: Nir Soffer <[email protected]>
Fix submariner test container to keep running
It was configured to exit after 300 seconds, which makes it hard to test when waiting for connectivity takes a lot of time.

Signed-off-by: Nir Soffer <[email protected]>
Submariner works now, so we can enable it in regional-dr-lima.yaml.

Signed-off-by: Nir Soffer <[email protected]>
Add external-snapshotter addon
This addon replaces the minikube volumesnapshot addon, and is needed for cephfs, volsync, and for testing volume replication of snapshots and pvcs created from snapshots.

Signed-off-by: Nir Soffer <[email protected]>
Replace volumesnapshot with external-snapshotter
This addon works on both minikube and lima clusters. It is used by the cephfs and volsync addons, and will be used for testing DR for workloads using an rbd pvc restored from a snapshot.

To use snapshots with rbd storage, a snapshot class was added based on the rook 1.15 example.

With this change we can enable cephfs and volsync in regional-dr-lima.yaml.

Signed-off-by: Nir Soffer <[email protected]>
Fix argocd deployment on macOS
We did not flatten the config since it is not needed in minikube, which uses a path to the certificate. But in lima we get the actual certificate from the guest, and without flattening we get:

    clusters:
    - name: drenv-test-cluster
      cluster:
        server: https://192.168.105.45:6443
        certificate-authority-data: DATA+OMITTED
    users:
    - name: drenv-test-cluster
      user:
        client-certificate-data: DATA+OMITTED
        client-key-data: DATA+OMITTED
    ...

"DATA+OMITTED" is not a valid certificate, so argocd fails to parse it. With this change argocd works, and we can use regional-dr.yaml on macOS.

Signed-off-by: Nir Soffer <[email protected]>
Avoid random failures when deleting environment
limactl is racy, trying to access files in other clusters' directories and failing when files were deleted. Until this issue is fixed in lima, ensure that only a single vm is deleted at a time. Example failure:

    % drenv delete envs/regional-dr.yaml
    2024-09-13 05:59:57,159 INFO   [rdr] Deleting environment
    2024-09-13 05:59:57,169 INFO   [dr1] Deleting lima cluster
    2024-09-13 05:59:57,169 INFO   [dr2] Deleting lima cluster
    2024-09-13 05:59:57,169 INFO   [hub] Deleting lima cluster
    2024-09-13 05:59:57,255 WARNING [dr2] no such process
    2024-09-13 05:59:57,265 WARNING [dr2] remove /Users/nsoffer/.lima/dr2/ssh.sock: no such file or directory
    2024-09-13 05:59:57,265 WARNING [hub] remove /Users/nsoffer/.lima/hub/ssh.sock: no such file or directory
    2024-09-13 05:59:57,297 ERROR  [dr1] open /Users/nsoffer/.lima/dr2/lima.yaml: no such file or directory
    2024-09-13 05:59:57,297 ERROR  [hub] open /Users/nsoffer/.lima/dr2/lima.yaml: no such file or directory
    2024-09-13 05:59:57,298 ERROR  Command failed
    Traceback (most recent call last):
    ...
    drenv.commands.Error: Command failed:
       command: ('limactl', '--log-format=json', 'delete', '--force', 'dr1')
       exitcode: 1
       error:

Note how the delete commands for "dr1" and "hub" fail to read lima.yaml of cluster "dr2":

    2024-09-13 05:59:57,297 ERROR  [dr1] open /Users/nsoffer/.lima/dr2/lima.yaml: no such file or directory
    2024-09-13 05:59:57,297 ERROR  [hub] open /Users/nsoffer/.lima/dr2/lima.yaml: no such file or directory

With the lock, we run a single limactl process at a time, so it cannot race with other clusters.

Signed-off-by: Nir Soffer <[email protected]>
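A minimal sketch of such serialization with a module level lock; the real drenv code may scope the lock differently:

    import subprocess
    import threading

    _delete_lock = threading.Lock()

    def delete_vm(name):
        # Serialize deletes: limactl may read other clusters' lima.yaml
        # files and fail if a concurrent delete already removed them.
        with _delete_lock:
            subprocess.run(
                ["limactl", "--log-format=json", "delete", "--force", name],
                check=True,
            )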
Ensure deleted DRPolicies do not add schedules to DRClusterConfig (RamenDR#1565)

* Ensure deleted DRPolicies do not add schedules to DRClusterConfig

* Correct external snapshotter dependency to 0.8

  Was introduced as part of the commit 91e5a5b.

* Fetch external-snapshotter resources using raw URL

  The older scheme was using a git clone URL; this is more efficient as it fetches the required resources directly and avoids the clone.

  Ref.: https://github.com/kubernetes-sigs/kustomize/blob/master/examples/remoteBuild.md#remote-directories

Signed-off-by: Shyamsundar Ranganathan <[email protected]>
Update csi-addons to v0.10.0 in drenv
This brings the new Validated condition needed for fixing disable dr when a VR precondition has failed. With this change we can use the new feature in ramen for testing the fix. Signed-off-by: Nir Soffer <[email protected]>
Update csi-addons requirement to v0.10.0 in ramen
Done using:

    % go get github.com/csi-addons/[email protected]
    go: upgraded github.com/csi-addons/kubernetes-csi-addons v0.9.1 => v0.10.0
    go: upgraded github.com/google/pprof v0.0.0-20240727154555-813a5fbdbec8 => v0.0.0-20240827171923-fa2c70bbbfe5
    go: upgraded github.com/onsi/ginkgo/v2 v2.20.0 => v2.20.2
    go: upgraded github.com/onsi/gomega v1.34.1 => v1.34.2
    go: upgraded golang.org/x/sys v0.23.0 => v0.24.0
    go: upgraded k8s.io/api v0.31.0 => v0.31.1
    go: upgraded k8s.io/apimachinery v0.31.0 => v0.31.1
    % go mod tidy

Signed-off-by: Nir Soffer <[email protected]>
Adjust drenv to support consistency groups
Signed-off-by: Elena Gershkovich <[email protected]>
ci: use lower version of z as hard dependency
Use the lower version as the hard dependency, and set the required hard dependency in the toolchain, as not all systems will have the required golang z version installed.

Signed-off-by: Madhu Rajanna <[email protected]>
go.mod: update to newer version of recipe api
Command run: `go get -u github.com/ramendr/recipe` Signed-off-by: Raghavendra Talur <[email protected]>
test: add the updated recipe crd to the test dir
Signed-off-by: Raghavendra Talur <[email protected]> Co-Authored-by: Annaraya Narasagond <[email protected]>
controller: update implementation to match with recipe CRD
Changes are:

1. Find the backup and restore workflow by name, as CaptureWorkflow and RecoverWorkflow don't exist anymore.
2. Change timeout to int.
3. Change command to string type.

Signed-off-by: Raghavendra Talur <[email protected]>
Co-Authored-by: Annaraya Narasagond <[email protected]>
go.mod: update to latest version of the ramen api
Command run: `go get -u github.com/ramendr/ramen/api/v1alpha1` Signed-off-by: Raghavendra Talur <[email protected]>
Add ClusterClaim to the ramen-dr-cluster reconciler scheme
This is required as the dr-cluster DRClusterConfig reconciler manages cluster claims. This was missed out in commit 91e5a5b Signed-off-by: Shyamsundar Ranganathan <[email protected]>
Kustomize rbd-mirror directory.
Signed-off-by: Elena Gershkovich <[email protected]>
Add flag for enabling or disabling consistency groups
We want to be able to disable or enable consistency groups support for testing with drenv. The flag can be added in envs files:

    ramen:
      hub: hub
      clusters: [dr1, dr2]
      topology: regional-dr
      features:
        volsync: true
        consistency_groups: true

When the flag is not present, consistency groups are disabled by default.

Signed-off-by: Elena Gershkovich <[email protected]>
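A sketch of reading the flag from a parsed env file, defaulting to disabled; the helper name is illustrative:

    def feature_enabled(env, name):
        # Missing keys mean the feature is disabled by default.
        features = env.get("ramen", {}).get("features", {})
        return bool(features.get(name, False))

    # e.g. feature_enabled(env, "consistency_groups") -> True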
Refactor: Extract resource watching logic from drpc controller into drpc watcher

- Moved all resource-watching related functions from drplacementcontrol_controller.go to a new file, drplacementcontrol_watcher.go.
- No functional changes introduced; purely a structural refactor.

Signed-off-by: Benamar Mekhissi <[email protected]>
Watch for DRPolicy resource changes
- Added functionality to watch for changes in DRPolicy resources to trigger DRPC reconciliation when necessary. Signed-off-by: Benamar Mekhissi <[email protected]>
Add external-snapshotter addon to kubevirt envs
This was forgotten when we replaced the minikube volumesnapshot addon.

Signed-off-by: Nir Soffer <[email protected]>
Reformat reconcileMissingVR comment for readability
The comment is still unclear, but at least easier to read. Signed-off-by: Nir Soffer <[email protected]>
Fix logs in reconcileMissingVR
- Replace "VR under deletion" with "deleted VR", matching the terminology used in the code. - Replace "detected as missing or deleted" with "is missing" when we know the VR does not exist. Signed-off-by: Nir Soffer <[email protected]>
Extract VRGInstance.validateVRCompletedStatus()
Moved from validateVRStatus() to make room for checking the VR Validated status. This also makes the function easier to understand, keeping the same level of abstraction and getting rid of the uninteresting details.

Signed-off-by: Nir Soffer <[email protected]>
Rename msg to errorMsg and document that this is an error message, used when we could not get the condition value because it is missing, stale, or unknown.

Signed-off-by: Nir Soffer <[email protected]>
Improve logging in delete VR flow
Log after important changes to the system in the delete VR flow, to make it easier to understand what the system is doing and how ramen changed the system.

New logs:

- delete the VR resource
- remove annotations from PV

Improved logs:

- remove annotations, labels, and finalizers from PVC

Signed-off-by: Nir Soffer <[email protected]>
Fix disable dr if VR failed validation
When deleting a primary VRG, we wait until the VR Completed condition is met. However if a VR precondition failed, for example using a drpolicy without flattening enabled when the PVC needs flattening, the VR will never complete and the vrg and drpc deletion will never complete.

Since csi-addons 0.10.0 we have a new Validated VR condition, set to true if preconditions are met, and false if not. A VR can be deleted safely in this state, since mirroring was not enabled.

This change modifies deleted VRG processing to check the new VR Validated status. If the condition exists and the condition status is false, validateVRStatus() returns true, signaling that the VR is in the desired state, and ramen completes the delete flow. If the VR does not report the Validated condition (e.g. an old csi-addons version) or the condition status is true (mirroring in progress), we continue in the normal flow. The VR will be deleted only when the Completed condition status is true.

Tested with a discovered deployment and a vm using a pvc created from a volume snapshot.

Signed-off-by: Nir Soffer <[email protected]>
Updating broken clusteradm link in the doc
Signed-off-by: Abhijeet Shakya <[email protected]>
Add daily e2e job for refreshing the cache
To refresh the cache we need to check out the ramen source and run drenv cache with the environment files. Using a workflow for this makes the job easy to implement and manage without accessing the runner directly. The job can also be run manually from the github UI; this is likely to work for people with write access.
Signed-off-by: Nir Soffer <[email protected]>
(commit 8a70425)
drenv: add cluster name to storage id
Signed-off-by: Elena Gershkovich <[email protected]>
(commit 86f5d17)
Fix intermittent VolSync unit test failure
Resolved an issue causing sporadic failures in the Ramen/VolSync related unit test, due to delayed creation of the PVC resource.
Signed-off-by: Benamar Mekhissi <[email protected]>
(commit 1c04e61)
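The usual cure for this kind of flake is to poll for the resource instead of asserting once. A sketch using Gomega with a controller-runtime client, assuming hypothetical test wiring (k8sClient and the PVC names are placeholders):

```go
package volsync_test

import (
	"context"
	"testing"

	. "github.com/onsi/gomega"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// waitForPVC retries the Get until the PVC exists; a single Get may run
// before the controller created the PVC, which is the sporadic failure mode.
func waitForPVC(t *testing.T, k8sClient client.Client, name, namespace string) {
	g := NewWithT(t)
	pvc := &corev1.PersistentVolumeClaim{}
	key := types.NamespacedName{Name: name, Namespace: namespace}

	// Poll every 100ms for up to 10 seconds.
	g.Eventually(func() error {
		return k8sClient.Get(context.TODO(), key, pvc)
	}, "10s", "100ms").Should(Succeed())
}
```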
Quote addresses to avoid yaml parsing surprises
Signed-off-by: Nir Soffer <[email protected]>
(commit 7e88713)
Add drenv start --timeout option
This is useful when you want to avoid drenv start getting stuck, and the environment does not provide a way to time out. Example run:

% drenv start --timeout 60 envs/vm.yaml
2024-09-21 04:27:43,555 INFO [vm] Starting environment
2024-09-21 04:27:43,581 INFO [cluster] Starting lima cluster
2024-09-21 04:28:43,785 ERROR [cluster] did not receive an event with the "running" status
2024-09-21 04:28:43,790 ERROR Command failed
Traceback (most recent call last):
...
drenv.commands.Error: Command failed:
command: ('limactl', '--log-format=json', 'start', '--timeout=60s', 'cluster')
exitcode: 1
error:

For the lima provider, we pass the timeout to limactl. For minikube we use the commands.watch() timeout. The external provider does not use the timeout, since we don't start the cluster.

Signed-off-by: Nir Soffer <[email protected]>
(commit 5482b46)
Make the vm environment smaller
On github actions we cannot start a vm with more than one cpu, and our test cluster is very small, so it should work with 1g of ram. This makes the test cluster consume fewer resources if a developer leaves it running for a long time.

Using 1 cpu conflicts with kubeadm preflight checks:

[init] Using Kubernetes version: v1.31.0
...
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR NumCPU]: the number of available CPUs 1 is less than the required 2
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`

Since we know what we are doing, we suppress this preflight error.

Running top shows that the cluster consumes only 9% cpu and 33.5% of the available memory:

top - 04:51:02 up 10 min, 1 user, load average: 0.60, 0.29, 0.16
Tasks: 123 total, 1 running, 122 sleeping, 0 stopped, 0 zombie
%Cpu(s): 6.2/3.1 9[||||||]
MiB Mem : 33.5/1959.9 [|||||||||||||||||||||]
MiB Swap: 0.0/0.0 []

PID   USER  PR NI VIRT    RES    SHR   S %CPU %MEM TIME+   COMMAND
4124  root  20 0  1449480 244352 68224 S 2.3  12.2 0:18.47 kube-apiserver
4120  root  20 0  1309260 102456 60800 S 0.7  5.1  0:06.17 kube-controller
4273  root  20 0  1902692 92040  59904 S 2.0  4.6  0:12.47 kubelet
4159  root  20 0  1288880 60156  44288 S 0.3  3.0  0:02.20 kube-scheduler
3718  root  20 0  1267660 57880  31020 S 0.3  2.9  0:09.28 containerd
4453  root  20 0  1289120 53540  42880 S 0.0  2.7  0:00.15 kube-proxy
5100  65532 20 0  1284608 49656  37504 S 0.0  2.5  0:01.14 coredns
5000  65532 20 0  1284608 49528  37376 S 0.0  2.5  0:00.86 coredns
4166  root  20 0  11.2g   47360 21504 S 1.3  2.4  0:08.69 etcd

We could make the cluster even smaller, but the kubeadm preflight check requires 1700 MiB, and we don't have memory issues on github or on developers' machines.

Signed-off-by: Nir Soffer <[email protected]>
(commit eede468)
Disable rosetta for vm environment
We need rosetta only for complex setups when a component does not provide arm64 images. For testing drenv we have an empty cluster with busybox, and our busybox image is multi-arch. Disabling rosetta may speed up the cluster, and this may be significant on github actions, since the runners are about 6.5 times slower compared to an M1 mac.
Signed-off-by: Nir Soffer <[email protected]>
(commit 7043af7)
Use lima also on darwin/x86_64
This is useful for running the tests on a github runner, and also lets us support a single option for macOS.
Signed-off-by: Nir Soffer <[email protected]>
(commit 5f9c5bb)
Scale down coredns to 1 replica
Same optimization used in minikube; minikube uses 2 replicas only when using an HA configuration.
Signed-off-by: Nir Soffer <[email protected]>
(commit 77dc276)
Don't wait for coredns deployment
It takes about 30 seconds until coredns is ready, but we don't depend on it in the current code. By removing this wait we can start deploying 30 seconds earlier. This reduces the time for starting a cluster from 155 seconds to 125 seconds, and the regional-dr environment from 450 to 420 seconds.

Example run with the vm environment:

% drenv start envs/vm.yaml
2024-09-22 18:39:55,726 INFO [vm] Starting environment
2024-09-22 18:39:55,743 INFO [cluster] Starting lima cluster
2024-09-22 18:41:57,818 INFO [cluster] Cluster started in 122.07 seconds
2024-09-22 18:41:57,819 INFO [cluster/0] Running addons/example/start
2024-09-22 18:42:18,966 INFO [cluster/0] addons/example/start completed in 21.15 seconds
2024-09-22 18:42:18,966 INFO [cluster/0] Running addons/example/test
2024-09-22 18:42:19,120 INFO [cluster/0] addons/example/test completed in 0.15 seconds
2024-09-22 18:42:19,121 INFO [vm] Environment started in 143.40 seconds

% drenv stop envs/vm.yaml
2024-09-22 18:42:44,244 INFO [vm] Stopping environment
2024-09-22 18:42:44,317 INFO [cluster] Stopping lima cluster
2024-09-22 18:42:44,578 WARNING [cluster] [hostagent] dhcp: unhandled message type: RELEASE
2024-09-22 18:42:49,441 INFO [cluster] Cluster stopped in 5.13 seconds
2024-09-22 18:42:49,441 INFO [vm] Environment stopped in 5.20 seconds

% drenv start envs/vm.yaml
2024-09-22 18:42:53,132 INFO [vm] Starting environment
2024-09-22 18:42:53,156 INFO [cluster] Starting lima cluster
2024-09-22 18:43:34,436 INFO [cluster] Cluster started in 41.28 seconds
2024-09-22 18:43:34,437 INFO [cluster] Looking up failed deployments
2024-09-22 18:43:34,842 INFO [cluster/0] Running addons/example/start
2024-09-22 18:43:35,208 INFO [cluster/0] addons/example/start completed in 0.37 seconds
2024-09-22 18:43:35,208 INFO [cluster/0] Running addons/example/test
2024-09-22 18:43:35,371 INFO [cluster/0] addons/example/test completed in 0.16 seconds
2024-09-22 18:43:35,372 INFO [vm] Environment started in 42.24 seconds

Signed-off-by: Nir Soffer <[email protected]>
(commit 7049474)
Clean up probing for completion
- Use the `/readyz` endpoint for waiting until the kubernetes cluster is ready, instead of `kubectl version`. This matches the way we wait for the cluster in drenv.
- When waiting for the coredns deployment, use `kubectl rollout status`, matching other code in drenv and addons.
- Nicer probe description, visible in drenv when using --verbose.
Signed-off-by: Nir Soffer <[email protected]>
(commit cd6cffb)
Simplify port forwarding rules
We can ignore all ports for all protocols for all addresses using a simpler rule. With this change we see this log in --verbose mode:

2024-09-23 01:12:18,130 DEBUG [cluster] [hostagent] TCP (except for SSH) and UDP port forwarding is disabled

And no "Not forwarding port ..." messages. Previously we had lots of these messages[1].

This requires a recent lima commit[2]; pull current master if you run an older version.

[1] lima-vm/lima#2577
[2] lima-vm/lima@9a09350

Signed-off-by: Nir Soffer <[email protected]>
(commit 3692dc5)
Group kubeconfig setup in same provision step
And remove an unneeded export - it was used to run kubectl commands before we set up root/.kube/config, but this step does not run any kubectl commands.
Signed-off-by: Nir Soffer <[email protected]>
(commit c10641d)
Provision scripts using `mode: system` run as root and do not need to use sudo. It looks like the scripts were copied from code not running as root.
Signed-off-by: Nir Soffer <[email protected]>
(commit b911b9a)
This is fixed now in lima[1], so we don't need to keep the fix in drenv. Developers should pull the latest lima from git to get this fix.

[1] lima-vm/lima#2632

Signed-off-by: Nir Soffer <[email protected]>
(commit da3dc20)
Disallow relocation execution when a cluster is unreachable
Added a check to stop relocation reconciliation if one of the clusters is unreachable. This prevents potential misclassification after hub recovery, which could lead to undesired results and inconsistencies.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2304182

Signed-off-by: Benamar Mekhissi <[email protected]>
(commit 030adfa)
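A guard of this kind could look like the sketch below. All names are hypothetical, not ramen's actual API; the point is only that reconciliation refuses to proceed with relocation while any member cluster is unreachable:

```go
package main

import (
	"errors"
	"fmt"
)

type clusterStatus struct {
	name      string
	reachable bool
}

// validateRelocation returns an error if any cluster is unreachable, so the
// reconciler requeues instead of making placement decisions on stale state
// (e.g. right after hub recovery).
func validateRelocation(clusters []clusterStatus) error {
	for _, c := range clusters {
		if !c.reachable {
			return errors.New("cluster " + c.name + " is unreachable, cannot proceed with relocation")
		}
	}
	return nil
}

func main() {
	clusters := []clusterStatus{
		{name: "dr1", reachable: true},
		{name: "dr2", reachable: false},
	}
	fmt.Println(validateRelocation(clusters))
}
```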
Fixes https://issues.redhat.com/browse/OCSBZM-4691
Signed-off-by: Annaraya Narasagond <[email protected]>
(commit 79be579)
Moving locks to util and some refactoring
Signed-off-by: Annaraya Narasagond <[email protected]>
(commit 9611e95)
Adding changes to the locking mechanism
Signed-off-by: Annaraya Narasagond <[email protected]>
(commit 87c5a1d)
Signed-off-by: Annaraya Narasagond <[email protected]>
(commit 62faf94)
Signed-off-by: Annaraya Narasagond <[email protected]>
(commit af60e2a)
Commits on Oct 17, 2024
Signed-off-by: Annaraya Narasagond <[email protected]>
(commit 04f99b1)