We see this randomly in CI; it used to be very rare, but recently the failures have become more frequent.
When it happened locally, the issue was that volsync deleted the cephfs volume snapshot, but the snapshot was never actually removed.
When this happens, retrying the e2e job will keep failing, since the next run uses the same namespace and resource names.
Fixing the cluster requires manual cleanup. I think removing the finalizer on the volumesnapshot was enough, but I did not check whether there are leftovers in cephfs.
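Something like the following should work as the manual cleanup; the context, namespace, and snapshot name are illustrative and must be adjusted to the stuck resource:

# Find the volumesnapshot that is stuck in the test namespace (names are examples)
kubectl get volumesnapshot --context dr2 -n volsync-test-file

# Clear its finalizers so the namespace deletion can complete
kubectl patch volumesnapshot <snapshot-name> --context dr2 -n volsync-test-file \
    --type merge -p '{"metadata":{"finalizers":null}}'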
Example error:
drenv.commands.Error: Command failed:
command: ('addons/volsync/test', 'dr1', 'dr2')
exitcode: 1
error:
Traceback (most recent call last):
File "/home/github/actions-runner/_work/ramen/ramen/test/addons/volsync/test", line 259, in <module>
t.result()
File "/usr/lib64/python3.12/concurrent/futures/_base.py", line 449, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.12/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
File "/usr/lib64/python3.12/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/github/actions-runner/_work/ramen/ramen/test/addons/volsync/test", line 243, in test
teardown(cluster1, cluster2, variant)
File "/home/github/actions-runner/_work/ramen/ramen/test/addons/volsync/test", line 223, in teardown
kubectl.wait(
File "/home/github/actions-runner/_work/ramen/ramen/test/drenv/kubectl.py", line [144](https://github.com/RamenDR/ramen/actions/runs/11955746683/job/33338240384?pr=1548#step:7:145), in wait
_watch("wait", *args, context=context, log=log)
File "/home/github/actions-runner/_work/ramen/ramen/test/drenv/kubectl.py", line 216, in _watch
for line in commands.watch(*cmd, input=input):
File "/home/github/actions-runner/_work/ramen/ramen/test/drenv/commands.py", line 207, in watch
raise Error(args, error, exitcode=p.returncode)
drenv.commands.Error: Command failed:
command: ('kubectl', 'wait', '--context', 'dr2', 'ns', 'volsync-test-file', '--for=delete', '--timeout=120s')
exitcode: 1
error:
error: timed out waiting for the condition on namespaces/volsync-test-file
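To see what is still blocking the namespace deletion, listing everything left in the stuck namespace helps; this command is illustrative and not part of the test:

# List all namespaced resources that still exist in the stuck namespace on the failing cluster
kubectl api-resources --verbs=list --namespaced -o name \
    | xargs -n1 kubectl get --context dr2 -n volsync-test-file --ignore-not-found --show-kind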
Failed builds: