e2e flake: timeout deleting volsync-test-file after successful test #1661

Open
Tracked by #1717
nirs opened this issue Nov 21, 2024 · 0 comments
Labels: bug (Something isn't working), high (Issue is of high priority and needs attention), test (Testing related issue)

nirs commented Nov 21, 2024

We see this randomly in the CI. It used to be very rare, but recently we are seeing more and more failures.

When it happened locally, the issue was that volsync deleted the cephfs snapshot, but the VolumeSnapshot resource was never deleted.

When this happens, retrying the e2e job will continue to fail, since we use the same namespace and resource names in the next run.

Fixing the cluster requires manual cleanup. I think removing the finalizer on the volumesnapshot was enough, but I did not check whether there were leftovers in cephfs.
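
A minimal sketch of that manual cleanup, assuming the blocker is a VolumeSnapshot left in the volsync-test-file namespace on the dr2 cluster (the snapshot name below is a placeholder):

   # Find the VolumeSnapshot(s) still left in the test namespace.
   kubectl get volumesnapshot --context dr2 -n volsync-test-file

   # Clear the finalizers so the stuck snapshot (and then the namespace) can be deleted.
   kubectl patch volumesnapshot <snapshot-name> --context dr2 -n volsync-test-file \
       --type merge -p '{"metadata":{"finalizers":null}}'

As noted above, this may still leave stale snapshots behind in cephfs, so the ceph side should be checked separately.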

Example error:

drenv.commands.Error: Command failed:
   command: ('addons/volsync/test', 'dr1', 'dr2')
   exitcode: 1
   error:
      Traceback (most recent call last):
        File "/home/github/actions-runner/_work/ramen/ramen/test/addons/volsync/test", line 259, in <module>
          t.result()
        File "/usr/lib64/python3.12/concurrent/futures/_base.py", line 449, in result
          return self.__get_result()
                 ^^^^^^^^^^^^^^^^^^^
        File "/usr/lib64/python3.12/concurrent/futures/_base.py", line 401, in __get_result
          raise self._exception
        File "/usr/lib64/python3.12/concurrent/futures/thread.py", line 58, in run
          result = self.fn(*self.args, **self.kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/home/github/actions-runner/_work/ramen/ramen/test/addons/volsync/test", line 243, in test
          teardown(cluster1, cluster2, variant)
        File "/home/github/actions-runner/_work/ramen/ramen/test/addons/volsync/test", line 223, in teardown
          kubectl.wait(
        File "/home/github/actions-runner/_work/ramen/ramen/test/drenv/kubectl.py", line [144](https://github.com/RamenDR/ramen/actions/runs/11955746683/job/33338240384?pr=1548#step:7:145), in wait
          _watch("wait", *args, context=context, log=log)
        File "/home/github/actions-runner/_work/ramen/ramen/test/drenv/kubectl.py", line 216, in _watch
          for line in commands.watch(*cmd, input=input):
        File "/home/github/actions-runner/_work/ramen/ramen/test/drenv/commands.py", line 207, in watch
          raise Error(args, error, exitcode=p.returncode)
      drenv.commands.Error: Command failed:
         command: ('kubectl', 'wait', '--context', 'dr2', 'ns', 'volsync-test-file', '--for=delete', '--timeout=120s')
         exitcode: 1
         error:
            error: timed out waiting for the condition on namespaces/volsync-test-file
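
For reference, when a namespace deletion hangs like this, the namespace status conditions normally report which resources are still remaining; this is generic kubectl usage, not something the test does:

   kubectl get ns volsync-test-file --context dr2 -o jsonpath='{.status.conditions}'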

Failed builds:

nirs added the bug, test, and high labels on Nov 21, 2024
nirs mentioned this issue on Dec 11, 2024