Pool Managers-Pools: Duplicate Restores on Multiple Pools Following Unexpected Pool Restart #7653

Open
vingar opened this issue Sep 5, 2024 · 0 comments
vingar commented Sep 5, 2024

Hello,

Following the unexpected pool restarts described in #7652, we have observed that restores that were ongoing on a restarted pool are rescheduled in duplicate across multiple pools. We have three pool managers in the USATLAS setup (dCache 9.2.17). The duplicate, concurrent restores for the same file then appear as only a single restore on the pool managers.

Example:

The restore for 0000269A5C57C40241629D7EFBC3654FB04A started on pool dc269_12, which was then automatically restarted due to a memory error. The timestamps in billing are:

Start time: 2024-08-31 21:12:31
End time: 2024-08-31 21:28:49

The restore was then rescheduled concurrently on three pools: dc263_12, dc267_12 and dc278_12 (still ongoing)

The timestamps in billing are:
Start time on dc263_12: 2024-08-31 21:21:10
Start time on dc267_12: 2024-09-03 14:11:33

Only dc278_12 is associated with the restore on the pool managers:

0000269A5C57C40241629D7EFBC3654FB04A@internal-net-external-net-world-net-*/* m=15 r=0 [dc278_12] [Waiting for stage: dc278_12 09.03 14:11:39] {0,}
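To check for the same pattern on other files, a minimal sketch along the following lines may help. It groups restore start records by PNFS ID and reports every file whose restore was started on more than one pool. The `(pnfsid, pool, start)` tuple layout and the function names are assumptions made for illustration; this is not the dCache billing format, and extracting the records from billing is left out. The sample data is copied from the timestamps above; dc278_12 is omitted because no billing start time is listed for it.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical record layout (pnfsid, pool, start time), filled in with the
# billing timestamps quoted in this issue. Not the real billing file format.
RESTORE_STARTS = [
    ("0000269A5C57C40241629D7EFBC3654FB04A", "dc269_12", "2024-08-31 21:12:31"),
    ("0000269A5C57C40241629D7EFBC3654FB04A", "dc263_12", "2024-08-31 21:21:10"),
    ("0000269A5C57C40241629D7EFBC3654FB04A", "dc267_12", "2024-09-03 14:11:33"),
]


def pools_per_file(records):
    """Map each pnfsid to the pools that started a restore, ordered by start time."""
    by_file = defaultdict(list)
    for pnfsid, pool, start in records:
        started = datetime.strptime(start, "%Y-%m-%d %H:%M:%S")
        by_file[pnfsid].append((started, pool))
    return {
        pnfsid: [pool for _, pool in sorted(entries)]
        for pnfsid, entries in by_file.items()
    }


def find_duplicates(records):
    """Return only the files whose restore was started on more than one pool."""
    return {
        pnfsid: pools
        for pnfsid, pools in pools_per_file(records).items()
        if len(pools) > 1
    }


if __name__ == "__main__":
    for pnfsid, pools in find_duplicates(RESTORE_STARTS).items():
        print(pnfsid, "restore started on:", ", ".join(pools))
```

A file that shows up here with several pools while the pool managers list only one pool for it would match the behaviour described in this issue.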

Thank you in advance for any help
