Dear dCache devs,

My impression is that, currently, the bulk service processes requests in the order they arrive. The number of threads (concurrently active requests) is limited, so a few stage requests can keep all threads busy while release (= unpin) requests remain queued and never get through.

So you might have a situation where, for instance, Atlas is staging a lot, the stage pools fill up, and the unpin requests are waiting for the stage operations to complete, so they are not getting through, although they would be the solution to the space congestion.
I've had to do some tuning to make this work better:
# Number of concurrent active requests.
# Default: 100
# If stage pools fill up, increasing this may help
# to let "release" requests through.
bulk.limits.container-processing-threads=1000
With 1000 threads, I saw that unpin requests could get through and free up space, so staging could continue.
So I'd like to suggest giving release requests priority over stage requests. Release requests take very little time, and they free up the space that staging needs to succeed. If that is too difficult or undesired, it might already help a lot to increase the default number of threads. And perhaps, while we're at it, bulk.limits.max-requests-per-user could get a bigger default too (say 50000 instead of 5000). Atlas would be very happy.
An alternative idea might be to reserve some threads for only release requests, so they never stall when stage requests are stuck.
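To make that idea concrete, here is a rough Java sketch (illustration only, not dCache code; the pool sizes and the UNPIN activity name are placeholders) of what routing release requests to a small reserved pool could look like:

// Illustration only, not dCache code: reserve a small, separate pool for
// release/unpin work so it can never be starved by long-running stage requests.
// Pool sizes and activity names below are placeholders.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PartitionedBulkExecutor {
    private final ExecutorService stagePool = Executors.newFixedThreadPool(100);  // shared pool
    private final ExecutorService releasePool = Executors.newFixedThreadPool(10); // reserved pool

    // Route a task to its pool based on the request's activity type.
    public void submit(String activity, Runnable work) {
        if ("UNPIN".equalsIgnoreCase(activity)) {
            releasePool.submit(work); // release requests bypass the stage backlog
        } else {
            stagePool.submit(work);   // stage and other activities share the big pool
        }
    }
}

Even a handful of reserved threads would probably be enough, since release requests complete so quickly.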
Cheers,
Onno
# ---- Algorithm for promoting requests to active.
#
# Implementations:
#
# - org.dcache.services.bulk.manager.scheduler.LeastRecentFirstScheduler:
# selects the least recent requests to run in order of arrival
# (first-come first-served).
#
bulk.request-scheduler=org.dcache.services.bulk.manager.scheduler.LeastRecentFirstScheduler
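For comparison, a release-first promotion order could be sketched roughly as follows (illustration only; the QueuedRequest record and the UNPIN activity check are assumptions, and a real implementation would have to fit dCache's bulk scheduler interface rather than this standalone class):

// Illustration only: promote release/unpin requests ahead of stage requests,
// keeping first-come first-served order within each class.
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class ReleaseFirstScheduler {
    // Hypothetical stand-in for a queued bulk request.
    public record QueuedRequest(String id, String activity, Instant arrivedAt) {}

    private static int classOf(QueuedRequest r) {
        // 0 = release/unpin (cheap, frees space), 1 = everything else
        return "UNPIN".equalsIgnoreCase(r.activity()) ? 0 : 1;
    }

    // Return the next requests to activate: release first, then by arrival time.
    public List<QueuedRequest> selectNext(List<QueuedRequest> queued, int slots) {
        return queued.stream()
                .sorted(Comparator.comparingInt(ReleaseFirstScheduler::classOf)
                        .thenComparing(QueuedRequest::arrivedAt))
                .limit(slots)
                .collect(Collectors.toList());
    }
}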