Allow Dask Client to be configured #86
Comments
Hi @AlecThomson! Thanks for posting this. What you're suggesting is indeed feasible. If I may correct your suggestion slightly: one would pass a scheduler address as a tricolour argument, allowing the tricolour graph to be submitted to a distributed dask scheduler for execution on dask worker nodes.
Some effort would be needed to investigate whether the dask distributed scheduler successfully handles these graphs. See dask/distributed#6360, which discusses some of the existing issues with the distributed scheduler.
In particular, tricolour packs scan data into a (baseline, time, chan, corr) chunk which is then rechunked per baseline, so that flagging is parallelised over baseline. The packing step can be done in two ways, depending on the --window-backend option:
1. in memory, which gathers all scan chunks into a single window chunk that is then rechunked.
2. on disk using zarr, which might be a bit slower, but IIRC has an embarrassingly parallel independent graph.
I would guess (2) would work better on the distributed scheduler, for instance.
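For illustration, the pack-then-rechunk pattern described above looks roughly like the following in dask.array. The shapes and the placeholder flagger are invented for this sketch; this is not tricolour's actual code:

```python
import dask.array as da

# Toy stand-in for a packed scan window held as a single
# (baseline, time, chan, corr) chunk
window = da.zeros((351, 64, 1024, 4), chunks=(351, 64, 1024, 4))

# Rechunk so that each baseline lives in its own chunk
per_baseline = window.rechunk((1, 64, 1024, 4))

# Flagging can then be mapped over baselines as an
# embarrassingly parallel step (identity placeholder here)
flags = per_baseline.map_blocks(lambda block: block != 0, dtype=bool)
```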
Just a note that the underlying casacore-tables locking system is not
safe for distributed execution in multi-host environments, since it uses
fcntl locking underneath, so this may need somewhat of a rethink.
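For context, a small sketch of where that locking enters the picture, assuming python-casacore (the table name here is purely illustrative):

```python
# python-casacore exposes casacore's table locking via the lockoptions
# argument. The locks are fcntl() file locks, which are advisory and
# unreliable across hosts (e.g. over NFS), hence the caveat above.
from casacore.tables import table

t = table("observation.ms", readonly=True, lockoptions="usernoread")
try:
    print(t.nrows())
finally:
    t.close()
```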
--
Benjamin Hugo
PhD student, Centre for Radio Astronomy Techniques and Technologies
Department of Physics and Electronics, Rhodes University
Junior software developer, Radio Astronomy Research Group
South African Radio Astronomy Observatory
Black River Business Park, Observatory, Cape Town
Hi there,
I'm wondering if it would be possible to add an optional configuration within tricolour to allow either a Dask Client object to be passed in, or the address of an externally running dask-client to be given on the command line. This would, if specified, bypass the currently created ThreadPool object. The upside is that this would allow all Dask Client types, such as dask-mpi and dask-jobqueue, which can run jobs over multiple nodes.
Apologies for not opening a PR or similar, but I'm not fully familiar with all the workings of this project, nor with contextlib specifically.
A simple change to app.py's main could just be:
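The snippet from the original issue was not captured in this extract. What follows is a minimal sketch of the kind of change being proposed, assuming hypothetical scheduler_address and nworkers arguments and the ThreadPool-based setup described above; it is not tricolour's actual code:

```python
import contextlib
from multiprocessing.pool import ThreadPool

import dask


@contextlib.contextmanager
def scheduler_context(args):
    """Use a distributed Client if an address is given, else a local ThreadPool."""
    if args.scheduler_address is not None:
        # Hypothetical option: connect to an externally running scheduler,
        # e.g. one started via dask-mpi or dask-jobqueue
        from dask.distributed import Client

        with Client(args.scheduler_address) as client:
            yield client
    else:
        # Current behaviour: a local thread pool drives the dask graph
        with dask.config.set(pool=ThreadPool(args.nworkers)):
            yield None
```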
Sorry if this had already been considered and subsequently ruled out!