Currently, this example uses Snakemake to parallelise jobs in a multi-cascading style (i.e. processing files independently for each sample, then merging within each sample, then merging the samples together).
It would be good to test the same demo example running on Dask (`USE_DASK = True` in the notebook).
This could be done in two ways: (i) in addition to Snakemake, or (ii) instead of Snakemake.
For the former, the `inputs.yaml` file already contains a pre-prepared `use_dask` input parameter. However, it does not seem advantageous to use both Snakemake and Dask parallelisation at once: the multi-cascading nature of the Snakefile would probably have to change considerably in order not to scatter "twice" via Snakemake, but rather let Dask do some of the scattering. We can get to this in the future.
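For illustration, the kind of scatter/merge cascade that Dask could take over from Snakemake might be sketched with `dask.delayed` (the function names, samples, and file names below are hypothetical stand-ins, not taken from the actual notebook):

```python
from dask import delayed

# Hypothetical stand-ins for the notebook's real processing steps.
def process_file(sample, filename):
    return f"{sample}/{filename} processed"

def merge_sample(parts):
    return parts  # e.g. combine per-file results within one sample

def merge_samples(per_sample):
    return per_sample  # e.g. combine all samples into the final result

inputs = {"sampleA": ["f1", "f2"], "sampleB": ["f3"]}

# Multi-cascading DAG: scatter over files, merge within each sample,
# then merge samples together; Dask builds and schedules the graph.
per_sample = {
    name: delayed(merge_sample)([delayed(process_file)(name, f) for f in files])
    for name, files in inputs.items()
}
result = delayed(merge_samples)(per_sample).compute(scheduler="synchronous")
```

Swapping `scheduler="synchronous"` for a distributed scheduler would then let Dask fan the same graph out across workers.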
For now, let's try the latter, i.e. use Dask alone for all the DAG job multi-cascading. We can do this by creating a new `reana-dask.yaml` workflow specification file that could even use the Serial workflow engine, call the notebook with `USE_DASK` set to `True`, and let Dask do all the parallelisation to arrive at the results.
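A minimal `reana-dask.yaml` along these lines might look as follows (the notebook name, container image, and output file are placeholders, not the example's actual files):

```yaml
version: 0.9.3
inputs:
  files:
    - demo-analysis.ipynb        # placeholder notebook name
  parameters:
    use_dask: true
workflow:
  type: serial
  specification:
    steps:
      - environment: 'docker.io/library/python:3.11'   # placeholder image
        commands:
          # run the notebook once; Dask does all the scattering inside it
          - papermill demo-analysis.ipynb results.ipynb -p USE_DASK True
outputs:
  files:
    - results.ipynb
```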
We could then start comparing Snakemake-based vs Dask-based parallelisation and see how well they perform.