Currently, this example uses Snakemake to parallelise jobs in a multi-cascading style (i.e. processing files independently for each sample, then merging within each sample, then merging the samples together).
It would be good to test the same demo example running on Dask (`USE_DASK = True` in the notebook).
This could be done in two ways: (i) in addition to Snakemake, or (ii) instead of Snakemake.
For the former, the `inputs.yaml` file already contains a pre-prepared `use_dask` input parameter. However, it does not seem advantageous to use both Snakemake and Dask parallelisation at once: the multi-cascading nature of the Snakefile would probably have to change considerably in order not to scatter "twice" via Snakemake, but rather let Dask do some of the scattering. We can get to this in the future.
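For illustration, the kind of scatter/merge cascade that Dask could take over from Snakemake might be sketched with `dask.delayed` (the function names, samples, and file names below are hypothetical stand-ins, not taken from the actual notebook):

```python
from dask import delayed

# Hypothetical stand-ins for the notebook's real processing steps.
def process_file(sample, filename):
    return f"{sample}/{filename} processed"

def merge_sample(parts):
    return parts  # e.g. combine per-file results within one sample

def merge_samples(per_sample):
    return per_sample  # e.g. combine all samples into the final result

inputs = {"sampleA": ["f1", "f2"], "sampleB": ["f3"]}

# Multi-cascading DAG: scatter over files, merge within each sample,
# then merge samples together; Dask builds and schedules the graph.
per_sample = {
    name: delayed(merge_sample)([delayed(process_file)(name, f) for f in files])
    for name, files in inputs.items()
}
result = delayed(merge_samples)(per_sample).compute(scheduler="synchronous")
```

Swapping `scheduler="synchronous"` for a distributed scheduler would then let Dask fan the same graph out across workers.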
For now, let's try the latter, i.e. use Dask alone for all the DAG job multi-cascading. We can do this by creating a new `reana-dask.yaml` workflow specification file that could even use the Serial workflow engine, call the notebook with `USE_DASK` set to `True`, and let Dask do all the parallelisation to arrive at the results.
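A minimal `reana-dask.yaml` along these lines might look as follows (the notebook name, container image, and output file are placeholders, not the example's actual files):

```yaml
version: 0.9.3
inputs:
  files:
    - demo-analysis.ipynb        # placeholder notebook name
  parameters:
    use_dask: true
workflow:
  type: serial
  specification:
    steps:
      - environment: 'docker.io/library/python:3.11'   # placeholder image
        commands:
          # run the notebook once; Dask does all the scattering inside it
          - papermill demo-analysis.ipynb results.ipynb -p USE_DASK True
outputs:
  files:
    - results.ipynb
```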
We could then start comparing Snakemake-based vs Dask-based parallelisation and see how well they perform.