Given a directory with results from a DRAGEN/UMCCR workflow, {dracarys} will grab files of interest and transform them into ‘tidier’ structures for output into TSV/Parquet/RDS format for downstream ingestion into a database/data lake. See supported workflows, running examples, and CLI options in the sections below.
R
remotes::install_github("umccr/dracarys@vX.X.X") # for vX.X.X Release/Tag
Conda
- Linux & MacOS (non-M1)
mamba create \
-n dracarys_env \
-c umccr -c bioconda -c conda-forge \
r-dracarys==X.X.X
conda activate dracarys_env
- MacOS M1
CONDA_SUBDIR=osx-64 \
mamba create \
-n dracarys_env \
-c umccr -c bioconda -c conda-forge \
r-dracarys==X.X.X
conda activate dracarys_env
Docker
docker pull --platform linux/amd64 ghcr.io/umccr/dracarys:X.X.X
{dracarys} supports most outputs from the following DRAGEN/UMCCR workflows:
Workflow | Description |
---|---|
bcl_convert | BCLConvert workflow |
tso_ctdna_tumor_only | ctDNA TSO500 workflow |
wgs_alignment_qc | DRAGEN DNA (alignment) workflow |
wts_alignment_qc | DRAGEN RNA (alignment) workflow |
wts_tumor_only | DRAGEN RNA workflow |
wgs_tumor_normal | DRAGEN Tumor/Normal workflow |
umccrise | umccrise workflow |
rnasum | RNAsum workflow |
sash | sash workflow |
oncoanalyser | oncoanalyser workflow |
See which output files from these workflows are supported in Supported Files.
A `dracarys.R` command line interface is available for convenience.

- If you’re using the conda package, the `dracarys.R` command will already be available inside the activated conda environment.
- If you’re not using the conda package, you need to export the `dracarys/inst/cli/` directory to your `PATH` in order to use `dracarys.R`.
dracarys_cli=$(Rscript -e 'x = system.file("cli", package = "dracarys"); cat(x, "\n")' | xargs)
export PATH="${dracarys_cli}:${PATH}"
dracarys.R --version
dracarys.R 0.16.0
#-----------------------------------#
dracarys.R --help
usage: dracarys.R [-h] [-v] {tidy} ...
🐉 DRAGEN Output Post-Processing 🔥
positional arguments:
{tidy} sub-command help
tidy Tidy UMCCR Workflow Outputs
options:
-h, --help show this help message and exit
-v, --version show program's version number and exit
#-----------------------------------#
#------- Tidy ----------------------#
dracarys.R tidy --help
usage: dracarys.R tidy [-h] -i IN_DIR -o OUT_DIR -p PREFIX [-t TOKEN]
[-l LOCAL_DIR] [-f FORMAT] [-n] [-q]
options:
-h, --help show this help message and exit
-i IN_DIR, --in_dir IN_DIR
⛄️ Directory with untidy UMCCR workflow results. Can
be GDS, S3 or local.
-o OUT_DIR, --out_dir OUT_DIR
🔥 Directory to output tidy results.
-p PREFIX, --prefix PREFIX
🎻 Prefix string used for all results.
-t TOKEN, --token TOKEN
🙈 ICA access token. Default: ICA_ACCESS_TOKEN env var.
-l LOCAL_DIR, --local_dir LOCAL_DIR
📥 If input is a GDS/S3 directory, download the
recognisable files to this directory. Default:
'<out_dir>/dracarys_<gds|s3>_sync'.
-f FORMAT, --format FORMAT
🎨 Format of output. Default: tsv.
-n, --dryrun 🐫 Dry run - just show files to be tidied.
-q, --quiet 😴 Shush all the logs.
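After activating the conda environment or exporting the CLI directory as described above, a script can confirm that `dracarys.R` actually resolves on `PATH` before calling it. This is a minimal sketch; the `have_cmd` helper is hypothetical and not part of dracarys.

```shell
# Hypothetical helper (not part of dracarys): succeed if a command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd dracarys.R; then
  # The CLI is reachable: print its version as a quick sanity check.
  dracarys.R --version
else
  # Report the problem on stderr without aborting the calling shell.
  echo "dracarys.R not found on PATH" >&2
fi
```

Wrapping the check in a function keeps it reusable for other prerequisites, such as `docker` or `mamba`.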
{dracarys} takes as input (`--in_dir`) a directory with results from one of the UMCCR workflows. It recursively scans that directory for supported files, downloads them into a local directory (`--local_dir`) when the input is on GDS or S3, and then parses, transforms and writes the tidied versions into the specified output directory (`--out_dir`). A prefix (`--prefix`) is prepended to each of the tidied files. The output file format (`--format`) can be tsv, parquet, or both. To get just a list of supported files within the specified input directory, use the `-n` (`--dryrun`) option.
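One way to combine these options is to do a dry run first and only then write the tidied files. The sketch below assumes that `both` is accepted as a literal `--format` value, as the description above suggests. The paths and prefix are placeholders, and `run_or_echo` is a hypothetical helper that prints the command instead of running it when the CLI is not installed.

```shell
# Placeholder values for illustration only.
in_dir="gds://path/to/subjectX_multiqc_data/"
out_dir="local_output_dir"
prefix="subjectX_prefix"

# Hypothetical helper: run the command if its executable is on PATH,
# otherwise print the command line that would have been run.
run_or_echo() {
  if command -v "$1" >/dev/null 2>&1; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# 1. Dry run: only list the supported files found under in_dir.
run_or_echo dracarys.R tidy -i "$in_dir" -o "$out_dir" -p "$prefix" --dryrun

# 2. Real run: write the tidied outputs in both formats (assumed value: both).
run_or_echo dracarys.R tidy -i "$in_dir" -o "$out_dir" -p "$prefix" --format both
```

Checking the dry-run listing first avoids downloading and parsing a large directory that turns out to contain no supported files.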
R
# help(umccr_tidy)
in_dir <- "gds://path/to/subjectX_multiqc_data/"
out_dir <- tempdir()
prefix <- "subjectX"
umccr_tidy(in_dir = in_dir, out_dir = out_dir, prefix = prefix)
Mac/Linux
From within an activated conda environment or a shell with the `dracarys.R` CLI available:
dracarys.R tidy \
-i gds://path/to/subjectX_multiqc_data/ \
-o local_output_dir \
-p subjectX_prefix
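When `-l`/`--local_dir` is omitted, the CLI help gives the default download location as `<out_dir>/dracarys_<gds|s3>_sync`. The sketch below only mirrors that documented pattern so a wrapper script can predict where synced files should land. `default_sync_dir` is a hypothetical helper, not dracarys code, and the empty result for local inputs is an assumption based on the help text mentioning downloads only for GDS/S3 inputs.

```shell
# Hypothetical helper mirroring the documented default for --local_dir.
# $1: output directory, $2: input directory (gds://, s3:// or local path).
default_sync_dir() {
  case "$2" in
    gds://*) echo "$1/dracarys_gds_sync" ;;
    s3://*)  echo "$1/dracarys_s3_sync" ;;
    *)       echo "" ;;  # assumed: local input, nothing to download
  esac
}

default_sync_dir local_output_dir gds://path/to/subjectX_multiqc_data/
```

Passing `-l` explicitly is still the safer choice when several runs share one output directory.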
Docker
docker container run \
-v $(pwd):/mount1 \
--platform=linux/amd64 \
--env "ICA_ACCESS_TOKEN" \
--rm -it \
ghcr.io/umccr/dracarys:X.X.X \
dracarys.R tidy \
-i gds://path/to/subjectX_multiqc_data/ \
-o /mount1/output_dir \
-p subjectX_prefix
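The `--env "ICA_ACCESS_TOKEN"` flag forwards the variable from the host shell into the container, so it has to be set on the host before `docker container run` is called. A minimal sketch of a pre-flight check follows; `check_token` is a hypothetical helper, and treating an empty value as missing is an assumption.

```shell
# Hypothetical helper: succeed only if ICA_ACCESS_TOKEN is set and non-empty.
check_token() {
  [ -n "${ICA_ACCESS_TOKEN:-}" ]
}

if check_token; then
  echo "ICA_ACCESS_TOKEN is set"
else
  # Warn on stderr; the docker command would otherwise start without a token.
  echo "ICA_ACCESS_TOKEN is not set in this shell" >&2
fi
```

The same check applies to the non-Docker CLI, which reads the token from the `ICA_ACCESS_TOKEN` environment variable unless `-t` is given.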