Nextflow pipeline for analysis of libraries prepared using the ImmunoPETE assay.
Note: the Nextflow config file must be configured for your cluster queue before the pipeline is run.
- Python 3.6
- Java 8
- Nextflow 19.07.0, to run the pipeline
- UGE, for cluster job submission
- bats 0.4.0, for testing
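As a quick sanity check before installing, the requirements above can be verified with a small script. This is only a sketch: the executable names (`python3`, `java`, `nextflow`, `qsub`, `bats`) are assumptions about how the tools appear on your PATH, with `qsub` standing in for UGE. It does not check versions.

```python
import shutil

# Executable names are assumptions; adjust them to match your installation.
# UGE is checked here through its `qsub` submission command.
REQUIRED_TOOLS = ["python3", "java", "nextflow", "qsub", "bats"]


def missing_tools(tools, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

The `which` argument exists only so the lookup can be swapped out when testing; in normal use the default `shutil.which` is sufficient.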
git clone [email protected]:bioinform/Daedalus.git
cd Daedalus
git checkout tags/{release-version}
It's recommended to create a conda environment:
conda create -n Daedalus python=3.6
conda activate Daedalus
From within the Daedalus directory, run the following command:
pip install .
Due to license restrictions, you must build the bcl2fastq image yourself using the provided Dockerfile. Please refer to the Docker Hub documentation for creating a repository and pushing images.
docker build -t {dockerhub_username}/bcl2fastq:{version} -f Dockerfile_bcl2fastq .
docker push {dockerhub_username}/bcl2fastq:{version}
After building your own images, set the following params in nextflow/defaults-ipete.config to point to them:
params.bcl2fastq_docker = "{dockerhub_username}/bcl2fastq:{version}"
The pipeline runs on the Roche SC1 computing cluster (UGE) by default. If you install it on a different system, modify the cluster settings in nextflow/nextflow.config accordingly.
Please refer to Nextflow's documentation on SGE/UGE and Docker for more details.
ipete_docker {
process.clusterOptions = { "-l h_vmem=${task.ext.vmem} -S /bin/bash -l docker_version=new -V" }
}
docker.runOptions = "-u=\$UID --rm -v /path/to/input_and_output:/path/to/input_and_output -v /path/to/daedalus_repo:/path/to/daedalus_repo"
Once all the software has been installed and Nextflow has been configured, the pipeline's bats test can be run. The test runs the pipeline on a single sample, using the paired FASTQ files provided:
data/PBMC_1000ng_25ul_2_S6_R1_001.fastq.gz
data/PBMC_1000ng_25ul_2_S6_R2_001.fastq.gz
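If the test fails early, one thing worth ruling out is a missing mate file. The sketch below derives the R2 name from the R1 name and reports which of the two files are absent. It assumes the `_R1_` / `_R2_` naming seen in the two files above; `mate_path` and `missing_fastqs` are illustrative helpers, not part of Daedalus.

```python
from pathlib import Path


def mate_path(r1_path):
    """Derive the R2 file path from an R1 file path."""
    r1 = Path(r1_path)
    if "_R1_" not in r1.name:
        raise ValueError(f"not an R1 file name: {r1.name}")
    # Replace only the last "_R1_" so a sample name containing it is preserved.
    head, _, tail = r1.name.rpartition("_R1_")
    return r1.with_name(f"{head}_R2_{tail}")


def missing_fastqs(r1_path):
    """Return whichever of the R1/R2 files do not exist on disk."""
    pair = [Path(r1_path), mate_path(r1_path)]
    return [path for path in pair if not path.is_file()]


if __name__ == "__main__":
    # Run from the repository root so the relative data/ path resolves.
    for path in missing_fastqs("data/PBMC_1000ng_25ul_2_S6_R1_001.fastq.gz"):
        print(f"missing: {path}")
```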
Run the test using the following commands:
cd test
bats single-sample-ipete.bats
Running the pipeline requires a complete flowcell's worth of ImmunoPETE libraries.
manifestGenerator=/path/to/Daedalus/pipeline_runner/manifest_generator.py
illuminaDir=/path/to/illumina/run_folder
sampleSheet=/path/to/sampleSheet.csv
python ${manifestGenerator} \
--pipeline_run_id Daedalus_example_run \
--sequencing_run_folder ${illuminaDir} \
--output Daedalus_example_manifest.csv \
--subsample 1 \
--umi_mode True \
--umi2 'NNNNNNNNN' \
--umi_type R2 \
${sampleSheet}
The manifest file contains all parameters needed for the pipeline to run. To tune parameters for specific samples, or to update any parameter, edit the generated manifest file. Once the edits are complete, the pipeline can be submitted using the manifest file alone.
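Manifest edits can be made in any CSV editor, but for repeatable changes a short script may be more convenient. The following is a minimal sketch using Python's standard `csv` module. The manifest's actual column names are not documented here, so the column names passed in (for example `sample` and `subsample`) are assumptions to be replaced with the headers found in your generated manifest.

```python
import csv


def update_manifest(rows, sample_column, sample, column, value):
    """Set `column` to `value` in every row whose `sample_column` equals `sample`."""
    updated = []
    matched = 0
    for row in rows:
        row = dict(row)  # copy so the caller's rows are left untouched
        if row[sample_column] == sample:
            if column not in row:
                raise KeyError(f"manifest has no column {column!r}")
            row[column] = value
            matched += 1
        updated.append(row)
    if matched == 0:
        raise ValueError(f"no rows found for sample {sample!r}")
    return updated


def rewrite_manifest(in_path, out_path, sample_column, sample, column, value):
    """Read a manifest CSV, apply one per-sample change, and write a new file."""
    with open(in_path, newline="") as handle:
        reader = csv.DictReader(handle)
        fieldnames = reader.fieldnames
        rows = list(reader)
    rows = update_manifest(rows, sample_column, sample, column, value)
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Writing to a new file rather than overwriting the generated manifest keeps the original available for comparison. All values are handled as strings, which is how `csv.DictReader` returns them.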
Using the output from the manifest generator, Daedalus_example_manifest.csv, pipeline runs can be submitted with the script pipeline_runner.py.
pipelineRunner=/path/to/Daedalus/pipeline_runner/pipeline_runner.py
outDir=/path/to/analysis/output
python ${pipelineRunner} --no_fairshare --wait --resume -o ${outDir} Daedalus_example_manifest.csv
The analysis folder is written to the specified output directory ${outDir} and is named after the pipeline_run_id, "Daedalus_example_run":
${outDir}/Daedalus_example_run
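For scripted submissions, the same invocation and the resulting analysis folder can be expressed in Python. This is a sketch only: it reuses exactly the flags shown in the command above (`--no_fairshare`, `--wait`, `--resume`, `-o`), and the helper names `runner_command`, `analysis_dir`, and `submit` are illustrative rather than part of Daedalus.

```python
import subprocess
import sys
from pathlib import Path


def runner_command(pipeline_runner, manifest, out_dir):
    """Build the argument list for pipeline_runner.py with the flags shown above."""
    return [
        sys.executable,  # the interpreter of the active (e.g. conda) environment
        str(pipeline_runner),
        "--no_fairshare",
        "--wait",
        "--resume",
        "-o",
        str(out_dir),
        str(manifest),
    ]


def analysis_dir(out_dir, pipeline_run_id):
    """Return the folder the run is written to: <out_dir>/<pipeline_run_id>."""
    return Path(out_dir) / pipeline_run_id


def submit(pipeline_runner, manifest, out_dir):
    """Run the pipeline runner and raise CalledProcessError on a non-zero exit."""
    subprocess.run(runner_command(pipeline_runner, manifest, out_dir), check=True)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.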
Overview of the Pipeline Methods for key processing steps.