Bioinformatic suite for Nanopore cfDNA analysis
1) Input data and Preprocessing
The pipeline is optimized for FASTQ files obtained from Nanopore multiplexed DNA runs (NBD114.24 or NBD114.96) basecalled with Dorado, but it can be adapted to older chemistries or basecallers. If you plan to perform methylation analysis, you will need aligned BAMs. You can generate them either by setting "Alignment ON" during a MinKNOW run or by passing the --reference flag to dorado basecaller. Please use GCF_000001405.39_GRCh38.p13 as the reference genome (you can download it here: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000001405.39/ ). The adapters/barcodes must be left intact: do not select "barcode trimming" in MinKNOW, and pass --no-trim when running dorado basecaller.
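As an illustration, a standalone Dorado call that keeps barcodes and aligns to GRCh38.p13 might look like the sketch below. It assumes a recent Dorado release that accepts model-complex names. The `sup,5mCG_5hmCG` model (chosen here for methylation calls), the `pod5/` input folder and the function name `basecall_with_mods` are illustrative assumptions, not settings prescribed by the pipeline.

```shell
#!/usr/bin/env bash
# Sketch only: the model complex and all names below are assumptions.
set -euo pipefail

# Basecall POD5 files with Dorado, keeping adapters/barcodes (--no-trim)
# and aligning to the reference (--reference) so the BAM can be used for methylation.
basecall_with_mods() {
  local pod5_dir="$1"                  # folder with the run's .pod5 files, e.g. pod5/
  local ref="$2"                       # GRCh38.p13 FASTA (or a minimap2 .mmi index)
  local out_bam="$3"                   # e.g. modcalls.bam
  local model="${4:-sup,5mCG_5hmCG}"   # assumed model complex: sup basecalling + 5mCG/5hmCG calls
  dorado basecaller "$model" "$pod5_dir" \
    --no-trim \
    --reference "$ref" > "$out_bam"
}
```

For example, `basecall_with_mods pod5/ GCF_000001405.39_GRCh38.p13_genomic.fna modcalls.bam` would write a single BAM like the one used in the next step (the FASTA file name is also an assumption).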
You can use Dorado as a standalone basecaller, or during a MinKNOW run to produce data in real time.
Dorado Standalone:
- Single BAM files: all the reads are stored in a single BAM file. These files are usually very large (especially for PromethION experiments), and the demultiplexing step handles them poorly. Split the BAM file into multiple .fastq files and store them in a folder named "fastq":
mkdir fastq
cd fastq
samtools fastq /path/to/modcalls.bam | split -l 4000 --additional-suffix=.fastq
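where "/path/to/modcalls.bam" is the path to the .bam file produced by Dorado. Because each FASTQ record spans four lines, -l 4000 writes 1,000 reads per file without ever splitting a record across two files.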
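The same conversion can be wrapped in a small function that expresses the chunk size in reads rather than lines. This is a sketch: the function name `bam_to_fastq_chunks` and the `chunk_` file prefix are hypothetical, and GNU `split` (for `--additional-suffix`) is assumed.

```shell
#!/usr/bin/env bash
# Sketch only: function name and file prefix are hypothetical.
set -euo pipefail

# Convert one BAM to FASTQ and split it into files of N reads each.
bam_to_fastq_chunks() {
  local bam="$1"                    # e.g. /path/to/modcalls.bam
  local out_dir="$2"                # e.g. fastq
  local reads_per_chunk="${3:-1000}"
  mkdir -p "$out_dir"
  # FASTQ records are 4 lines long, so a multiple of 4 never cuts a read in half.
  # "-" makes split read from stdin; output files are chunk_aa.fastq, chunk_ab.fastq, ...
  samtools fastq "$bam" \
    | split -l $(( reads_per_chunk * 4 )) --additional-suffix=.fastq - "$out_dir/chunk_"
}
```

Calling `bam_to_fastq_chunks /path/to/modcalls.bam fastq` reproduces the 1,000-reads-per-file behaviour of the command above.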
- Split BAM files (typically produced when basecalling is performed during the run via MinKNOW), which can be either:
- Demultiplexed folders (bam_pass/barcode*/*bam) if "Barcoding" was set ON during the run.
mkdir fastq
for i in /path/to/bam_pass/*/*bam ; do samtools fastq "$i" > fastq/$(basename "$i" .bam).fastq ; done
- Mixed bam files (bam_pass/*bam) if "Barcoding" was set OFF during the run
mkdir fastq
for i in /path/to/bam_pass/*bam ; do samtools fastq "$i" > fastq/$(basename "$i" .bam).fastq ; done
where "/path/to/bam_pass/" is the folder in which MinKNOW saves the BAM files.
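Both layouts can also be handled by one loop. The sketch below assumes the standard MinKNOW layout (BAMs either directly in `bam_pass/` or one level down in `bam_pass/barcodeXX/`); the function name is hypothetical, and prefixing each output with its parent folder name is a choice made here so that files from different folders cannot overwrite each other.

```shell
#!/usr/bin/env bash
# Sketch only: function name and output naming scheme are hypothetical.
set -euo pipefail

# Convert every BAM in bam_pass/ or bam_pass/barcode*/ into a FASTQ file.
bam_pass_to_fastq() {
  local bam_pass="$1"   # e.g. /path/to/bam_pass
  local out_dir="$2"    # e.g. fastq
  local bam name
  mkdir -p "$out_dir"
  while IFS= read -r -d '' bam; do
    # Prefix with the parent folder (barcode01, or bam_pass for mixed runs)
    name="$(basename "$(dirname "$bam")")_$(basename "$bam" .bam)"
    samtools fastq "$bam" > "$out_dir/$name.fastq"
  done < <(find "$bam_pass" -maxdepth 2 -name '*.bam' -print0)
}
```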
Run guppy_barcoder on the fastq folder you have just created:
guppy_barcoder --device "cuda:0" -t 20 -i fastq -s demux --barcode_kits SQK-NBD114-96 --compress_fastq --enable_trim_barcodes
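where "demux" is the output folder that will contain the demultiplexed files. Set --barcode_kits to the kit used for the run (e.g. SQK-NBD114-24 for the 24-barcode kit).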
You can also use the CPU version of guppy_barcoder, but we suggest using a GPU for large numbers of reads.
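Before aligning, it can help to check how many reads ended up in each barcode folder. A minimal sketch, assuming guppy_barcoder wrote one subfolder per barcode (plus `unclassified`) containing `.fastq.gz` files because of `--compress_fastq`; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: function name is hypothetical; folder layout is assumed.
set -euo pipefail

# Print "<folder>\t<read count>" for every subfolder of the demux output.
count_reads_per_barcode() {
  local demux_dir="$1"   # e.g. demux
  local dir n
  for dir in "$demux_dir"/*/; do
    # Skip folders without compressed FASTQ files
    compgen -G "${dir}*.fastq.gz" > /dev/null || continue
    # Each FASTQ record spans 4 lines
    n=$(gzip -dc "$dir"*.fastq.gz | wc -l)
    printf '%s\t%d\n' "$(basename "$dir")" $(( n / 4 ))
  done
}
```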
Once you have obtained the demultiplexed .fastq files, align them to the GCF_000001405.39_GRCh38.p13 reference genome (you can download it here: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000001405.39/ ). The command below expects a minimap2 index (.mmi) built from the downloaded FASTA with minimap2 -d. We suggest using minimap2 for alignment, as the pipeline is optimised for it; other aligners might work, but the pipeline has not been extensively tested with them.
minimap2 -ax map-ont --MD -L GCF_000001405.39_GRCh38.p13_genomic.mmi demultiplexed_sample.fastq | samtools view -h -q 20 -F 0x4 -F 0x100 -F 0x800 -O BAM -o demultiplexed_sample.bam
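Applied to every barcode, the alignment step can be sketched as the script below. It assumes guppy_barcoder wrote one `barcodeXX` folder of `.fastq.gz` files per sample and that the downloaded FASTA is named `GCF_000001405.39_GRCh38.p13_genomic.fna`; the function names and the `bam` output folder are hypothetical. The filters are the same as in the command above, with the three `-F` masks combined into a single one.

```shell
#!/usr/bin/env bash
# Sketch only: file, folder and function names are assumptions.
set -euo pipefail

REF_FASTA="GCF_000001405.39_GRCh38.p13_genomic.fna"   # assumed name of the downloaded FASTA
REF_MMI="GCF_000001405.39_GRCh38.p13_genomic.mmi"

# Build the minimap2 index once; -x map-ont stores the Nanopore preset's indexing parameters.
build_index() {
  [[ -f "$REF_MMI" ]] || minimap2 -x map-ont -d "$REF_MMI" "$REF_FASTA"
}

# Align each guppy_barcoder folder (demux/barcodeXX) into one filtered BAM per barcode.
align_barcodes() {
  local demux_dir="$1"   # e.g. demux
  local out_dir="$2"     # e.g. bam
  local dir sample
  mkdir -p "$out_dir"
  for dir in "$demux_dir"/barcode*/; do
    [[ -d "$dir" ]] || continue   # no barcode folders: the glob stays unexpanded
    sample="$(basename "$dir")"
    # Stream all reads of the barcode into minimap2 via stdin ("-"), then keep
    # MAPQ >= 20 and drop unmapped (0x4), secondary (0x100) and supplementary (0x800)
    # records: 0x4 + 0x100 + 0x800 = 0x904.
    gzip -dc "$dir"*.fastq.gz \
      | minimap2 -ax map-ont --MD -L "$REF_MMI" - \
      | samtools view -h -q 20 -F 0x904 -O BAM -o "$out_dir/$sample.bam" -
  done
}
```

For example, `build_index && align_barcodes demux bam` would produce bam/barcode01.bam, bam/barcode02.bam and so on.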