1) Input data and Preprocessing

Filo_Puputnik edited this page Jul 4, 2024 · 1 revision

filotools

Bioinformatic suite for Nanopore cfDNA analysis

The pipeline is optimized for fastq files obtained from Nanopore multiplexed DNA runs (NBD114.24 or NBD114.96) and basecalled with dorado, but it can be adapted to older chemistries or basecallers. If you plan to perform methylation analysis, you will need aligned bams, generated either by setting "Alignment ON" during a MinKNOW run or by passing the --reference flag to dorado basecaller. Please use GCF_000001405.39_GRCh38.p13 as the reference genome (you can download it here https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000001405.39/ ). The adapters/barcodes must be intact: don't select "barcode trimming" in MinKNOW, or specify --no-trim when running dorado basecaller.

Basecalling

You can use Dorado as a standalone basecaller, or during a MinKNOW run to produce data in real time.

Dorado Standalone:

  • Single bam file: all the reads are stored in a single bam file. These files are usually very large (especially for PromethION experiments) and the demultiplexing step handles them poorly. Split the bam file into multiple .fastq files and store them in a folder named "fastq":
  mkdir fastq
  cd fastq
  samtools fastq /path/to/modcalls.bam | split -l 4000 --additional-suffix=.fastq

where "/path/to/modcalls.bam" is the path to the .bam file produced by dorado.
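
The chunk size of 4000 lines follows from the FASTQ layout: each record takes four lines, so every chunk holds 1000 reads, as long as samtools writes plain four-line records. A minimal sketch to check that the chunks are intact before demultiplexing; the helper name count_fastq_reads is hypothetical and not part of filotools:

```shell
# Hypothetical helper (not part of filotools): count the reads in every
# .fastq file of a folder and fail if any file is not a whole number of
# 4-line records.
count_fastq_reads() {
    cfr_dir=$1
    cfr_total=0
    for cfr_file in "$cfr_dir"/*.fastq; do
        [ -e "$cfr_file" ] || continue          # folder without .fastq files
        cfr_lines=$(($(wc -l < "$cfr_file")))   # arithmetic strips wc padding
        if [ $((cfr_lines % 4)) -ne 0 ]; then
            echo "incomplete record in $cfr_file" >&2
            return 1
        fi
        cfr_total=$((cfr_total + cfr_lines / 4))
    done
    echo "$cfr_total"
}

# Example call on the folder created above:
#   count_fastq_reads fastq
```

The printed total can be compared with the read count of the original bam, for instance the output of samtools view -c /path/to/modcalls.bam for an unaligned bam.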

  • Split bam files (typical when basecalling is performed during the run via MinKNOW). These can be either:
    • Demultiplexed folders (bam_pass/barcode*/*bam) if "Barcoding" was set ON during the run.
       mkdir fastq
       for i in /path/to/bam_pass/*/*bam ; do samtools fastq "$i" > "fastq/$(basename "$i").fastq" ; done
    
    • Mixed bam files (bam_pass/*bam) if "Barcoding" was set OFF during the run
       mkdir fastq
       for i in /path/to/bam_pass/*bam ; do samtools fastq "$i" > "fastq/$(basename "$i").fastq" ; done
    
    where "/path/to/bam_pass/" is the folder in which MinKNOW saves the bam files.
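
Since the conversion loops write one fastq per bam, named after the bam with a .fastq suffix, a quick completeness check is possible before moving on. The sketch below is only an illustration; the helper name list_missing_fastq is hypothetical, and it assumes the output naming used in the loops above:

```shell
# Hypothetical helper (not part of filotools): print every bam whose
# converted fastq is missing or empty.
# Usage: list_missing_fastq <fastq_dir> <bam> [<bam> ...]
list_missing_fastq() {
    lmf_dir=$1
    shift
    for lmf_bam in "$@"; do
        [ -e "$lmf_bam" ] || continue   # an unmatched glob stays literal; skip it
        lmf_out="$lmf_dir/$(basename "$lmf_bam").fastq"
        # -s is true only if the file exists and is not empty, so a bam
        # that really contains zero reads is reported as well.
        [ -s "$lmf_out" ] || echo "$lmf_bam"
    done
}

# Example calls:
#   list_missing_fastq fastq /path/to/bam_pass/*/*bam   # Barcoding ON
#   list_missing_fastq fastq /path/to/bam_pass/*bam     # Barcoding OFF
```

No output means every bam produced a non-empty fastq file.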

Demultiplexing and barcode trimming

Run guppy_barcoder on the fastq folder you have just created:

guppy_barcoder --device "cuda:0"  -t 20 -i fastq -s demux --barcode_kits  SQK-NBD114-96  --compress_fastq  --enable_trim_barcodes

where "demux" is the output folder containing demultiplexed files.

You can also use the CPU version of guppy_barcoder, but we suggest using a GPU for large numbers of reads.
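
After demultiplexing it is useful to see how many reads ended up in each barcode. The following is a rough sketch, assuming that guppy_barcoder wrote one subfolder per barcode (plus an unclassified folder) inside "demux" and that the files are gzip-compressed fastq because of --compress_fastq; the helper name reads_per_barcode is hypothetical:

```shell
# Hypothetical helper (not part of filotools): print "<subfolder><TAB><reads>"
# for every subfolder of the demultiplexing output folder.
# Assumes files named *.fastq.gz with 4 lines per read.
reads_per_barcode() {
    rpb_dir=$1
    for rpb_sub in "$rpb_dir"/*/; do
        [ -d "$rpb_sub" ] || continue
        rpb_reads=0
        for rpb_file in "$rpb_sub"*.fastq.gz; do
            [ -e "$rpb_file" ] || continue      # subfolder without fastq.gz
            rpb_lines=$(($(gzip -cd "$rpb_file" | wc -l)))
            rpb_reads=$((rpb_reads + rpb_lines / 4))
        done
        printf '%s\t%s\n' "$(basename "$rpb_sub")" "$rpb_reads"
    done
}

# Example call on the output folder used above:
#   reads_per_barcode demux
```

A large share of reads in the unclassified folder may indicate that the barcodes were trimmed during basecalling or that the wrong kit was passed to --barcode_kits.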

Alignment

Once you have obtained the demultiplexed .fastq files, align them using GCF_000001405.39_GRCh38.p13 as the reference genome (you can download it here https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000001405.39/ ). We suggest using minimap2 for alignment, as the pipeline is optimised for it; other aligners might work, but the pipeline has not been extensively tested with them.

minimap2 -ax map-ont --MD -L GCF_000001405.39_GRCh38.p13_genomic.mmi demultiplexed_sample.fastq | samtools view -h -q 20 -F 0x4 -F 0x100 -F 0x800 -O BAM -o demultiplexed_sample.bam
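
In the command above, -q 20 keeps only alignments with mapping quality of at least 20, and the three -F options drop unmapped reads (0x4), secondary alignments (0x100) and supplementary alignments (0x800). As an illustration of how this flag filter behaves, here is a small sketch in plain shell arithmetic; the names EXCLUDE_MASK and is_excluded are hypothetical and only mimic the filter, they do not call samtools:

```shell
# Combined mask of the three excluded SAM flag bits:
# 0x4 (unmapped) | 0x100 (secondary) | 0x800 (supplementary) = 0x904
EXCLUDE_MASK=$((0x4 | 0x100 | 0x800))

# Hypothetical helper: succeed (exit status 0) if a record with the given
# FLAG value has at least one excluded bit set, i.e. would be dropped.
is_excluded() {
    [ $(($1 & EXCLUDE_MASK)) -ne 0 ]
}

# Example: 16 is a primary reverse-strand alignment, 2064 = 2048 + 16 is a
# supplementary reverse-strand alignment.
#   is_excluded 16   || echo "FLAG 16 is kept"
#   is_excluded 2064 && echo "FLAG 2064 is dropped"
```

A record is removed as soon as any one of the three bits is set, so only primary alignments of mapped reads reach demultiplexed_sample.bam.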