-
Notifications
You must be signed in to change notification settings - Fork 0
03.Data analysis
Jinyang Zhang edited this page Jul 18, 2023
·
1 revision
# e.g.
sudo su - minknow
/opt/ont/guppy/bin/guppy_basecall_client \
-r --input_path path_to_input --save_path path_to_output \
-c dna_r9.4.1_450bps_hac.cfg \
--port ///tmp/.guppy/5555 \
--barcode_kits "EXP-NBD114" \
--compress_fastq
/opt/ont/guppy/bin/guppy_barcoder \
-i path_to_output/pass \
-s path_to_output/barcoded \
--compress_fastq \
--disable_pings
For instance, if you want to demultiplex reads from channel 1 to 256
python3 scripts/step1_demultiplex.py -i input.fastq.gz -o sample1.fastq.gz --start 1 --end 256
Usage: step1_demultiplex.py [OPTIONS]
Options:
-i, --infile TEXT input gzipped fastq file. [required]
-o, --outfile TEXT output gzipped fastq file. [required]
-st, --start INTEGER start channel number. [required]
-en, --end INTEGER end channel number. [required]
-t, --threads INTEGER number of threads.
Trim nanopore sequencing adapters using Porechop
porechop -i sample1.fastq.gz -o sample1.trimmed.fastq.gz --threads 32 --check_reads 1000
Generate full-length consensus reads and non full-length reads
python3 scripts/step2_consensus.py \
-i sample1.trimmed.fastq.gz \
-s sequencing_summary_FAQ85160_399ee876.txt \
-o ./output_sample1 \
-p sample1 \
-t 16 \
--trimA
Usage: step2_consensus.py [OPTIONS]
Options:
-i, --input PATH input trimmed fastq. [required]
-s, --summary PATH input sequencing summary generate by MinKNOW. [required]
-o, --outdir PATH output directory. [required]
-p, --prefix TEXT output prefix name. [required]
-r, --adapter PATH Adapter sequences file. Defaults to embedded splint adapter sequences.
-t, --threads INTEGER number of threads. Defaults to number of cpu cores.
--trimA trim 3' poly(A) sequences
- The output
sample1.fl.fa
is the full-length consensus reads with both 5' and 3' primers. - The output
sample1.recovered.fa
is the partial fragents that only have one 5'/3' primer.
Use full-length reads for full-length isoform assembly and quantification
python step3_analysis.py \
-i ./output_sample1 \
-p sample1 \
-r GRCh38.primary_assembly.genome.fa \
-a gencode.v37.annotation.gtf \
-t 16 \
--assemble \
--bed ../cancer_panel.bed
Usage: step3_analysis.py [OPTIONS]
Options:
-i, --workspace PATH directory of step2_consensus.py output [required]
-p, --prefix TEXT sample prefix for step2_consensus.py [required]
-r, --genome PATH reference genome fasta. [required]
-a, --gtf PATH gene annotation gtf. [required]
-b, --bed PATH bed file for target regions. [required]
-t, --threads INTEGER number of threads. Defaults to number of cpu cores.
--assemble perform transcript isoform assemble.
--help Show this message and exit.
- The output
sample1_isoforms.gtf
is the assembled and annotated full-length transcript isoforms in GTF format (requires--assemble
). - The output
sample1_isoforms.transcripts.sf
is transcript-level quantification results in the Salmon tsv format. The output contains five columns: transcript name, transcript length, effective length, CPM, number of reads. - The output
sample1_isoforms.genes.sf
is gene-level quantification results. The output contains three columns: gene name, CPM, number of reads.