3) Commands
Filo_Puputnik edited this page Jul 4, 2024
prints filotools available commands
WELCOME TO FILOTOOLS!
command list:
stats       creates .stats file from filtered sam/bam files doc:YES config:NO
readl       calculates read counts for each readlength bin (1bp) from alignments (bam/stat files) or .fastq files
motif       extracts 5' 4-mer motifs
countmotif  calculates 4-mer motif counts for each of the 256 motifs
dutyplot    recreate MinKNOW-like duty time plots from pore_activity_*.csv files
mergefastq  Merge fastq files, creating a log with a list of the files used for merging
bothbar     Create .ids files based on single/double barcoding status
mod_dem     demultiplex mod_mappings.bam files based on bam/stats files, or read_id files. generate per_read modification files
ichor       Automation of ichorCNA pipeline
subsetter   Subset bams based on an ordered list of reads
creates .stats file from filtered sam/bam files
usage: filotools stats [options] <input_bams> [other options]
can also glob, example:
filotools stats *.bam
options:
-o: specify output folder [default = input_path/STATS]
config file is ignored
-O: force output folder as stats_path from config file (for bam/stats)
calculates read counts for each readlength bin (1bp) from alignments (.stat files) or .fastq files
usage: filotools readl [options] <input_file(s)>
can also glob, example:
filotools readl *.bam
options:
-o: specify output folder [default = fastq_path/READL_RAW for fastq/fastq.gz input files, stats_path/READLENGTH_COUNTS for bam/stats input files]
-t: [INT] number of threads for multiprocessing (default: 4)
-s: [CHAR] folder that contains .ids files for subsetting
filtering options (only for bam/stats)
-l: [CHAR] specify chr_list files (default: config)
-q: [INT] specify minimum mapping quality (default: 20)
-H: don't filter out hard clipped reads
-S: don't filter out soft clipped reads
Config file is used by default for:
chr_list
-O: force output folder as readl_path (for bam/stats) and raw_readl_path (for fastq/fastq.gz)
-I: force input folder as stats_path (for bam/stats) and fastq_path (for fastq/fastq.gz)
-C: combination of -O and -I
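The filtering switches and the 1 bp binning described above can be sketched in Python as follows. This is only an illustration, not the filotools implementation: the record layout `(read_length, mapq, cigar)` and the function names are hypothetical, and it assumes that reads with a mapping quality equal to the minimum are kept.

```python
import re
from collections import Counter


def passes_filters(mapq, cigar, min_mapq=20, keep_hard=False, keep_soft=False):
    """Mimic the -q, -H and -S switches for one alignment (assumed semantics)."""
    if mapq < min_mapq:
        return False
    ops = set(re.findall(r"[MIDNSHP=X]", cigar))  # CIGAR operation letters
    if "H" in ops and not keep_hard:
        return False
    if "S" in ops and not keep_soft:
        return False
    return True


def count_read_lengths(records, **filter_kwargs):
    """Return {read_length: count}, one bin per base pair."""
    counts = Counter()
    for length, mapq, cigar in records:
        if passes_filters(mapq, cigar, **filter_kwargs):
            counts[length] += 1
    return dict(sorted(counts.items()))


# Made-up alignments: (read_length, mapq, cigar)
records = [
    (167, 60, "167M"),
    (167, 42, "167M"),
    (150, 10, "150M"),     # dropped: MAPQ below 20
    (180, 60, "5S175M"),   # dropped unless keep_soft=True
    (200, 60, "10H190M"),  # dropped unless keep_hard=True
]
print(count_read_lengths(records))
print(count_read_lengths(records, keep_soft=True, keep_hard=True))
```

Passing `keep_soft=True` or `keep_hard=True` plays the role of `-S` and `-H` in this sketch.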
extracts 5' 4-mer motifs
usage: filotools motif [options] <input_file(s)>
can also glob, example:
filotools motif *.bam
options:
-r: [CHAR] specify reference genome
-o: specify output folder [ default = <stats_input_folder>/MOTIF ]
-t: [INT] number of threads for multiprocessing (default: 4)
Config file is used by default for:
reference
-O: force output folder as motif_path
-I: force input folder as stats_path
-C: combination of -O and -I
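As a rough illustration of what extracting a 5' 4-mer motif can look like, the sketch below reads the motif from a reference sequence at the 5' end of an aligned fragment. It assumes that the motif is taken from the reference (suggested by the -r option, but not stated in the help text), that coordinates are 0-based and half-open, and that reverse-strand fragments are reverse-complemented. The function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def five_prime_motif(reference, start, end, is_reverse, k=4):
    """Return the k-mer at the 5' end of a fragment aligned to [start, end)."""
    if is_reverse:
        # On the reverse strand the 5' end is the rightmost reference position.
        return reverse_complement(reference[end - k:end])
    return reference[start:start + k]


chrom = "ACGTTGCAAGGCTTAA"  # toy reference sequence
print(five_prime_motif(chrom, 2, 12, is_reverse=False))
print(five_prime_motif(chrom, 2, 12, is_reverse=True))
```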
generate raw counts for all the 256 4-mer 5' motifs
usage: filotools countmotif [options] <input_file(s)>
can also glob, example:
filotools countmotif *.bam
TYPICAL COMMAND LINE for double barcoded reads: filotools countmotif -t 30 -s BOTH_BARCODES *.stats
options:
-o: [CHAR] specify output folder [default = <stats_input_folder>/MOTIF_COUNTS]
-m: [CHAR] specify input motif folder [default = <stats_input_folder>/MOTIF]
-t: [INT] number of threads for multiprocessing (default: 4)
-s: [CHAR] folder containing ids files for subsetting
filtering options (only for bam/stats)
-l: [CHAR] specify path for chr_list files (default: taken from config)
-q: [INT] specify minimum mapping quality (default: 20)
-k: [INT] specify max readlength (default: 700)
-H: don't filter out hard clipped reads
-S: don't filter out soft clipped reads
Config file is used by default for:
chr_list
-O: force output folder as motif_count_path from config file
-I: force input folder as motif_path from config file
-C: combination of -O and -I
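The figure of 256 motifs comes from the 4^4 possible combinations of A, C, G and T. A minimal sketch of a count table that always reports all 256 motifs, including those never observed, might look like this. The function name is hypothetical, and skipping motifs that contain other characters (such as N) is an assumption.

```python
from itertools import product

# All 4-mers over A, C, G, T in lexicographic order (4**4 = 256 entries).
ALL_MOTIFS = ["".join(p) for p in product("ACGT", repeat=4)]


def count_motifs(motifs):
    """Count observed 4-mers; motifs never seen are reported as 0."""
    counts = dict.fromkeys(ALL_MOTIFS, 0)
    for motif in motifs:
        motif = motif.upper()
        if motif in counts:  # anything outside the 256 motifs is skipped
            counts[motif] += 1
    return counts


counts = count_motifs(["CCCA", "ccca", "NCCA", "AAAA"])
print(len(counts), counts["CCCA"], counts["AAAA"])
```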
recreate MinKNOW-like duty time plots from pore_activity_*.csv files
usage:
filotools dutyplot -i <input_pore_activity.csv> [options]
options:
-o: [CHAR] specify path for output file [default: input_pore_activity.pdf]
-b: [INT] specify number of runtime bins [default: 30]
-f: [INT] specify a fixed length (in minutes) for each runtime bin (overrides -b) [default: null]
-m: [INT] specify the start time of the plot (in minutes) [default: 0]
-M: [INT] specify the end time of the plot (in minutes) [default: total runtime]
-L: suppress legend
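The interaction between -b, -f, -m and -M can be pictured with the sketch below, which computes the bin edges (in minutes) for a run. It is an assumption about how the options combine, not the actual filotools code: a fixed bin length takes precedence over the bin count, and the last fixed-length bin is truncated at the end time. The function name is hypothetical.

```python
def runtime_bin_edges(total_minutes, n_bins=30, fixed_length=None, start=0, end=None):
    """Return bin edges in minutes between start (-m) and end (-M)."""
    end = total_minutes if end is None else end
    if end <= start:
        raise ValueError("end time must be greater than start time")
    if fixed_length is not None:
        # Fixed bin length (-f) overrides the number of bins (-b).
        edges = []
        t = start
        while t < end:
            edges.append(t)
            t += fixed_length
        edges.append(end)
        return edges
    width = (end - start) / n_bins
    return [start + i * width for i in range(n_bins)] + [end]


print(len(runtime_bin_edges(600)) - 1)          # number of bins with defaults
print(runtime_bin_edges(100, fixed_length=30))  # last bin is shorter
```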
Merge fastq files, creating a log with a list of the files used for merging
IMPORTANT: all file/library names must follow standard filotools naming layout
usage: merge_fastq -m <output name> <space separated list of inputs or a glob expression, e.g. PLB03-T0-*.fastq.gz>
logs are stored in <output_folder>/logs_merge
flags:
-f: force overwrite of previous merged files with the same name in -m
-v: verbose mode
-u: output .fastq files instead of .fastq.gz
-o: specify output folder (otherwise default: <input_file_dir>/MERGED)
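For readers curious about what merging with a log amounts to, here is a minimal Python sketch. It relies on the fact that gzip files concatenated back to back still form a valid gzip stream. The function name, the file names and the one-path-per-line log format are assumptions made for the example, not the layout filotools writes.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def merge_fastq_gz(inputs, output, log_path):
    """Concatenate gzipped FASTQ files and log which inputs were used."""
    with open(output, "wb") as merged:
        for path in inputs:
            with open(path, "rb") as source:
                # Raw byte copy: consecutive gzip members stay readable.
                shutil.copyfileobj(source, merged)
    Path(log_path).write_text("".join(f"{p}\n" for p in inputs))


# Demo with two tiny, made-up FASTQ files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    parts = []
    for i, seq in enumerate(["ACGT", "TTGCA"]):
        part = tmp / f"demo-{i}.fastq.gz"
        with gzip.open(part, "wt") as handle:
            handle.write(f"@read{i}\n{seq}\n+\n{'I' * len(seq)}\n")
        parts.append(part)

    merge_fastq_gz(parts, tmp / "demo-merged.fastq.gz", tmp / "demo-merged.log")

    with gzip.open(tmp / "demo-merged.fastq.gz", "rt") as handle:
        merged_text = handle.read()
    log_lines = (tmp / "demo-merged.log").read_text().splitlines()

print(merged_text.count("@read"))  # records found in the merged file
```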
Usage: creates .ids files with read ids (first column) and info about barcoding status (second column): 0=single barcoded, 1=double barcoded.
if using samplesheet
filotools bothbar -i <stats_path> -s <samplesheet_path> -o <output_dir>
if not using samplesheet
IMPORTANT: all file/library names must follow standard filotools naming layout. Launch this tool from within the folder containing .bam or .stats files
filotools bothbar -p '*.stats' -F <fastq_path> -M <merged_fastq_path> -B <barcoding_path> -o <output_dir>
filotools bothbar -p '*.stats' -C -o <output_dir>
Options:
-o OUTPUT_DIR, --output_dir=OUTPUT_DIR
path to output folder (mandatory)
-f FILTER_TRESHOLDS, --filter_tresholds=FILTER_TRESHOLDS
front and rear threshold for barcode detection, formatted as <front_thr>:<rear_thr> [default: 60:60]
-s SAMPLESHEET, --samplesheet=SAMPLESHEET
path to tab separated file containing <sample_id_without_extension> <BXX> <path_to_barcoding_summary.txt>, where XX is the two-digit number of the barcode. If a sample results from merging samples from different runs, provide each barcode/barcoding summary pair on a separate line referencing the same sample_id
-i STATS_PATH, --stats_path=STATS_PATH
path to folder containing .stats files (mandatory if using samplesheet, can be inherited from config_file using -C)
-p PATTERN, --pattern=PATTERN
pattern to retrieve samples from working directory. IMPORTANT: launch this tool from within the folder containing .bam or .stats files (used only if samplesheet is not provided). [default= '*.stats']
-F FASTQ_PATH, --fastq_path=FASTQ_PATH
path in which fastq files are stored (used only if samplesheet is not provided)
-M MERGED_FASTQ_PATH, --merged_fastq_path=MERGED_FASTQ_PATH
path in which merged fastq files are stored (used only if samplesheet is not provided)
-B BARCODING_PATH, --barcoding_path=BARCODING_PATH
path in which barcoded results are stored. this path should include one subfolder for each run with filo's naming layout LXX_FATXXXXX_countX. (used only if samplesheet is not provided)
-S BARCODING_SUFFIX, --barcoding_suffix=BARCODING_SUFFIX
path from barcoding_path subfolders to barcoding_summary.txt file [default: GUPPY_DEM_EITHER/barcoding_summary.txt]. (used only if samplesheet is not provided)
-C, --force_config
forces the use of fastq_path/barcoding_path/merged_fastq_path (only if samplesheet is not provided) and stats_path (always) from config file
-h, --help
Show this help message and exit
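The samplesheet layout and the <front_thr>:<rear_thr> format described above can be read with a few lines of Python. This sketch only illustrates the file format: the function names, sample IDs and paths are made up, and treating the thresholds as floats is an assumption.

```python
import csv
import io
from collections import defaultdict


def parse_samplesheet(handle):
    """Group (barcode, barcoding_summary path) pairs by sample_id."""
    samples = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # tolerate blank lines
        sample_id, barcode, summary_path = row[:3]
        samples[sample_id].append((barcode, summary_path))
    return dict(samples)


def parse_thresholds(value="60:60"):
    """Split '<front_thr>:<rear_thr>' into two numbers."""
    front, rear = value.split(":")
    return float(front), float(rear)


# A merged sample (two runs) lists one barcode/summary pair per line.
sheet = (
    "PLB03-T0\tB01\trun1/barcoding_summary.txt\n"
    "PLB03-T0\tB07\trun2/barcoding_summary.txt\n"
    "PLB04-T0\tB02\trun1/barcoding_summary.txt\n"
)
samples = parse_samplesheet(io.StringIO(sheet))
print(samples["PLB03-T0"])
print(parse_thresholds("60:60"))
```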
USAGE:
standard mode (all stats files must belong to the same run and modbam file)
filotools mod_dem -m 5mC -m 5hmC -b <path/to/modbam.bam> -o <path/to/store/demultiplexed_output_modbam/> -u <path/to/store/per_read_output/> *.stats
auto mode (can be launched on stats files from multiple runs, as long as everything is named according to filotools standard naming layout)
filotools mod_dem -m 5mC -m 5hmC -a -o <path/to/store/demultiplexed_output_modbam/> -u <path/to/store/per_read_output/> *.stats
auto mode with config
filotools mod_dem -m 5mC -m 5hmC -a -C *.stats
can work with 3 kinds of inputs (inferred from input extension)
1) .sam/.bam
2) .stats
3) else (treated as space/tab separated text file, first column should contain read IDs)
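The extension-based dispatch listed above could be sketched as follows. The function names and the returned labels are hypothetical. The fallback branch simply takes the first whitespace-separated column of each non-empty line as a read ID, as described for the third input kind.

```python
from pathlib import Path


def infer_input_kind(path):
    """Classify an input file by its extension."""
    suffix = Path(path).suffix.lower()
    if suffix in {".sam", ".bam"}:
        return "alignment"
    if suffix == ".stats":
        return "stats"
    return "read_id_list"  # anything else: space/tab separated text


def read_ids_from_text(lines):
    """Take the first column of each non-empty line as a read ID."""
    return [line.split()[0] for line in lines if line.strip()]


print(infer_input_kind("sample.bam"))
print(infer_input_kind("sample.stats"))
print(infer_input_kind("reads.txt"))
print(read_ids_from_text(["read-1\t0\n", "\n", "read-2 1\n"]))
```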
generates 2 kinds of outputs (by default)
1) demultiplexed modbams (default output folder : in_folder/MODBAM)
2) demultiplexed per-read modification files (default output folder : in_folder/MOD)
general flags:
--ouputs | -w : can be modbam or per_read, can be specified multiple times by repeating -w (default: modbam AND per_read)
--mods | -m : modifications to analyze (available mods: C+m/5mc/5mC and C+h/5hmc/5hmC) (default: C+m)
--threshold | -t : likelihood threshold for mod filtering, meaningful only for per_read output (default: 170)
--auto | -a : automatically detect library name, use only if filenames are in Filo's format and basecall_path is defined in config file.
input paths:
--modbam | -b : path to modbam
output paths:
--modbam_out | -o : path to store demultiplexed modbam files
--mod_out | -u : path to store per_read modification files
if --auto is set, CONFIG is used only to infer the modbam input; basecall_path, fastq_path and merged_fastq_path should be defined in the config file
if --force_config is set, CONFIG is used for output paths (--modbam_out, --mod_out) (note: if --auto is not set, you still need to provide an input path with --modbam flag)
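A default --threshold of 170 suggests the 0-255 scale that the SAM ML tag uses for modified-base likelihoods, where an integer N covers the probability range N/256 to (N+1)/256. Under that assumption, which the help text does not confirm, filtering per-read calls could look like the sketch below. The call layout `(read_id, position, ml)`, the function names and the inclusive comparison are all hypothetical.

```python
def ml_lower_bound(ml):
    """Lowest probability represented by an ML value on the 0-255 scale."""
    return ml / 256


def keep_confident_calls(calls, threshold=170):
    """Keep (read_id, position, ml) calls at or above the threshold (assumed inclusive)."""
    return [call for call in calls if call[2] >= threshold]


# Made-up per-read modification calls.
calls = [
    ("read-1", 1045, 250),
    ("read-1", 1102, 169),
    ("read-2", 873, 170),
]
print(keep_confident_calls(calls))
print(ml_lower_bound(170))
```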