3) Commands
Filo_Puputnik edited this page Jul 4, 2024
filotools
prints the available filotools commands
command list:
stats creates .stats file from filtered sam/bam files doc:YES config:NO
readl calculates read counts for each read-length bin (1 bp) from alignments (bam/stats files) or .fastq files
motif extracts 5' 4-mer motifs
countmotif calculates 4-mer motif counts for each of the 256 motifs
dutyplot recreates MinKNOW-like duty time plots from pore_activity_*.csv files
mergefastq merges fastq files, creating a log with the list of files used for merging
bothbar creates .ids files based on single/double barcoding status
mod_dem demultiplexes mod_mappings.bam files based on bam/stats files or read_id files, and generates per-read modification files
ichor automates the ichorCNA pipeline
subsetter subsets bams based on an ordered list of reads
stats
creates a .stats file from filtered sam/bam files
usage: filotools stats [options] <input_bams> [other options]
can also glob, example:
filotools stats *.bam
-o: specify output folder [default = input_path/STATS]
the config file is ignored by default; to use it:
-O: force output folder as stats_path from config file (for bam/stats)
readl
calculates read counts for each read-length bin (1 bp) from alignments (.bam/.stats files) or .fastq files
usage: filotools readl [options] <input_file(s)>
can also glob, example:
filotools readl *.bam
-o: specify output folder [default = fastq_path/READL_RAW for fastq/fastq.gz input files, stats_path/READLENGTH_COUNTS for bam/stats input files]
-t: [INT] number of threads for multiprocessing (default: 4)
-s: [CHAR] folder that contains .ids files for subsetting
filtering options (only for bam/stats)
-l: [CHAR] specify path for chr_list files (default: taken from config)
-q: [INT] specify minimum mapping quality (default: 20)
-H: don't filter out hard clipped reads
-S: don't filter out soft clipped reads
Config file is used by default for:
-O: force output folder as readl_path (for bam/stats) and raw_readl_path (for fastq/fastq.gz)
-I: force input folder as stats_path (for bam/stats) and fastq_path (for fastq/fastq.gz)
-C: combination of -O and -I
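A minimal Python sketch of the per-length counting and filtering described above. It assumes each alignment has already been reduced to a hypothetical `Alignment` record (read ID, read length, MAPQ, CIGAR, chromosome), that reads with MAPQ below `-q` are dropped, that clipping is detected from `S`/`H` CIGAR operations, that a chr_list means "keep only these chromosomes", and that an `.ids` subset is a set of read IDs. It illustrates the idea and is not the filotools implementation.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple, Optional, Set


class Alignment(NamedTuple):
    """Hypothetical, pre-parsed view of one alignment (not a filotools type)."""
    read_id: str
    length: int   # read length in bp
    mapq: int     # mapping quality
    cigar: str    # e.g. "5S162M"
    chrom: str


CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def has_clip(cigar: str, op: str) -> bool:
    """True if the CIGAR string contains the given clipping operation ('S' or 'H')."""
    return any(kind == op for _, kind in CIGAR_OP.findall(cigar))


def readlength_counts(
    alignments: Iterable[Alignment],
    min_mapq: int = 20,                      # mirrors the -q default
    keep_hard_clipped: bool = False,         # mirrors -H
    keep_soft_clipped: bool = False,         # mirrors -S
    chroms: Optional[Set[str]] = None,       # ASSUMPTION: chromosomes kept from a chr_list (-l)
    subset_ids: Optional[Set[str]] = None,   # ASSUMPTION: read IDs from an .ids file (-s)
) -> Counter:
    """Count reads per 1 bp read-length bin after filtering."""
    counts: Counter = Counter()
    for aln in alignments:
        if subset_ids is not None and aln.read_id not in subset_ids:
            continue
        if chroms is not None and aln.chrom not in chroms:
            continue
        if aln.mapq < min_mapq:
            continue
        if not keep_hard_clipped and has_clip(aln.cigar, "H"):
            continue
        if not keep_soft_clipped and has_clip(aln.cigar, "S"):
            continue
        counts[aln.length] += 1  # one bin per exact length in bp
    return counts
```

With the defaults, a read with MAPQ 10 or a `5S162M` CIGAR is skipped, while `keep_soft_clipped=True` counts the latter in the 167 bp bin.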
motif
extracts 5' 4-mer motifs
usage: filotools motif [options] <input_file(s)>
can also glob, example:
filotools motif *.bam
-r: [CHAR] specify reference genome
-o: specify output folder [default = <stats_input_folder>/MOTIF]
-t: [INT] number of threads for multiprocessing (default: 4)
Config file is used by default for:
-O: force output folder as motif_path
-I: force input folder as stats_path
-C: combination of -O and -I
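One common way to define a read's 5' 4-mer motif from an alignment and the reference genome given with `-r` is sketched below in Python. It assumes 0-based, end-exclusive reference coordinates and a reference sequence already loaded as a string; the function names are hypothetical, and whether filotools uses exactly this convention is not stated on this page.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (soft-masked lowercase is uppercased first)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def end_motif(ref: str, start: int, end: int, reverse: bool, k: int = 4) -> str:
    """5' k-mer of a read aligned to ref[start:end] (0-based, end exclusive).

    Forward-strand reads: the first k reference bases of the alignment.
    Reverse-strand reads: the read's 5' end sits at the alignment end, so the
    last k reference bases are taken and reverse-complemented.
    """
    if reverse:
        return revcomp(ref[end - k:end])
    return ref[start:start + k].upper()
```

For a forward read the motif is read directly off the reference; for a reverse read it is reported on the read's own strand, so both orientations land in the same 256-motif space.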
countmotif
generates raw counts for all 256 5' 4-mer motifs
usage: filotools countmotif [options] <input_file(s)>
can also glob, example:
filotools countmotif *.bam
TYPICAL COMMAND LINE for double barcoded reads: filotools countmotif -t 30 -s BOTH_BARCODES *.stats
-o: [CHAR] specify output folder [default = <stats_input_folder>/MOTIF_COUNTS]
-m: [CHAR] specify input motif folder [default = <stats_input_folder>/MOTIF]
-t: [INT] number of threads for multiprocessing (default: 4)
-s: [CHAR] folder containing ids files for subsetting
filtering options (only for bam/stats)
-l: [CHAR] specify path for chr_list files (default: taken from config)
-q: [INT] specify minimum mapping quality (default: 20)
-k: [INT] specify max readlength (default: 700)
-H: don't filter out hard clipped reads
-S: don't filter out soft clipped reads
Config file is used by default for:
-O: force output folder as motif_count_path from config file
-I: force input folder as motif_path from config file
-C: combination of -O and -I
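The "256 motifs" figure is simply 4^4 possible 4-mers over A/C/G/T. A minimal Python sketch of producing raw counts for every one of them (including zeros), assuming motifs have already been extracted as `(motif, read_length)` pairs and that `-k` keeps reads with length up to and including the maximum; the function name is hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable, Tuple

# All 4**4 = 256 possible 4-mers over A/C/G/T, in lexicographic order.
ALL_4MERS = ["".join(p) for p in product("ACGT", repeat=4)]


def count_motifs(records: Iterable[Tuple[str, int]], max_len: int = 700) -> Dict[str, int]:
    """Raw counts for all 256 motifs from (motif, read_length) pairs.

    Reads longer than max_len (mirrors -k) are skipped; motifs containing
    other characters (e.g. N) are not among the 256 keys and are dropped.
    """
    observed = Counter(motif for motif, length in records if length <= max_len)
    return {motif: observed.get(motif, 0) for motif in ALL_4MERS}
```

Returning every key, even with a zero count, keeps per-sample tables aligned on the same 256 rows.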
dutyplot
recreates MinKNOW-like duty time plots from pore_activity_*.csv files
usage: filotools dutyplot -i <input_pore_activity.csv> [options]
-o: [CHAR] specify path for output file [default: input_pore_activity.pdf]
-b: [INT] specify number of runtime bins [default: 30]
-f: [INT] specify a fixed length (in minutes) for each runtime bin (overrides -b) [default: null]
-m: [INT] specify the start time of the plot (in minutes) [default: 0]
-M: [INT] specify the end time of the plot (in minutes) [default: total runtime]
-L: suppress legend
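How `-b`, `-f`, `-m` and `-M` interact can be illustrated by computing runtime bin edges. This Python sketch is an assumption about the binning logic (equal-width bins between start and end, or fixed-length bins with the last one truncated at the end time), not dutyplot's code; the function name is hypothetical.

```python
from typing import List, Optional


def runtime_bin_edges(
    total_runtime: float,               # minutes
    n_bins: int = 30,                   # mirrors -b
    fixed_len: Optional[float] = None,  # mirrors -f; overrides n_bins when set
    start: float = 0.0,                 # mirrors -m
    end: Optional[float] = None,        # mirrors -M; defaults to total runtime
) -> List[float]:
    """Return bin edges (in minutes) for the plotted time window."""
    stop = total_runtime if end is None else end
    if stop <= start:
        raise ValueError("end time must be after start time")
    if fixed_len is not None:
        if fixed_len <= 0:
            raise ValueError("fixed bin length must be positive")
        edges = []
        t = start
        while t < stop:
            edges.append(t)
            t += fixed_len
        edges.append(stop)  # ASSUMPTION: last bin is truncated at the end of the window
        return edges
    width = (stop - start) / n_bins
    return [start + i * width for i in range(n_bins + 1)]
```

For a 60-minute run, `n_bins=3` gives edges at 0, 20, 40 and 60 minutes, while `fixed_len=25` gives 0, 25, 50 and 60.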
mergefastq
merges fastq files, creating a log with the list of files used for merging
IMPORTANT: all file/library names must follow standard filotools naming layout
usage: merge_fastq -m <output name> <space separated list of inputs>, or a glob expression, e.g. PLB03-T0-*.fastq.gz
logs are stored in <output_folder>/logs_merge
-f: force overwrite of existing merged files with the same name as given in -m
-v: verbose mode
-u: output .fastq files instead of .fastq.gz
-o: specify output folder (otherwise default: <input_file_dir>/MERGED)
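Because a gzip file may consist of several concatenated gzip members, merged .fastq.gz output can be produced by byte-level concatenation without recompressing. The Python sketch below shows that idea together with a plain-text log listing the inputs; the function name, log file name and log format are hypothetical and are not mergefastq's actual output.

```python
import gzip
import shutil
from pathlib import Path
from typing import List


def merge_fastq_gz(inputs: List[Path], output: Path, log_dir: Path, force: bool = False) -> Path:
    """Concatenate .fastq.gz files into `output` and log the inputs; return the log path."""
    if output.exists() and not force:  # mirrors -f: refuse to overwrite unless forced
        raise FileExistsError(f"{output} exists; pass force=True to overwrite")
    output.parent.mkdir(parents=True, exist_ok=True)
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)  # each input stays a valid gzip member
    log_dir.mkdir(parents=True, exist_ok=True)
    log = log_dir / f"{output.name}.log"  # hypothetical log name
    log.write_text("".join(f"{p}\n" for p in inputs))
    return log
```

Any gzip reader that supports multi-member files, including Python's `gzip.open`, decompresses the merged file as one continuous fastq stream.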
bothbar
creates .ids files containing read IDs (first column) and barcoding status (second column): 0 = single barcoded, 1 = double barcoded.
if using a samplesheet:
filotools bothbar -i <stats_path> -s <samplesheet_path> -o <output_dir>
if not using a samplesheet:
IMPORTANT: all file/library names must follow the standard filotools naming layout. Launch this tool from within the folder containing the .bam or .stats files.
filotools bothbar -p '*.stats' -F <fastq_path> -M <merged_fastq_path> -B <barcoding_path> -o <output_dir>
filotools bothbar -p '*.stats' -C -o <output_dir>
-o OUTPUT_DIR, --output_dir=OUTPUT_DIR
path to output folder (mandatory)
front and rear thresholds for barcode detection, formatted as <front_thr>:<rear_thr> [default: 60:60]
path to a tab-separated file containing <sample_id_without_extension> <BXX> <path_to_barcoding_summary.txt>, where XX is the two-digit barcode number. If a sample results from merging samples from different runs, provide each barcode/barcoding summary pair on a separate line referencing the same sample_id.
-i STATS_PATH, --stats_path=STATS_PATH
path to folder containing .stats files (mandatory if using a samplesheet; can be inherited from the config file using -C)
-p PATTERN, --pattern=PATTERN
pattern used to retrieve samples from the working directory. IMPORTANT: launch this tool from within the folder containing the .bam or .stats files (used only if no samplesheet is provided). [default: '*.stats']
-F FASTQ_PATH, --fastq_path=FASTQ_PATH
path in which fastq files are stored (used only if samplesheet is not provided)
path in which merged fastq files are stored (used only if samplesheet is not provided)
path in which barcoding results are stored. This path should include one subfolder per run, named with Filo's naming layout LXX_FATXXXXX_countX. (used only if samplesheet is not provided)
path from barcoding_path subfolders to barcoding_summary.txt file [default: GUPPY_DEM_EITHER/barcoding_summary.txt]. (used only if samplesheet is not provided)
-C, --force_config
forces the use of fastq_path/barcoding_path/merged_fastq_path (only if samplesheet is not provided) and stats_path (always) from config file
-h, --help
Show this help message and exit
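The samplesheet layout and the `<front_thr>:<rear_thr>` threshold format described above can be illustrated with a short Python sketch. Everything beyond those two formats is an assumption: the barcoding summary is taken to have Guppy-style `barcode_front_score` / `barcode_rear_score` columns, and a read is treated as double barcoded only when both scores reach their thresholds. bothbar's real decision rule may differ; all function names are hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def parse_thresholds(spec: str = "60:60") -> Tuple[float, float]:
    """Split a '<front_thr>:<rear_thr>' string into two floats."""
    front, rear = spec.split(":")
    return float(front), float(rear)


def read_samplesheet(lines: Iterable[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Group (barcode, barcoding_summary path) pairs by sample_id.

    A sample merged from several runs simply has several lines with the same sample_id.
    """
    sheet: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        sample_id, barcode, summary_path = row[:3]
        sheet[sample_id].append((barcode, summary_path))
    return dict(sheet)


def ids_lines(summary_lines: Iterable[str], thresholds: Tuple[float, float]) -> List[str]:
    """Build '<read_id><TAB><status>' lines (1 = double barcoded, 0 otherwise).

    ASSUMPTION: Guppy-style score columns, and "double barcoded" means both
    the front and the rear score reach their thresholds.
    """
    front_thr, rear_thr = thresholds
    out = []
    for rec in csv.DictReader(summary_lines, delimiter="\t"):
        double = (float(rec["barcode_front_score"]) >= front_thr
                  and float(rec["barcode_rear_score"]) >= rear_thr)
        out.append(f"{rec['read_id']}\t{int(double)}")
    return out
```

With the default `60:60`, a read scoring 75 at the front and 68.2 at the rear would be written with status 1, and one scoring 80 and 12.5 with status 0.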
mod_dem
standard mode (all stats files must belong to the same run and modbam file)
filotools mod_dem -m 5mC -m 5hmC -b <path/to/modbam.bam> -o <path/to/store/demultiplexed_output_modbam/> -u <path/to/store/per_read_output/> *.stats
auto mode (can be launched on stats files from multiple runs, as long as everything is named according to the filotools standard naming layout)
filotools mod_dem -m 5mC -m 5hmC -a -o <path/to/store/demultiplexed_output_modbam/> -u <path/to/store/per_read_output/> *.stats
auto mode with config
filotools mod_dem -m 5mC -m 5hmC -a -C *.stats
can work with 3 kinds of inputs (inferred from the input extension)
1) .sam/.bam
2) .stats
3) anything else (treated as a space/tab-separated text file; the first column should contain read IDs)
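The extension-based input detection above can be sketched in Python as follows. The category labels and function names are hypothetical, and exact, case-sensitive suffix matching is an assumption. Note that the "anything else" branch also fits the .ids files produced by bothbar, whose first column holds read IDs.

```python
from pathlib import Path
from typing import Iterable, Set


def input_kind(path: str) -> str:
    """Classify an input by extension (ASSUMPTION: exact, case-sensitive match)."""
    suffix = Path(path).suffix
    if suffix in (".sam", ".bam"):
        return "alignment"
    if suffix == ".stats":
        return "stats"
    return "read_ids"  # any other file: whitespace-separated text, read ID in column 1


def read_ids_from_text(lines: Iterable[str]) -> Set[str]:
    """Collect first-column read IDs from space- or tab-separated lines."""
    ids = set()
    for line in lines:
        fields = line.split()  # splits on any run of spaces or tabs
        if fields:
            ids.add(fields[0])
    return ids
```

Blank lines are ignored, and any extra columns (such as the barcoding status in an .ids file) are simply skipped.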
generates 2 kinds of outputs (by default):
1) demultiplexed modbams (default output folder: in_folder/MODBAM)
2) demultiplexed per-read modification files (default output folder: in_folder/MOD)
general flags:
--ouputs | -w : can be modbam or per_read; can be specified multiple times by repeating -w (default: modbam AND per_read)
--mods | -m : modifications to analyze (available mods: C+m/5mc/5mC and C+h/5hmc/5hmC) (default: C+m)
--threshold | -t : likelihood threshold for mod filtering, meaningful only for per_read output (default: 170)
--auto | -a : automatically detect the library name; use only if filenames follow Filo's format and basecall_path is defined in the config file.
input paths:
--modbam | -b : path to modbam
output paths:
--modbam_out | -o : path to store demultiplexed modbam files
--mod_out | -u : path to store per_read modification files
if --auto is set, CONFIG is used only to infer the modbam input; basecall_path, fastq_path and merged_fastq_path must be defined in the config file
if --force_config is set, CONFIG is used for the output paths (--modbam_out, --mod_out) (note: if --auto is not set, you still need to provide an input path with the --modbam flag)