3) Commands

Filo_Puputnik edited this page Jul 4, 2024 · 6 revisions

filotools -h

prints filotools available commands

 WELCOME TO FILOTOOLS!

	command list:

	stats		creates .stats file from filtered sam/bam files	doc:YES	config:NO
	readl		calculates read counts for each readlength bin (1bp) from alignments (bam/stats files) or .fastq files
	motif		extracts 5' 4-mer motifs  
	countmotif	calculates 4-mer motif counts for each of the 256 motifs
	dutyplot	recreate MinKNOW-like duty time plots from  pore_activity_*.csv files
	mergefastq	Merge fastq files, creating a log with a list of the files used for merging
	bothbar		Create .ids files based on single/double barcoding status
	mod_dem		demultiplex mod_mappings.bam files based on bam/stats files, or read_id files. generate per_read modification files
	ichor		Automatization of ichorCNA pipeline
	subsetter	Subset bams based on an ordered list of reads

filotools stats

creates .stats file from filtered sam/bam files
	   usage: filotools stats [options] <input_bams> [other options]
	   can also glob, example:
	   filotools stats *.bam
options:
           -o: specify output folder [default = input_path/STATS]


config file is ignored

-O: force output folder as stats_path from config file (for bam/stats)

filotools readl

calculates read counts for each readlength bin (1bp) from alignments (.stats files) or .fastq files
	   usage: filotools readl [options] <input_file(s)> 
	   can also glob, example:
	   filotools readl *.bam
options:
		-o: specify output folder [default = fastq_path/READL_RAW for fastq/fastq.gz input files, stats_path/READLENGTH_COUNTS for bam/stats input files]
		-t: [INT] number of threads for multiprocessing (default: 4)
		-s: [CHAR] folder that contains .ids files for subsetting

		filtering options (only for bam/stats)
		-l: [CHAR] specify chr_list files (default: config)
		-q: [INT] specify minimum mapping quality (default: 20)
		-H: don't filter out hard clipped reads
		-S: don't filter out soft clipped reads

Config file is used by default for:

chr_list

-O: force output folder as readl_path (for bam/stats) and raw_readl_path (for fastq/fastq.gz)
-I: force input  folder as stats_path (for bam/stats) and fastq_path (for fastq/fastq.gz)
-C: combination of -O and -I
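
The 1 bp binning for FASTQ input can be pictured with a short Python sketch. This is not filotools code: the function names `count_read_lengths` and `write_demo_fastq` are hypothetical, and the sketch assumes standard 4-line FASTQ records and skips the bam/stats filtering options listed above.

```python
import gzip
from collections import Counter
from pathlib import Path


def count_read_lengths(fastq_path):
    """Return a Counter mapping read length (bp) -> number of reads.

    ASSUMPTION: standard 4-line FASTQ records (no wrapped sequences).
    """
    path = Path(fastq_path)
    # Open .gz files transparently, plain text otherwise.
    opener = gzip.open if path.suffix == ".gz" else open
    counts = Counter()
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # The sequence is the second line of every 4-line record.
            if line_number % 4 == 1:
                counts[len(line.strip())] += 1
    return counts


def write_demo_fastq(path):
    """Write a tiny hypothetical FASTQ file used only for this example."""
    records = [("r1", "ACGT"), ("r2", "ACGTAC"), ("r3", "TTGA")]
    with open(path, "w") as handle:
        for name, seq in records:
            handle.write(f"@{name}\n{seq}\n+\n{'I' * len(seq)}\n")
```

Each key of the returned counter is one 1 bp bin, so two reads of 4 bp and one of 6 bp give a count of 2 for bin 4 and 1 for bin 6.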

filotools motif

extracts 5' 4-mer motifs
	   usage: filotools motif [options] <input_file(s)> 
	   can also glob, example:
	   filotools motif *.bam
options:
		-r: [CHAR] specify reference genome 
		-o: specify output folder [ default = <stats_input_folder>/MOTIF ]
		-t: [INT] number of threads for multiprocessing (default: 4)

Config file is used by default for:

reference

-O: force output folder as motif_path 
-I: force input  folder as stats_path 
-C: combination of -O and -I

filotools countmotif

generate raw counts for all the 256 4-mer 5' motifs
	   usage: filotools countmotif [options] <input_file(s)> 
	   can also glob, example:
	   filotools countmotif *.bam 
TYPICAL COMMAND LINE for double barcoded reads: filotools countmotif -t 30 -s BOTH_BARCODES *.stats
options:
		-o: [CHAR] specify output folder [default = <stats_input_folder>/MOTIF_COUNTS]
		-m: [CHAR] specify input motif folder [default = <stats_input_folder>/MOTIF]
		-t: [INT] number of threads for multiprocessing (default: 4)
		-s: [CHAR] folder containing ids files for subsetting 

		filtering options (only for bam/stats)
		-l: [CHAR] specify path for chr_list files (default: taken from config)
		-q: [INT] specify minimum mapping quality (default: 20)
		-k: [INT] specify max readlength (default: 700)
		-H: don't filter out hard clipped reads
		-S: don't filter out soft clipped reads

Config file is used by default for:

chr_list

-O: force output folder as motif_count_path from config file
-I: force input  folder as motif_path from config file
-C: combination of -O and -I
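
The figure of 256 comes from the 4^4 possible 4-mers over A, C, G and T. A minimal Python sketch of the counting step follows; it is an illustration only, `count_motifs` is a hypothetical name, and dropping motifs that contain other characters (such as N) is an assumption rather than documented filotools behaviour.

```python
from collections import Counter
from itertools import product

# All 4**4 = 256 possible 4-mers over A, C, G, T, in lexicographic order.
ALL_MOTIFS = ["".join(p) for p in product("ACGT", repeat=4)]


def count_motifs(motifs):
    """Count each of the 256 4-mers; motifs never seen get a count of 0."""
    seen = Counter(m.upper() for m in motifs)
    # ASSUMPTION: motifs with non-ACGT characters (e.g. N) are ignored.
    return {motif: seen.get(motif, 0) for motif in ALL_MOTIFS}
```

Returning a zero for unseen motifs keeps every output table at the same 256 rows, which makes samples directly comparable.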

filotools dutyplot

        recreate MinKNOW-like duty time plots from  pore_activity_*.csv files
        
        usage:
        filotools dutyplot -i <input_pore_activity.csv> [options]
        
        options:
        
        -o: [CHAR] specify path for output file [default: input_pore_activity.pdf]
        -b: [INT]  specify number of runtime bins [default: 30]
        -f: [INT]  specify a fixed length (in minutes) for each runtime bin (overrides -b) [default: null]
        -m: [INT]  specify the start time of the plot (in minutes) [default: 0]
        -M: [INT]  specify the end time of the plot (in minutes) [default: total runtime]
        -L:        suppress legend
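
How -b, -f, -m and -M might interact can be sketched in Python. This is a guess at the binning logic, not the dutyplot implementation: `runtime_bins` and its parameter names are hypothetical, and the only behaviour taken from the help text is that a fixed bin length overrides the number of bins.

```python
import math


def runtime_bins(total_minutes, n_bins=30, fixed_length=None, start=0, end=None):
    """Return a list of (bin_start, bin_end) tuples in minutes.

    ASSUMPTION: a fixed bin length, when given, takes precedence over
    n_bins, mirroring the "-f overrides -b" note in the help text.
    """
    end = total_minutes if end is None else end
    if end <= start:
        raise ValueError("end must be greater than start")
    span = end - start
    if fixed_length is not None:
        # Fixed-length bins: the last bin may be shorter than the others.
        n_bins = math.ceil(span / fixed_length)
        width = fixed_length
    else:
        # Equal-width bins covering the whole plotted window.
        width = span / n_bins
    edges = [min(start + i * width, end) for i in range(n_bins)] + [end]
    return list(zip(edges[:-1], edges[1:]))
```

With a 100-minute run and a fixed length of 30 minutes, this sketch yields four bins, the last one covering minutes 90 to 100.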

filotools mergefastq

Merge fastq files, creating a log with a list of the files used for merging

IMPORTANT: all file/library names must follow standard filotools naming layout 
usage:  merge_fastq -m <output name> <space-separated list of inputs>, or a glob expression, e.g. PLB03-T0-*.fastq.gz

logs are stored in <output_folder>/logs_merge

flags:
	   -f: force overwrite of previous merged files with the same name in -m
	   -v: verbose mode
	   -u: output .fastq files instead of .fastq.gz
	   -o: specify output folder (otherwise default: <input_file_dir>/MERGED)
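
The idea of merging plus logging can be illustrated in Python. The sketch below is not the mergefastq code: `merge_fastq_gz` and the log layout (one input path per line) are hypothetical. It relies on the fact that gzip files concatenated byte-wise form a valid multi-member gzip stream.

```python
import shutil
from pathlib import Path


def merge_fastq_gz(inputs, output, log_path):
    """Concatenate .fastq.gz files byte-wise and log the inputs used.

    ASSUMPTION: inputs are merged in sorted order; the real tool may
    order them differently.
    """
    inputs = sorted(Path(p) for p in inputs)
    with open(output, "wb") as merged:
        for path in inputs:
            with open(path, "rb") as part:
                # Copy the compressed bytes as-is; no recompression needed.
                shutil.copyfileobj(part, merged)
    # Hypothetical log format: one input path per line.
    Path(log_path).write_text("\n".join(str(p) for p in inputs) + "\n")
    return inputs
```

Keeping the list of inputs next to the merged file makes it possible to trace a merged library back to its source files later.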

filotools bothbar

Usage: creates .ids files with read ids (first column) and info about barcoding status (second column): 0=single barcoded, 1=double barcoded.

if using samplesheet

filotools bothbar -i <stats_path> -s <samplesheet_path> -o <output_dir>


if not using samplesheet
IMPORTANT: all file/library names must follow standard filotools naming layout. Launch this tool from within the folder containing .bam or .stats files

filotools bothbar -p '*.stats' -F <fastq_path> -M <merged_fastq_path> -B <barcoding_path> -o <output_dir>
filotools bothbar -p '*.stats' -C -o <output_dir>





Options:
	-o OUTPUT_DIR, --output_dir=OUTPUT_DIR
		path to output folder (mandatory)

	-f FILTER_TRESHOLDS, --filter_tresholds=FILTER_TRESHOLDS
		front and rear threshold for barcode detection, formatted as <front_thr>:<rear_thr> [default: 60:60]

	-s SAMPLESHEET, --samplesheet=SAMPLESHEET
		path to a tab-separated file with the columns <sample_id_without_extension>	<BXX>	<path_to_barcoding_summary.txt>, where XX is the two-digit barcode number. If a sample results from merging samples from different runs, provide each barcode/barcoding summary pair on a separate line referencing the same sample_id

	-i STATS_PATH, --stats_path=STATS_PATH
		path to folder containing .stats files (mandatory if using samplesheet, can be inherited from config_file using -C)

	-p PATTERN, --pattern=PATTERN
		pattern to retrieve samples from working directory. IMPORTANT: launch this tool from within the folder containing .bam or .stats files (used only if samplesheet is not provided). [default= '*.stats']

	-F FASTQ_PATH, --fastq_path=FASTQ_PATH
		path in which fastq files are stored (used only if samplesheet is not provided)

	-M MERGED_FASTQ_PATH, --merged_fastq_path=MERGED_FASTQ_PATH
		path in which merged fastq files are stored (used only if samplesheet is not provided)

	-B BARCODING_PATH, --barcoding_path=BARCODING_PATH
		path in which barcoded results are stored. this path should include one subfolder for each run with filo's naming layout LXX_FATXXXXX_countX. (used only if samplesheet is not provided)

	-S BARCODING_SUFFIX, --barcoding_suffix=BARCODING_SUFFIX
		path from barcoding_path subfolders to barcoding_summary.txt file [default: GUPPY_DEM_EITHER/barcoding_summary.txt]. (used only if samplesheet is not provided)

	-C, --force_config
		forces the use of fastq_path/barcoding_path/merged_fastq_path (only if samplesheet is not provided) and stats_path (always) from config file 

	-h, --help
		Show this help message and exit
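
The samplesheet layout and the threshold format described above can be sketched in Python. Nothing here is filotools code: `build_samplesheet` and `parse_thresholds` are hypothetical helpers, the sample name and paths in the usage note are made up, and reading the thresholds as floats is an assumption.

```python
import csv
import io


def build_samplesheet(rows):
    """Serialise (sample_id, barcode_number, summary_path) rows as TSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for sample_id, barcode_number, summary_path in rows:
        # Barcodes are written as B plus a two-digit number, e.g. B03.
        writer.writerow([sample_id, f"B{barcode_number:02d}", summary_path])
    return buffer.getvalue()


def parse_thresholds(value="60:60"):
    """Split a <front_thr>:<rear_thr> string into two numbers.

    ASSUMPTION: thresholds are numeric; they are returned as floats.
    """
    front, rear = value.split(":")
    return float(front), float(rear)
```

A sample merged from two runs would be passed as two rows sharing one sample_id, for example `("PLB03-T0", 3, "run1/barcoding_summary.txt")` and `("PLB03-T0", 11, "run2/barcoding_summary.txt")`, which produces one line per barcode/barcoding summary pair.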

filotools mod_dem

			USAGE:

			standard mode (all stats files must belong to the same run and modbam file)
			filotools mod_dem -m 5mC -m 5hmC -b <path/to/modbam.bam> -o <path/to/store/demultiplexed_output_modbam/> -u <path/to/store/per_read_output/>  *.stats

			auto mode (can be launched on stats files from multiple runs, as long as everything is named according to filotools standard naming layout)
			filotools mod_dem -m 5mC -m 5hmC -a -o <path/to/store/demultiplexed_output_modbam/> -u <path/to/store/per_read_output/>  *.stats
			auto mode with config
			filotools mod_dem -m 5mC -m 5hmC -a -C  *.stats



			can work with 3 kinds of inputs (inferred from input extension)
			1) .sam/.bam
			2) .stats
			3) else (treated as space/tab separated text file, first column should contain read IDs)
			
			generates 2 kinds of outputs (by default)
			1) demultiplexed modbams                     (default output folder : in_folder/MODBAM)
			2) demultiplexed per-read modification files (default output folder : in_folder/MOD)
			
			general flags:
			--ouputs     | -w : can be modbam or per_read, can be specified multiple times re-entering -w (default: modbam AND per_read)
			--mods       | -m : modifications to analyze (mods available: C+m/5mc/5mC and C+h/5hmc/5hmC) (default: C+m)
			--threshold  | -t : likelihood threshold for mod filtering, meaningful only for per_read output (default: 170)
			--auto       | -a : automatically detect library name, use only if filenames are in Filo's format and basecall_path is defined in config file. 
			
			input paths:
			--modbam     | -b : path to modbam  
			
			output paths:
			--modbam_out | -o : path to store demultiplexed modbam files
			--mod_out    | -u : path to store per_read modification files
			
			if --auto is set, CONFIG is used only to infer the modbam input; basecall_path, fastq_path and merged_fastq_path must be defined in the config file
			if --force_config is set, CONFIG is used for output paths (--modbam_out, --mod_out) (note: if --auto is not set, you still need to provide an input path with --modbam flag)
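
The extension-based input detection and the modification aliases listed above can be sketched in Python. This is an illustration under assumptions, not the mod_dem source: `input_kind`, `read_ids_from_table` and `MOD_ALIASES` are hypothetical names, and mapping each alias to the `C+m` or `C+h` code is inferred from the way the help text groups them.

```python
from pathlib import Path

# ASSUMPTION: the spellings grouped in the help text are interchangeable.
MOD_ALIASES = {
    "C+m": "C+m", "5mc": "C+m", "5mC": "C+m",
    "C+h": "C+h", "5hmc": "C+h", "5hmC": "C+h",
}


def input_kind(filename):
    """Classify an input file by its extension, following the 3 cases above."""
    suffix = Path(filename).suffix.lower()
    if suffix in {".sam", ".bam"}:
        return "alignment"
    if suffix == ".stats":
        return "stats"
    # Anything else is treated as a space/tab separated text file.
    return "read_id_table"


def read_ids_from_table(lines):
    """Return the first whitespace-separated column of each non-empty line."""
    ids = []
    for line in lines:
        fields = line.split()
        if fields:
            ids.append(fields[0])
    return ids
```

For the third input kind only the first column matters, so a two-column .ids file from bothbar and a plain list of read IDs would be read the same way in this sketch.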