Updated FASTK tool wrapper #5965

Merged: 8 commits, May 3, 2024

Changes from 1 commit
tools/fastk/fastk.xml: 43 changes (17 additions, 26 deletions)
@@ -6,28 +6,14 @@
<expand macro="requirements" />
<command detect_errors="exit_code"><![CDATA[
mkdir -p outfiles/tmpfiles &&
-#if $infile.ext == "fasta":
-ln -s '$infile' ./input.fasta &&
-#set INPUTFILE="input.fasta"
-#elif $infile.ext == "fasta.gz":
-ln -s '$infile' ./input.fasta.gz &&
-#set INPUTFILE="input.fasta.gz"
-#elif $infile.is_of_type("fastq"):
-ln -s '$infile' ./input.fastq &&
+#if $infile.is_of_type("fastq"):
#set INPUTFILE="input.fastq"
#elif $infile.is_of_type("fastq.gz"):
-ln -s '$infile' ./input.fastq.gz &&
#set INPUTFILE="input.fastq.gz"
-#elif $infile.ext == "cram":
-ln -s '$infile' ./input.cram &&
-#set INPUTFILE="input.cram"
-#elif $infile.is_of_type("unsorted.bam"):
-ln -s '$infile' ./input.bam &&
-#set INPUTFILE="input.bam"
-#elif $infile.ext == "sam":
-ln -s '$infile' ./input.sam &&
-#set INPUTFILE="input.sam"
-#end if
+#else
+#set INPUTFILE="input."+$infile.ext
+#end if
+ln -s '$infile' $INPUTFILE &&
FastK $INPUTFILE
-k$kmer_size
#if $sorted_table.sorted_table_option == 'yes_with_default':
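The reworked `#if/#elif/#else` block replaces the one-branch-per-format chain with a generic `input.<ext>` name and keeps special cases only for FASTQ-like datatypes. Below is a minimal Python sketch of that filename selection. The `DATATYPE_PARENTS` table and the helpers `is_of_type` and `input_filename` are hypothetical. The parent links, for example `fastqsanger` deriving from `fastq` and `fastqsanger.gz` from `fastq.gz`, are a simplified assumption about Galaxy's datatype hierarchy, not Galaxy code.

```python
# Hypothetical, simplified model of a few Galaxy datatypes:
# each extension maps to its parent datatype (None = no modelled parent).
DATATYPE_PARENTS = {
    "fastq": None,
    "fastqsanger": "fastq",
    "fastq.gz": None,
    "fastqsanger.gz": "fastq.gz",
    "fasta": None,
    "fasta.gz": None,
    "cram": None,
    "sam": None,
    "unsorted.bam": None,
}


def is_of_type(ext: str, target: str) -> bool:
    """Walk up the modelled hierarchy, mimicking $infile.is_of_type(...)."""
    current = ext
    while current is not None:
        if current == target:
            return True
        current = DATATYPE_PARENTS.get(current)
    return False


def input_filename(ext: str) -> str:
    """Mirror the new #if/#elif/#else block that sets INPUTFILE."""
    if is_of_type(ext, "fastq"):
        return "input.fastq"
    if is_of_type(ext, "fastq.gz"):
        return "input.fastq.gz"
    return "input." + ext


for ext in ["fastqsanger", "fastqsanger.gz", "fasta.gz", "cram", "unsorted.bam"]:
    print(f"{ext:16} -> {input_filename(ext)}")
```

Without the `is_of_type` checks, a `fastqsanger` dataset would be linked as `input.fastqsanger`. Note also that `unsorted.bam` input is now linked as `input.unsorted.bam`, whereas the removed branch linked it as `input.bam`.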
@@ -45,7 +31,7 @@
]]></command>
<inputs>
<param name="infile" type="data" format="fasta,fasta.gz,fastq,fastq.gz,cram,unsorted.bam,sam" label="Input file"/>
-<param name="kmer_size" type="integer" min="5" max="50" value="40" label="Enter desired k-mer size" help="Default: 40" />
+<param name="kmer_size" argument="-k" type="integer" min="5" max="50" value="40" label="Enter desired k-mer size" help="Default: 40" />
<conditional name="sorted_table">
<param name="sorted_table_option" type="select" label="Sort table" help="Do you want a sorted table of all canonical k-mers and their counts? The sorted table is sorted lexicographically on the k-mer where a &lt; c &lt; g &lt; t.">
<option value="no">No</option>
@@ -66,7 +52,7 @@
<data name="fastk_out" format="tar" from_work_dir="fastk.tar" label="${tool.name} on ${on_string}: FastK files"/>
<data name="fastk_hist_out" format="fastk_hist" from_work_dir="outfiles/output.hist" label="${tool.name} on ${on_string}: FastK hist" />
<data name="tabex_hist" format="txt" label="${tool.name} on ${on_string}: Tabex output">
-<filter> ( sorted_table['sorted_table_option'] == 'yes_with_default' or sorted_table['sorted_table_option'] == 'yes_with_custom' )</filter>
+<filter> sorted_table['sorted_table_option'] != 'no' </filter>
</data>
</outputs>
<tests>
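The simplified `tabex_hist` filter preserves the old behaviour only if `sorted_table_option` has exactly three values. The sketch below evaluates both filter strings as plain Python expressions against a minimal, hypothetical parameter namespace. It assumes the select offers only `no`, `yes_with_default` and `yes_with_custom`, the values visible in this diff. The `evaluate` helper is illustrative, not how Galaxy evaluates filters internally.

```python
# The two <filter> bodies from the diff (surrounding whitespace trimmed).
OLD_FILTER = (
    "( sorted_table['sorted_table_option'] == 'yes_with_default'"
    " or sorted_table['sorted_table_option'] == 'yes_with_custom' )"
)
NEW_FILTER = "sorted_table['sorted_table_option'] != 'no'"

# Assumption: the select offers exactly these three values.
OPTIONS = ["no", "yes_with_default", "yes_with_custom"]


def evaluate(expression: str, option: str) -> bool:
    """Evaluate a filter expression with a minimal parameter namespace."""
    namespace = {"sorted_table": {"sorted_table_option": option}}
    return bool(eval(expression, {"__builtins__": {}}, namespace))


for option in OPTIONS:
    old, new = evaluate(OLD_FILTER, option), evaluate(NEW_FILTER, option)
    print(f"{option:17} old={old} new={new}")
```

Both filters agree on all three values. Because `!= 'no'` is true for any other value, adding a fourth option later would also enable the Tabex output for that option.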
@@ -111,13 +97,18 @@
</tests>
<help><![CDATA[
FastK is a k‑mer counter that is optimized for processing high quality DNA assembly data sets such as those produced with an Illumina instrument or a PacBio run in HiFi mode.

The input data can be in CRAM, BAM, SAM, fasta, or fastq files.

-FastK can produce the following outputs:
+FastK produces the following outputs:

+1. A .hist file in binary format containing a histogram of the frequency with which each k‑mer in the dataset occurs.

+2. A Tabex txt file containing a table of k‑mer/count pairs, sorted lexicographically on the k‑mer sequence in the order a < c < g < t.

+3. A tar file containing the hidden .ktab files, which can be used by downstream FastK tools.


-1. a histogram of the frequency with which each k‑mer in the data set occurs.
-2. a table of k‑mer/count pairs sorted lexicographically on the k‑mer where a < c < g < t.
-3. a k‑mer count profile of every sequence in the data set. A profile is the sequence of counts of the n-(k-1) consecutive k‑mers of a sequence of length n.
-4. a relative profile of every sequence in the data set against a FastK table produced for another data set.
]]></help>
<expand macro="citations"/>
</tool>
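To make the help text's output descriptions concrete, here is a toy, in-memory Python sketch of three outputs: a sorted k‑mer/count table, its count histogram, and a per-sequence profile. It is not FastK's algorithm or file format. It assumes that a canonical k‑mer is the lexicographically smaller of a k‑mer and its reverse complement, and all function names are hypothetical.

```python
from collections import Counter

# Complement map for lowercase DNA bases.
COMPLEMENT = str.maketrans("acgt", "tgca")


def canonical(kmer: str) -> str:
    """Smaller of a k-mer and its reverse complement (a < c < g < t)."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmers(seq: str, k: int) -> list[str]:
    """The n-(k-1) consecutive k-mers of a sequence of length n."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_table(reads: list[str], k: int) -> Counter:
    """Count canonical k-mers over all reads."""
    counts = Counter()
    for read in reads:
        counts.update(canonical(km) for km in kmers(read.lower(), k))
    return counts


def histogram(counts: Counter) -> dict[int, int]:
    """Map each count c to the number of distinct k-mers seen c times."""
    return dict(sorted(Counter(counts.values()).items()))


def profile(seq: str, counts: Counter, k: int) -> list[int]:
    """Count of each consecutive k-mer of seq, in sequence order."""
    return [counts[canonical(km)] for km in kmers(seq.lower(), k)]


reads = ["aaacgt", "gtt"]
k = 3
counts = count_table(reads, k)
print(sorted(counts.items()))        # [('aaa', 1), ('aac', 2), ('acg', 2)]
print(histogram(counts))             # {1: 1, 2: 2}
print(profile(reads[0], counts, k))  # [1, 2, 2, 2]
```

Python compares lowercase strings by code point, and a < c < g < t holds for those characters, so `sorted` yields the order the help text describes. The relative profile against another dataset's table (item 4 of the removed list) is not modelled here.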