Updated FASTK tool wrapper #5965

Merged: 8 commits merged on May 3, 2024
Changes from 4 commits
12 changes: 12 additions & 0 deletions tools/fastk/.shed.yml
@@ -0,0 +1,12 @@
name: fastk
owner: iuc
categories:
- Assembly
description: "FastK: a k-mer counter for high-quality assembly data sets"
long_description: FastK is a k-mer counter that is optimized for processing high-quality DNA assembly data sets, such as those produced with an Illumina instrument or a PacBio run in HiFi mode.
homepage_url: https://github.com/thegenemyers/FASTK
remote_repository_url: https://github.com/galaxyproject/tools-iuc/tree/master/tools/fastk
type: unrestricted
suite:
  name: "suite_fastk"
  description: "A suite of tools for FASTK in Galaxy"
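The long description calls FastK a k-mer counter, and the wrapper in `fastk.xml` exposes two of its outputs: a count histogram and a lexicographically sorted table of canonical k-mers. As a rough illustration only (this is not FastK's algorithm or file format), here is a minimal pure-Python sketch. It assumes lowercase `acgt` input, takes "canonical" to mean the lexicographically smaller of a k-mer and its reverse complement, and skips any k-mer containing other characters; `canonical`, `count_kmers`, `sorted_table` and `histogram` are hypothetical names.

```python
from collections import Counter

# Complement map for lowercase DNA. With lowercase letters, Python's default
# string ordering already matches the a < c < g < t order of the sorted table.
COMPLEMENT = str.maketrans("acgt", "tgca")
BASES = set("acgt")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement (assumed definition)."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads: list[str], k: int) -> Counter:
    """Count canonical k-mers over all reads; k-mers with non-acgt characters are skipped."""
    counts: Counter = Counter()
    for read in reads:
        read = read.lower()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= BASES:
                counts[canonical(kmer)] += 1
    return counts


def sorted_table(counts: Counter, cutoff: int) -> list[tuple[str, int]]:
    """k-mer/count pairs with count >= cutoff, sorted lexicographically on the k-mer."""
    return sorted((kmer, n) for kmer, n in counts.items() if n >= cutoff)


def histogram(counts: Counter) -> dict[int, int]:
    """For each count value, the number of distinct k-mers that occur that many times."""
    return dict(sorted(Counter(counts.values()).items()))


if __name__ == "__main__":
    counts = count_kmers(["acgtacgt", "ttacgtaa"], 3)
    print(sorted_table(counts, 3))  # [('acg', 6), ('gta', 4)]
    print(histogram(counts))        # {2: 1, 4: 1, 6: 1}
```

The `cutoff` filter stands in for both FastK's `-t` table cutoff and Tabex's `-t` trim ("trim all k-mers with counts less than threshold"), under the assumption that both keep k-mers whose count is at least the threshold.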
123 changes: 123 additions & 0 deletions tools/fastk/fastk.xml
@@ -0,0 +1,123 @@
<tool id="fastk" name="FastK" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="23.2">
    <description>A k-mer counter for high-quality assembly datasets</description>
    <macros>
        <import>macros.xml</import>
    </macros>
    <expand macro="requirements" />
    <command detect_errors="exit_code"><![CDATA[
mkdir -p outfiles/tmpfiles &&
#if $infile.ext == "fasta":
    ln -s '$infile' ./input.fasta &&
    #set INPUTFILE="input.fasta"
#elif $infile.ext == "fasta.gz":
    ln -s '$infile' ./input.fasta.gz &&
    #set INPUTFILE="input.fasta.gz"
#elif $infile.is_of_type("fastq"):
    ln -s '$infile' ./input.fastq &&
    #set INPUTFILE="input.fastq"
#elif $infile.is_of_type("fastq.gz"):
    ln -s '$infile' ./input.fastq.gz &&
    #set INPUTFILE="input.fastq.gz"
#elif $infile.ext == "cram":
    ln -s '$infile' ./input.cram &&
    #set INPUTFILE="input.cram"
#elif $infile.is_of_type("unsorted.bam"):
    ln -s '$infile' ./input.bam &&
    #set INPUTFILE="input.bam"
#elif $infile.ext == "sam":
    ln -s '$infile' ./input.sam &&
    #set INPUTFILE="input.sam"
#end if
FastK $INPUTFILE
    -k$kmer_size
    #if $sorted_table.sorted_table_option == 'yes_with_default':
        -t
    #elif $sorted_table.sorted_table_option == 'yes_with_custom':
        -t${sorted_table.sorted_table_cutoff}
    #end if
    -T\${GALAXY_SLOTS:-8} -Noutfiles/output -Poutfiles/tmpfiles
#if $sorted_table.sorted_table_option == 'yes_with_default':
    && Tabex outfiles/output.ktab -t${sorted_table.tabex_threshold_for_default} LIST > '$tabex_hist'
#elif $sorted_table.sorted_table_option == 'yes_with_custom':
    && Tabex outfiles/output.ktab -t${sorted_table.tabex_threshold_for_custom} LIST > '$tabex_hist'
#end if
&& tar -c -f fastk.tar ./outfiles/
    ]]></command>
    <inputs>
        <param name="infile" type="data" format="fasta,fasta.gz,fastq,fastq.gz,cram,unsorted.bam,sam" label="Input file"/>
        <param name="kmer_size" type="integer" min="5" max="50" value="40" label="K-mer size" help="Length of the k-mers to count (FastK -k). Default: 40"/>
        <conditional name="sorted_table">
            <param name="sorted_table_option" type="select" label="Produce a sorted k-mer table" help="Do you want a sorted table of all canonical k-mers and their counts? The table is sorted lexicographically on the k-mer, where a &lt; c &lt; g &lt; t.">
                <option value="no">No</option>
                <option value="yes_with_default">Yes, with the default cutoff</option>
                <option value="yes_with_custom">Yes, with a custom cutoff</option>
            </param>
            <when value="no"/>
            <when value="yes_with_default">
                <param name="tabex_threshold_for_default" argument="-t" type="integer" value="5" min="1" label="Tabex count threshold" help="Trim all k-mers with counts less than this threshold"/>
            </when>
            <when value="yes_with_custom">
                <param name="sorted_table_cutoff" type="integer" min="2" value="10" label="Sorted table cutoff" help="Minimum count a k-mer needs to be included in the sorted table (FastK -t)"/>
                <param name="tabex_threshold_for_custom" argument="-t" type="integer" value="5" min="1" label="Tabex count threshold" help="Trim all k-mers with counts less than this threshold"/>
            </when>
        </conditional>
    </inputs>
    <outputs>
        <data name="fastk_out" format="tar" from_work_dir="fastk.tar" label="${tool.name} on ${on_string}: FastK files"/>
        <data name="fastk_hist_out" format="fastk_hist" from_work_dir="outfiles/output.hist" label="${tool.name} on ${on_string}: FastK hist"/>
        <data name="tabex_hist" format="txt" label="${tool.name} on ${on_string}: Tabex output">
            <filter>sorted_table['sorted_table_option'] == 'yes_with_default' or sorted_table['sorted_table_option'] == 'yes_with_custom'</filter>
        </data>
    </outputs>
    <tests>
        <!-- TEST 1 -->
        <test expect_num_outputs="2">
            <param name="infile" value="input01.fasta.gz"/>
            <output name="fastk_out" ftype="tar">
                <assert_contents>
                    <has_archive_member path="./outfiles/output.hist"/>
                </assert_contents>
            </output>
        </test>
        <!-- TEST 2 -->
        <test expect_num_outputs="3">
            <param name="infile" value="input01.fasta.gz"/>
            <conditional name="sorted_table">
                <param name="sorted_table_option" value="yes_with_default"/>
            </conditional>
            <output name="fastk_out" ftype="tar">
                <assert_contents>
                    <has_archive_member path="./outfiles/output.hist"/>
                    <has_archive_member path="./outfiles/output.ktab"/>
                </assert_contents>
            </output>
            <output name="tabex_hist" value="test02.tabex.txt"/>
        </test>
        <!-- TEST 3 -->
        <test expect_num_outputs="3">
            <param name="infile" value="input01.fasta.gz"/>
            <conditional name="sorted_table">
                <param name="sorted_table_option" value="yes_with_custom"/>
                <param name="sorted_table_cutoff" value="5"/>
            </conditional>
            <output name="fastk_out" ftype="tar">
                <assert_contents>
                    <has_archive_member path="./outfiles/output.hist"/>
                    <has_archive_member path="./outfiles/output.ktab"/>
                </assert_contents>
            </output>
            <output name="tabex_hist" value="test03.tabex.txt"/>
        </test>
    </tests>
    <help><![CDATA[
FastK is a k-mer counter that is optimized for processing high-quality DNA assembly data sets, such as those produced with an Illumina instrument or a PacBio run in HiFi mode.

FastK can produce the following outputs:

1. a histogram of the frequency with which each k-mer in the data set occurs.
2. a table of k-mer/count pairs sorted lexicographically on the k-mer, where a < c < g < t.
3. a k-mer count profile of every sequence in the data set. A profile is the sequence of counts of the n-(k-1) consecutive k-mers of a sequence of length n.
4. a relative profile of every sequence in the data set against a FastK table produced for another data set.

This Galaxy wrapper currently produces the histogram and, optionally, the sorted table together with a Tabex listing of it.
    ]]></help>
    <expand macro="citations"/>
</tool>
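Items 3 and 4 of the help text above describe profiles, which the command in this wrapper does not request. To make the n-(k-1) definition concrete, here is a minimal pure-Python sketch, not FastK's implementation or output format. It assumes lowercase `acgt` sequences without other characters, treats a k-mer's canonical form as the lexicographically smaller of itself and its reverse complement, and uses a plain `Counter` as a hypothetical stand-in for a FastK table; `count_table` and `profile` are illustrative names.

```python
from collections import Counter

# Assumptions: lowercase acgt input only; "canonical" means the lexicographically
# smaller of a k-mer and its reverse complement.
COMPLEMENT = str.maketrans("acgt", "tgca")


def canonical(kmer: str) -> str:
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def count_table(reads: list[str], k: int) -> Counter:
    """Count canonical k-mers over a data set (a stand-in for a FastK table)."""
    table: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            table[canonical(read[i:i + k])] += 1
    return table


def profile(seq: str, table: Counter, k: int) -> list[int]:
    """Counts of the n-(k-1) consecutive k-mers of seq, looked up in table.

    Against the table of seq's own data set this is a profile; against a table
    built from another data set it is a relative profile. Absent k-mers give 0.
    """
    return [table[canonical(seq[i:i + k])] for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    own = count_table(["acgtacgt", "ttacgtaa"], 3)
    other = count_table(["acgacg"], 3)
    print(profile("acgtacgt", own, 3))    # [6, 6, 4, 4, 6, 6]
    print(profile("acgtacgt", other, 3))  # [2, 2, 0, 0, 2, 2]
```

A sequence of length 8 with k = 3 yields 8 - (3 - 1) = 6 counts, as the help text states. The zeros in the relative profile mark k-mers that never occur in the other data set.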
23 changes: 23 additions & 0 deletions tools/fastk/macros.xml
@@ -0,0 +1,23 @@
<macros>
    <token name="@TOOL_VERSION@">1.0.0</token>
    <token name="@VERSION_SUFFIX@">0</token>
    <xml name="requirements">
        <requirements>
            <requirement type="package" version="@TOOL_VERSION@">fastk</requirement>
        </requirements>
    </xml>
    <xml name="citations">
        <citations>
            <citation type="bibtex">
                @misc{github,
                author = {Gene Myers},
                year = {2020},
                title = {FastK},
                publisher = {GitHub},
                journal = {GitHub repository},
                url = {https://github.com/thegenemyers/FASTK},
                }
            </citation>
        </citations>
    </xml>
</macros>
Binary file added tools/fastk/test-data/input01.fasta.gz
Binary file not shown.