Collapse sequence homopolymers to a single character
% cat test/test.fq
@SRR2288572.2 /1
GAATTTCCCC
+
"!!"!"!!!!
@SRR2288572.2 /1
TCGTGTTTTCTTTTTCTTTT
+
"!!"!"!!!!!!!!!!!!!!
% dehomopolymerate test/test.fq
@SRR2288572.2
GATC
+
"!"!
@SRR2288572.2
TCGTGTCTCT
+
"!!"!"!!!!
INPUT : seqs=2 bp=30 avglen=15
OUTPUT : seqs=2 bp=14 avglen=7
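A minimal C sketch of the collapsing step. It assumes each read has already been split into NUL-terminated sequence and quality strings. It keeps the quality character of the first base in each run, which reproduces the quality strings in the example above. The function name collapse_homopolymers is hypothetical and is not taken from the dehomopolymerate source. Bases are compared byte-for-byte, so case sensitivity is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Collapse each run of identical bases in seq to a single base, in place.
 * If qual is non-NULL (FASTQ input) it must be the same length as seq;
 * the quality character of the FIRST base of each run is kept.
 * Bases are compared byte-for-byte (case-sensitive), an assumption.
 * Returns the collapsed length; both strings are re-terminated there. */
size_t collapse_homopolymers(char *seq, char *qual)
{
    assert(seq != NULL);
    size_t len = strlen(seq);
    assert(qual == NULL || strlen(qual) == len);

    size_t out = 0;  /* number of bases kept so far */
    for (size_t i = 0; i < len; i++) {
        /* seq[out - 1] holds the base of the current run, so a mismatch
         * (or the very first base) marks the start of a new run */
        if (out == 0 || seq[i] != seq[out - 1]) {
            seq[out] = seq[i];
            if (qual != NULL)
                qual[out] = qual[i];
            out++;
        }
    }
    seq[out] = '\0';
    if (qual != NULL)
        qual[out] = '\0';
    return out;
}
```

Applied to the two reads above, it returns lengths 4 and 10. Together these give the 14 bp reported on the OUTPUT line.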
brew install brewsci/bio/dehomopolymerate
conda install -c bioconda dehomopolymerate
dehomopolymerate is written in C to the C99 standard and only depends on gcc and libz.
git clone https://github.com/tseemann/dehomopolymerate.git
cd dehomopolymerate
make
make install PREFIX=$HOME/bin
% dehomopolymerate -h
SYNOPSIS
Collapse sequence homopolymers to a single character
USAGE
dehomopolymerate [options] reads.fast{aq}[.gz] > nohomop.fq
OPTIONS
-h Show this help
-v Print version and exit
-q Quiet mode; no non-error output
-f Output FASTA not FASTQ
-w Output RAW one line per sequence
-l LEN Discard output sequences shorter than LEN bp
URL
https://github.com/tseemann/dehomopolymerate (Torsten Seemann)
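The options listed above could be parsed with POSIX getopt(3) roughly as follows. This is an illustrative sketch only. The opts_t struct and parse_opts function are hypothetical names, and the real tool's argument handling is not reproduced here, including what -h and -v do before exiting. getopt is part of POSIX rather than ISO C99, so the feature-test macro is defined first.

```c
#define _POSIX_C_SOURCE 200809L  /* expose getopt() when compiling with -std=c99 */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical settings struct mirroring the options in the help text. */
typedef struct {
    bool help;           /* -h */
    bool version;        /* -v */
    bool quiet;          /* -q */
    bool fasta;          /* -f */
    bool raw;            /* -w */
    long minlen;         /* -l LEN; 0 keeps every read */
    const char *infile;  /* first non-option argument */
} opts_t;

/* Parse argv with getopt(); returns 0 on success, -1 on an unknown
 * option, a missing -l argument, or a missing input file. */
int parse_opts(int argc, char *argv[], opts_t *o)
{
    assert(o != NULL);
    *o = (opts_t){0};
    int c;
    while ((c = getopt(argc, argv, "hvqfwl:")) != -1) {
        switch (c) {
        case 'h': o->help = true; break;
        case 'v': o->version = true; break;
        case 'q': o->quiet = true; break;
        case 'f': o->fasta = true; break;
        case 'w': o->raw = true; break;
        case 'l': o->minlen = strtol(optarg, NULL, 10); break;
        default:  return -1;  /* getopt has already printed a message */
        }
    }
    if (o->help || o->version)
        return 0;             /* no input file needed */
    if (optind >= argc)
        return -1;
    o->infile = argv[optind];
    return 0;
}
```

A table-driven getopt loop keeps each flag in the help text to one line of code. Adding an option then means touching only the option string and one case label.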
The -v option prints the name and version, separated by a space, in standard Unix fashion.
% dehomopolymerate -v
dehomopolymerate 0.3
The -q option suppresses informational messages and prints only errors.
The -f option writes FASTA instead of FASTQ:
% dehomopolymerate -f test/test.fq.gz
>SRR2288572.2
GATC
>SRR2288572.2
TCGTGTCTCT
The -w option writes only the raw sequences, one line per read:
% dehomopolymerate -w test/test.fq.gz
GATC
TCGTGTCTCT
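The three output styles are the default FASTQ, FASTA with -f, and raw sequence with -w. They differ only in how each collapsed record is printed. The following is a hedged sketch with a hypothetical write_record function and out_mode_t enum. It assumes name is the read ID without the comment field, matching the output above. FASTQ mode requires a quality string of the same length as the sequence. How the real tool handles FASTA input without -f is not covered.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Output styles: default FASTQ, -f FASTA, -w raw sequence only. */
typedef enum { OUT_FASTQ, OUT_FASTA, OUT_RAW } out_mode_t;

/* Write one already-collapsed record to fp in the chosen style.
 * FASTQ mode needs a quality string of the same length as seq.
 * Returns 0 on success, -1 on bad input or a write error. */
int write_record(FILE *fp, out_mode_t mode, const char *name,
                 const char *seq, const char *qual)
{
    assert(fp != NULL && name != NULL && seq != NULL);
    int rc;
    switch (mode) {
    case OUT_RAW:    /* -w: one line per sequence, no header */
        rc = fprintf(fp, "%s\n", seq);
        break;
    case OUT_FASTA:  /* -f: '>' header line, then the sequence */
        rc = fprintf(fp, ">%s\n%s\n", name, seq);
        break;
    default:         /* FASTQ: '@' header, sequence, '+', qualities */
        if (qual == NULL || strlen(qual) != strlen(seq))
            return -1;
        rc = fprintf(fp, "@%s\n%s\n+\n%s\n", name, seq, qual);
        break;
    }
    return rc < 0 ? -1 : 0;
}
```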
The -l option discards reads whose collapsed length is below LEN; here GATC (4 bp) is dropped:
% dehomopolymerate -l 6 -f test/test.fq.gz
>SRR2288572.2
TCGTGTCTCT
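The -l example shows that the length filter applies to the collapsed sequence. GAATTTCCCC is 10 bp before collapsing but only 4 bp (GATC) after, so it is dropped, while TCGTGTCTCT (10 bp) is kept. The following is a minimal sketch of that check. It counts runs instead of building the collapsed string. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Number of bases left after collapsing homopolymers, i.e. the number
 * of runs of identical characters in the NUL-terminated string seq. */
size_t collapsed_length(const char *seq)
{
    assert(seq != NULL);
    size_t runs = 0;
    for (size_t i = 0; seq[i] != '\0'; i++)
        if (i == 0 || seq[i] != seq[i - 1])
            runs++;  /* first base, or a base that differs from its predecessor */
    return runs;
}

/* -l LEN check: keep a read only if its collapsed length is at least
 * minlen ("shorter than LEN" is discarded); minlen = 0 keeps everything. */
bool passes_minlen(const char *seq, size_t minlen)
{
    return collapsed_length(seq) >= minlen;
}
```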
Report bugs and suggestions at the Issue Tracker: https://github.com/tseemann/dehomopolymerate/issues
- Torsten Seemann