TRACESPipeLite is the reference-based version of TRACESPipe. It is designed for the classification of human viral genomes, particularly for direct use in clinical virology settings. It provides ultra-fast reconstruction of viral genomes while automatically generating consensus sequences, breadth and depth of coverage values, and coverage profiles, among other outputs. The viral database is freely provided. Assembly is exclusively reference-based; it can use the provided database but is not limited to it.
To install TRACESPipeLite, run:
git clone https://github.com/viromelab/TRACESPipeLite.git
cd TRACESPipeLite/src/
lzma -d VDB.fa.lzma
chmod +x *.sh
./TRACESPipeLite.sh --install
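The installation steps above rely on lzma to decompress the database and, according to the help text, on conda for the installation itself. As a minimal sketch (the helper name have_tool is hypothetical and not part of TRACESPipeLite), the presence of both tools could be checked beforehand:

```shell
# Return success if the given command is available on the PATH.
# have_tool is an illustrative helper, not part of TRACESPipeLite.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Report the status of the tools used by the installation steps.
for tool in lzma conda; do
  if have_tool "$tool"; then
    echo "$tool: found"
  else
    echo "$tool: not found"
  fi
done
```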
An extensive, high-quality database is fundamental for accurate metagenomic classification. TRACESPipeLite already includes a database in the src directory, named VDB.fa.lzma. Nevertheless, the following procedure describes how to create a specific database, and check its quality, for the following 57 viruses and the human mitogenome:
B19V Human Parvovirus
BuV Bufa virus
CuV Cuta virus
HBoV Boca virus type 1
AAV Adeno-Associated Virus
BKPyV Human polyomavirus 1
JCPyV Human polyomavirus 2
KIPyV Human polyomavirus 3
WUPyV Human polyomavirus 4
MCPyV Human polyomavirus 5
HPyV6 Human polyomavirus 6
HPyV7 Human polyomavirus 7
TSPyV Human polyomavirus 8
HPyV9 Human polyomavirus 9
MWPyV Human polyomavirus 10
STLPyV Human polyomavirus 11
HPyV12 Human polyomavirus 12
NJPyV Human polyomavirus 13
LIPyV Human polyomavirus 14
SV40 Simian virus 40
TTV Torque Teno Virus
TTVmin Torque Teno Mini Virus
TTVmid Torque Teno Midi Virus
HAV Hepatitis A Virus
HBV Hepatitis B Virus
HCV Hepatitis C Virus
HDV Hepatitis D Virus
HEV Hepatitis E Virus
SENV SEN virus
HPV2 Human papillomavirus 2
HPV6 Human papillomavirus 6
HPV11 Human papillomavirus 11
HPV16 Human papillomavirus 16
HPV18 Human papillomavirus 18
HPV31 Human papillomavirus 31
HPV39 Human papillomavirus 39
HPV45 Human papillomavirus 45
HPV51 Human papillomavirus 51
HPV56 Human papillomavirus 56
HPV58 Human papillomavirus 58
HPV59 Human papillomavirus 59
HPV68 Human papillomavirus 68
HPV77 Human papillomavirus 77
HSV-1 Herpes-Simplex Virus 1
HSV-2 Herpes-Simplex Virus 2
VZV Varicella-Zoster Virus
EBV Epstein-Barr virus
HCMV Human Cytomegalovirus
HHV6 Human Herpesvirus 6
HHV7 Human Herpesvirus 7
KSHV Kaposi's Sarcoma-associated Herpesvirus
ReDoV Redondovirus
VARV Variola Virus
MPXV Monkeypox Virus
EV Enterovirus
SARS2 SARS-CoV-2
HERV Human Endogenous Retrovirus
MT Human Mitogenome
To download and build the database, and to create the PDF with the respective quality controls, create for each virus a file in the IDS/ folder named
ID-<virus_name>.ids
containing the IDs (one per line) for the respective virus type.
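As a minimal sketch of this step, assuming the virus name B19V and two placeholder IDs (both are illustrative and must be replaced with real sequence IDs for the virus of interest):

```shell
# Create the IDS/ folder if it does not exist yet.
mkdir -p IDS

# Write one sequence ID per line; the two IDs below are placeholders.
printf '%s\n' "PLACEHOLDER_ID_1" "PLACEHOLDER_ID_2" > IDS/ID-B19V.ids

# Show the number of IDs written to the file.
wc -l < IDS/ID-B19V.ids
```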
Then, run:
./TRACESCreateDB.sh
./TRACESAnalyzeDB.sh
./TRACESMergeDB.sh
The output will be available as
FIL-DB-<virus_name>.mfa
Composition-<virus_name>.pdf
The composition PDF for each virus indicates whether a genome has unusual characteristics (a higher or lower number of bases, extraneous characters, or a different GC content). The whole FASTA database will be available as
VDB.mfa
Finally, to set this viral database as default, type:
mv VDB.mfa VDB.fa
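Before using a newly built database, it may help to confirm that it contains the expected number of sequences. The following is a minimal sketch, not part of TRACESPipeLite: count_records is a hypothetical helper, and demo_db.fa is a made-up file standing in for VDB.fa.

```shell
# Count FASTA records: every record starts with a header line beginning with '>'.
count_records() {
  grep -c '^>' "$1"
}

# Tiny demo file standing in for VDB.fa (contents are made up).
printf '>seq1\nACGT\n>seq2\nGGCC\n' > demo_db.fa

# Print the number of records in the demo file.
count_records demo_db.fa
```

On the real database, the same function would be called as count_records VDB.fa.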
The TRACESPipeLite package includes viral reads for a quick demonstration. To run the example, first install the tool. Then, run the following command:
./TRACESPipeLite.sh --threads 8 --reads1 reads_forward.fq.gz \
--reads2 reads_reverse.fq.gz --database VDB.fa --output test_viral_analysis
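Because the pipeline takes a forward and a reverse FASTQ file, a quick check that both files hold the same number of reads can catch a mismatched pair early. This is a sketch under assumptions: count_reads is a hypothetical helper, the demo files are made up, and each FASTQ record is assumed to span exactly four lines.

```shell
# Count reads in a FASTQ file (assumes 4 lines per record),
# handling both plain and gzip-compressed input.
count_reads() {
  case "$1" in
    *.gz) lines=$(gzip -dc "$1" | wc -l) ;;
    *)    lines=$(wc -l < "$1") ;;
  esac
  echo $((lines / 4))
}

# Made-up demo files with two reads each (one plain, one compressed).
printf '@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n' > demo_1.fq
printf '@r1\nTTAA\n+\nIIII\n@r2\nCCGG\n+\nIIII\n' | gzip -c > demo_2.fq.gz

# Compare the read counts of the two files.
if [ "$(count_reads demo_1.fq)" -eq "$(count_reads demo_2.fq.gz)" ]; then
  echo "read counts match"
else
  echo "read counts differ" >&2
fi
```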
It also includes a simple simulation and benchmark:
bash Simple.sh
./TRACESPipeLite.sh --specific SYN.fa --database VDB.mfa --reads1 blood1.fq --reads2 blood2.fq --threads 8 --output Simple-test-src
To see the options of TRACESPipeLite, run the following command:
./TRACESPipeLite.sh -h
This command will output the following content
--------------------------------------------------------
TRACESPipeLite.sh : TRACESPipe lite version v2.2
This is a lite version of TRACESPipe. It provides
automatic reconstruction (reference-based only) of
viral genomes and performs basic analyses.
Program options ----------------------------------------
-h, --help Show this,
-i, --install Installation (w/ conda),
-np, --no-plots NO coverage plots,
-lg <INT>, --log-scale <INT> Coverage log scale,
-ws <INT>, --window <INT> Coverage window size,
-dr <INT>, --drop <INT> Coverage drop size,
-cx <INT>, --start <INT> Coverage start x-axis,
-ma <INT>, --maximum <INT> Coverage maximum (crop),
-si <INT>, --similarity <DBL> Minimum similarity for
applying reconstruction,
-s <STR>, --specific <STR> Use specific sequence,
-os, --only-specific Run only specific,
-fh, --filter-human Filter human DNA,
-t <INT>, --threads <INT> Number of threads,
-o <STR>, --output <STR> Output folder name,
-r1 <STR>, --reads1 <STR> FASTQ reads (forward),
-r2 <STR>, --reads2 <STR> FASTQ reads (reverse),
-db <STR>, --database <STR> FASTA Viral Database.
Example -----------------------------------------------
TRACESPipeLite.sh --reads1 reads_forward.fq.gz \
--reads2 reads_reverse.fq.gz --database VDB.mfa \
--output lite_viral_analysis --threads 8
-------------------------------------------------------
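The options listed above can be combined in a small wrapper when several samples need to be processed. The sketch below only prints the commands (a dry run) and uses only flags shown in the help text. The sample names, the <sample>_1.fq.gz and <sample>_2.fq.gz naming scheme, and the out_<sample> output folders are hypothetical.

```shell
# Build one TRACESPipeLite command line for a given sample name.
# The file naming scheme and output folder name are assumptions.
build_cmd() {
  sample="$1"
  echo "./TRACESPipeLite.sh --reads1 ${sample}_1.fq.gz --reads2 ${sample}_2.fq.gz --database VDB.fa --output out_${sample} --threads 8"
}

# Dry run: print the command for each (hypothetical) sample.
for s in sampleA sampleB; do
  build_cmd "$s"
done
```

Printing the commands first makes it easy to review them before actually running the pipeline.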
Inclusion of the following viruses:
MEV Measles morbillivirus
Please, cite:
Pratas, D., Toppinen, M., Pyöriä, L., Hedman, K., Sajantila, A. and Perdomo, M.F., 2020.
A hybrid pipeline for reconstruction and analysis of viral genomes at multi-organ level.
GigaScience, 9(8), p.giaa086.
GPL v3.
For more information see LICENSE file or visit
http://www.gnu.org/licenses/gpl-3.0.html