withncbi - egaz and alignDB work with external (NCBI/EBI) data.
Fetch sequences, generate reports and build alignments according to various NCBI databases.
For more details, see the README.md in each subdirectory.
- db/: turn NCBI genome reports and assembly reports into a queryable MySQL database.
- ensembl/: Ensembl-related scripts.
- misc/: miscellaneous projects.
- pop/: build alignments on a whole eukaryotic genus.
- taxon/: process (small) genomes according to NCBI Taxonomy.
- util/: miscellaneous utilities.
- .fa - genomic sequences
- .fas - blocked fasta files
- .fasta - normal/miscellaneous fasta files
Use .fq over .fastq.
An IntSpan represents sets of integers as a number of inclusive ranges, for example '1-10,19,45-48'.
The following picture is the schema of an IntSpan object. Jump lines are above the baseline; loop lines are below it.
AlignDB::IntSpan and jintspan are implementations of IntSpan objects in Perl and Java, respectively.
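As a rough illustration of the runlist notation, the Python sketch below parses a runlist string into inclusive ranges and formats it back. It is not the API of AlignDB::IntSpan or jintspan: the function names parse_runlist, cardinality and to_runlist are hypothetical, and the sketch assumes non-negative integers and well-formed, already sorted input.

```python
def parse_runlist(runlist):
    """Parse a runlist such as '1-10,19,45-48' into inclusive (lower, upper) tuples."""
    ranges = []
    for part in runlist.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty fields such as a trailing comma
        if "-" in part:
            # Assumption: no negative numbers, so '-' is always the range separator.
            lower, upper = part.split("-", 1)
            ranges.append((int(lower), int(upper)))
        else:
            # A single integer is a range of length one.
            ranges.append((int(part), int(part)))
    return ranges


def cardinality(ranges):
    """Count the integers covered by a list of inclusive, non-overlapping ranges."""
    return sum(upper - lower + 1 for lower, upper in ranges)


def to_runlist(ranges):
    """Format inclusive ranges back into runlist notation."""
    return ",".join(
        str(lower) if lower == upper else f"{lower}-{upper}"
        for lower, upper in ranges
    )


ranges = parse_runlist("1-10,19,45-48")
print(ranges)               # [(1, 10), (19, 19), (45, 48)]
print(cardinality(ranges))  # 15
print(to_runlist(ranges))   # 1-10,19,45-48
```

The sketch does not merge overlapping or adjacent ranges; a real IntSpan implementation keeps its ranges normalized.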
Examples in S288c.txt
I:1-100
I(+):90-150
S288c.I(-):190-200
II:21294-22075
II:23537-24097
Simple rules:
- chromosome and start are required
- species, strand and end are optional
- '.' to separate species and chromosome
- strand is one of '+' and '-', surrounded by round brackets
- ':' to separate names and digits
- '-' to separate start and end
- names should be alphanumeric and without spaces
species.chromosome(strand):start-end
--------^^^^^^^^^^--------^^^^^^----
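The rules above can be sketched as a regular expression. The following Python example is only an illustration, not the parser used by this repository: POSITION_RE and parse_position are hypothetical names, underscores are accepted in names because the fasta examples in this document use them (e.g. gi_151941327), and a missing end is reported as None rather than being given a default.

```python
import re

# species.chromosome(strand):start-end
POSITION_RE = re.compile(
    r"""
    ^
    (?:(?P<species>\w+)\.)?      # optional species, terminated by '.'
    (?P<chromosome>\w+)          # required chromosome name
    (?:\((?P<strand>[+-])\))?    # optional strand inside round brackets
    :                            # separates names from digits
    (?P<start>\d+)               # required start
    (?:-(?P<end>\d+))?           # optional end
    $
    """,
    re.VERBOSE,
)


def parse_position(text):
    """Split a position string into its five fields; absent optional fields are None."""
    match = POSITION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid position string: {text!r}")
    fields = match.groupdict()
    fields["start"] = int(fields["start"])
    if fields["end"] is not None:
        fields["end"] = int(fields["end"])
    return fields


for example in ["I:1-100", "I(+):90-150", "S288c.I(-):190-200"]:
    print(example, parse_position(example))
```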
Examples in example.fas
>S288c.I(+):13267-13287|species=S288c
TCGTCAGTTGGTTGACCATTA
>YJM789.gi_151941327(-):5668-5688|species=YJM789
TCGTCAGTTGGTTGACCATTA
>RM11.gi_61385832(-):5590-5610|species=RM11
TCGTCAGTTGGTTGACCATTA
>Spar.gi_29362400(+):2477-2497|species=Spar
TCATCAGTTGGCAAACCGTTA
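A minimal sketch of reading records like the ones above, in Python. It assumes only what the examples show: a header starts with '>', the position string comes before the first '|', and a single key=value attribute follows it. The function name read_records is hypothetical, and how several attributes would be separated is not covered here.

```python
def read_records(lines):
    """Collect fasta records as dicts with position, attributes and sequence."""
    records = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header: position string, then an optional 'key=value' after '|'.
            position, _, attribute = line[1:].partition("|")
            key, _, value = attribute.partition("=")
            current = {
                "position": position,
                "attributes": {key: value} if key else {},
                "sequence": "",
            }
            records.append(current)
        elif current is not None:
            # Sequence lines are appended to the most recent header.
            current["sequence"] += line
    return records


example = """\
>S288c.I(+):13267-13287|species=S288c
TCGTCAGTTGGTTGACCATTA
>Spar.gi_29362400(+):2477-2497|species=Spar
TCATCAGTTGGCAAACCGTTA
"""

for record in read_records(example.splitlines()):
    print(record["position"], record["attributes"], len(record["sequence"]))
```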
Qiang Wang <[email protected]>
This software is copyright (c) 2015 by Qiang Wang.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.