Scripts used for calculating chromosomal repeat statistics and for generating EHEC figures

eliotstanton/EHEC

EHEC genome synteny and repeat sequence analysis/visualization

These scripts are associated with publications:

Bash scripts for generating the information and figures in the publications are located at the top level of the repository (detailed at the end). The FASTA, FASTQ, GenBank, GFF, XML (optical maps), and genome feature files needed to run them are included in separate directories. The main scripts are located in the /scripts directory. The two major scripts are HomologyAnalyzer.pl and Synteny.pl.

  • HomologyAnalyzer.pl calculates the extent of repeats in a given FASTA sequence and classifies their overlap with different classes of chromosomal MGE, based on user-provided genomic coordinates.
  • Synteny.pl uses Circos to visualize an alignment calculated by progressiveMauve. Different classes of MGE are highlighted based on user-provided chromosomal coordinates.

Examples of the figures created with these scripts:

Output of Synteny.pl (additional editing using Adobe Illustrator). Comparison of FRIK804 and Sakai chromosomes. Alignment of the FRIK804 (outer) and Sakai (inner) chromosomes shows synteny disrupted by large-scale structural alterations.

Outputs of HomologyAnalyzer.pl (additional editing using Adobe Illustrator). Locations of direct and inverted contiguous repeats ≥75 bp in length in the chromosomes of FRIK804 and E. coli strain MG1655.

Output of Optical.pl (additional editing using Adobe Illustrator). NcoI restriction site maps of the chromosomes from FRIK804 (outer), FRIK1275 (middle), and FRIK1625 (inner).

Output of AlignReads.pl, GrabORFs.pl, and WriteORFs.pl (additional editing using Adobe Illustrator). Detection of the boundaries of inter-prophage deletion (indel-4) in Φ804–9/Φ804–10 present in FRIK1275 and FRIK1625.

Main scripts

HomologyAnalyzer.pl

Master script used for calculating and visualising homology within a circular bacterial genome.

HomologyAnalyzer.pl [OPTIONS] [FASTA] [Features]
-d Prohibit direct links from being drawn
-i Prohibit inverted links from being drawn
-m Minimum repeat length (default: 100)
-n nmer length for homology (default: 20 bp)
-o OUTPUT DIRECTORY (required)
-s Output files prefix (default: default)
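
As an illustration of how the options above fit together, here is a minimal dry-run sketch. The input file names, the output directory, and the `homology_cmd` helper are hypothetical, and placing the options before the positional arguments is an assumption. The `-m 75` value simply mirrors the ≥75 bp threshold mentioned for the example figure.

```shell
# Hypothetical paths; substitute your own files.
fasta="FRIK804.fasta"
features="FRIK804_features.txt"
outdir="homology_out"

# Print the command instead of running it, so it can be reviewed first.
# Assumption: options are given before the positional FASTA and feature files.
homology_cmd() {
    echo perl scripts/HomologyAnalyzer.pl -o "$outdir" -s FRIK804 -m 75 -n 20 "$fasta" "$features"
}

homology_cmd
```

Once the printed command looks right, it can be run from the top level of the repository.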

Feature file format:  
seqID	feature_type	feature_name	start	end  
example:  
0	prophage	prophage0	103894	163432  
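
A malformed feature file is easy to catch before a long run. The sketch below is not part of the repository: it writes the example line to a hypothetical file and uses `awk` to count lines that do not have five tab-separated fields with integer start and end coordinates. It does not check coordinate order, since whether features may span the origin of a circular genome is not stated here.

```shell
# Write the one-line example to a hypothetical feature file.
features_file="example_features.tsv"
printf '0\tprophage\tprophage0\t103894\t163432\n' > "$features_file"

# Count lines without exactly five tab-separated fields
# or with non-integer start/end coordinates.
bad_lines=$(awk -F '\t' '
    NF != 5 || $4 !~ /^[0-9]+$/ || $5 !~ /^[0-9]+$/ { bad++ }
    END { print bad + 0 }
' "$features_file")

echo "malformed lines: $bad_lines"
```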

Synteny.pl

Visualising chromosomal alignment of related strains using Mauve and Circos.

Synteny.pl [OPTIONS] [FASTA] [Features]
-m Minimum length for region alignment (default: 100)
-o Output directory (required)
-p Force progressiveMauve to run
-s Output files prefix (default: default)

Feature file format:  
seqID	feature_type	feature_name	start	end  
example:  
0	prophage	prophage0	103894	163432  

AlignReads.pl

Used to align reads to a FASTA sequence or sequences using Bowtie.

AlignReads.pl [FASTA] [FASTQ1],[FASTQ2]
-c Scaling factor (default: 10)
-o Output directory (required)
-s Output files prefix (default: default)
-t Start coordinate (default: 1)
-u End coordinate (default: end of sequence)
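
The synopsis shows the two FASTQ files passed as a single comma-separated argument. A minimal dry-run sketch follows, with hypothetical file names, placeholder coordinates, and a hypothetical `align_cmd` helper. Placing the options before the positional arguments is an assumption.

```shell
# Hypothetical file names; substitute your own.
fasta="reference.fasta"
fastq1="reads_1.fastq"
fastq2="reads_2.fastq"
outdir="align_out"

# Join the two FASTQ files with a comma and no spaces, as in the synopsis.
reads="${fastq1},${fastq2}"

# Print the command for review rather than executing it.
# Assumption: options precede the positional arguments; -t/-u values are placeholders.
align_cmd() {
    echo perl scripts/AlignReads.pl -o "$outdir" -s example -t 1 -u 50000 "$fasta" "$reads"
}

align_cmd
```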

GrabORFs.pl

Used to pull ORF locations from a GFF file

TODO: Document fully

HomologyLite.pl

Used for determining homology shared between one or more short FASTA sequences

HomologyLite.pl [OPTIONS] [FASTA]
-m Minimum repeat length (default: 100)
-n nmer length for homology (default: 20 bp)
-o OUTPUT DIRECTORY (required)
-s Output files prefix (default: default)

ORFs.pl

Used for converting GenBank data over to SVG for use in figures

ORFs.pl [OPTIONS]
-a Mauve alignment file
-e Stem for file name
-g GenBank file (required)
-o Output directory (required)
-s Start location
-t Stop location

Optical.pl

Used for visualising optical mapping

Optical.pl [OPTIONS] [MAP1],[MAP2]

WriteORFs.pl

Converts genomic feature coordinates into SVG format for use in figures

TODO: Document fully


BASH scripts:

run.sh

TODO: Document fully

run_ORFs.sh

TODO: Document fully

run_align.sh

TODO: Document fully

run_blast.sh

TODO: Document fully

run_homology.sh

TODO: Document fully

run_mapping.sh

TODO: Document fully

run_synteny.sh

TODO: Document fully

lineage2.sh

TODO: Document fully
