

As Robert Robbins¹ put it, the DNA sequence of an organism is “the result of literally millions of maintenance revisions performed by the worst possible set of kludge-using, spaghetti-coding, opportunistic hackers (i.e. evolution) who delight in clever tricks like writing self-modifying code and relying upon undocumented system quirks”. As a result, deciphering molecular biology is the ultimate dream of a computer scientist.

Research Lines 🔬


**Evolutionary simulation of transcription factor-binding site interaction principles**

Modeling transcription factor-binding site interactions is vital to improving the quality of regulatory network inference algorithms and our understanding of transcriptional regulation. Nonetheless, the most frequently used model for transcription factor binding, the position-specific weight/scoring matrix (PSSM), has remained virtually unchanged for over 30 years. Furthermore, because large and accurate datasets are difficult to generate for multiple transcription factors, it is not obvious which assumptions of this basic model should be relaxed, or how. We propose to use explicit simulations of the co-evolution of a transcription factor with its target sites in a genomic context to test the validity of different assumptions, such as positional independence, by comparing the evolutionary outcomes obtained with relaxed and constrained models.

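To make the positional-independence assumption concrete, the following Python sketch builds a log-odds PSSM from a set of aligned sites and scores candidate sites with it. Each column is converted independently into log2 odds against a background distribution, so a site's score is simply the sum of its per-column scores. The pseudocount, the uniform background and the example sites are illustrative assumptions, not parameters from our work.

```python
import math

BASES = "ACGT"


def build_pssm(sites, pseudocount=0.5, background=None):
    """Build a log2-odds PSSM from aligned, equal-length binding sites.

    Every column is processed on its own (positional independence):
    counts -> pseudocount-smoothed frequencies -> log2(frequency / background).
    Sites are assumed to be uppercase and to contain only A, C, G and T.
    """
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    pssm = []
    for i in range(width):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pssm.append({b: math.log2(counts[b] / total / background[b]) for b in BASES})
    return pssm


def score(pssm, seq):
    """Score a site as the sum of per-column log-odds; positions never interact."""
    return sum(column[base] for column, base in zip(pssm, seq))


# Toy usage with made-up sites
sites = ["TTGACA", "TTGACT", "TTGATA", "CTGACA"]
pssm = build_pssm(sites)
print(score(pssm, "TTGACA") > score(pssm, "GGGGGG"))  # True
```

Because the score is a plain sum over columns, no term can reward or penalize specific combinations of bases at two different positions, which is the assumption the co-evolution simulations described above are designed to test.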
**Enhanced motif discovery tools**

Conventional motif discovery algorithms rely on position-specific scoring matrices (PSSMs) to model transcription factor-binding motifs. While the binding specificity of well-studied transcription factors can be effectively modeled by assuming positional independence (as in a PSSM), many transcription factors have binding requirements that break this assumption, such as flexible spacer regions between the primary DNA-contact dyads or the recognition of structural features of DNA. Our lab leverages genetic programming techniques to infer flexible models of co-regulated promoters.

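One simple way to relax a single fixed-width PSSM is to score two half-sites separately and let the spacer between them vary. The sketch below does this by exhaustive search over start positions and spacer lengths within a given range. It is a hypothetical illustration, not the model representation used in our genetic programming work; the toy half-site consensuses, the 0.85 consensus-base frequency and the 15 to 19 bp spacer range are arbitrary assumptions.

```python
import math

BASES = "ACGT"


def consensus_matrix(consensus, p=0.85):
    """Toy frequency matrix: the consensus base gets p, the others share 1 - p."""
    other = (1 - p) / 3
    return [{b: (p if b == c else other) for b in BASES} for c in consensus]


def log_odds(freqs, background=0.25):
    """Convert per-column base frequencies to log2-odds scores."""
    return [{b: math.log2(col[b] / background) for b in BASES} for col in freqs]


def half_site_score(pssm, seq, start):
    """Additive PSSM score of the window of seq beginning at start."""
    return sum(col[seq[start + i]] for i, col in enumerate(pssm))


def best_dyad_hit(left, right, seq, min_spacer, max_spacer):
    """Best (score, start, spacer) for left half-site + spacer + right half-site.

    Each half-site is still scored additively, but the distance between them
    may vary, which a single fixed-width PSSM cannot express.
    Returns None if seq is too short for any allowed spacer.
    """
    best = None
    for spacer in range(min_spacer, max_spacer + 1):
        span = len(left) + spacer + len(right)
        for start in range(len(seq) - span + 1):
            total = (half_site_score(left, seq, start)
                     + half_site_score(right, seq, start + len(left) + spacer))
            if best is None or total > best[0]:
                best = (total, start, spacer)
    return best


# Toy usage: arbitrary half-sites separated by a 17-bp spacer
left = log_odds(consensus_matrix("TTGAC"))
right = log_odds(consensus_matrix("TATAAT"))
seq = "GG" + "TTGAC" + "A" * 17 + "TATAAT" + "GG"
score, start, spacer = best_dyad_hit(left, right, seq, 15, 19)
print(start, spacer)  # 2 17
```

Each half-site remains additive internally; only the distance between the two is flexible.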
**Comparative genomics of transcriptional regulatory networks**

Comparative genomics is a powerful tool for making inferences about the wiring and evolution of transcriptional regulatory networks, but its application to bacterial regulatory networks is not yet well standardized and has been used only sparingly. By leveraging the rapidly growing body of experimental data on transcription factor-binding sites, we seek to standardize comparative genomics analyses of regulatory networks in bacteria and to test their effectiveness for the study of network evolution.

**CollecTF**

In bacteria, data on transcription factor-binding sites are mostly scattered across model organism-centered databases that use different standards and methods. We have developed [CollecTF](http://www.collectf.org) as an open database of transcription factor-binding sites across bacteria. CollecTF compiles data on experimentally validated, naturally occurring TF-binding sites across the domain Bacteria, with a strong emphasis on the transparency of the curation process, the quality and availability of the stored data, and fully customizable access to its records. Furthermore, CollecTF entries are periodically submitted to NCBI for integration into RefSeq complete genome records as db_xref link-out features embedded in genome annotations, to the EBI as regulon information for UniProtKB entries, and as GO annotations through the EBI's Gene Ontology Annotation program.

**Metagenomic analysis of regulatory networks**

Next-generation sequencing technologies have made it possible to comprehensively analyze the metagenomes of microbial communities. Metagenomes provide an extraordinary amount of sequence data on the genetic composition of a bacterial population. Conventional approaches to metagenome analysis have relied on mapping predicted genes onto known pathways. We have shown that known regulatory data and in silico search methods can be leveraged to reconstruct meta-regulons, and we are currently optimizing our bioinformatics pipeline to enable comparison of regulatory networks across metagenomes.

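A common in silico search step is to scan sequences on both strands with a PSSM and keep windows that score above a threshold. The sketch below applies this to a hypothetical dictionary of upstream regions keyed by gene identifier (for example, regions extracted upstream of genes predicted on metagenomic contigs) and reports the genes with at least one hit as a crude predicted meta-regulon. The toy sites, the uniform background and the threshold of 80% of the maximum attainable score are assumptions; this is a simplified illustration, not our pipeline.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def build_pssm(sites, pseudocount=0.5):
    """Log2-odds PSSM from aligned sites against an assumed uniform background."""
    pssm = []
    for column in zip(*sites):
        total = len(column) + 4 * pseudocount
        pssm.append({b: math.log2((column.count(b) + pseudocount) / total / 0.25)
                     for b in BASES})
    return pssm


def scan(pssm, seq, threshold):
    """Return (position, strand, score) for every window scoring >= threshold."""
    width = len(pssm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if not set(window) <= set(BASES):  # skip windows with N or other ambiguity codes
            continue
        for strand, w in (("+", window), ("-", reverse_complement(window))):
            score = sum(col[b] for col, b in zip(pssm, w))
            if score >= threshold:
                hits.append((i, strand, score))
    return hits


def predict_regulon(pssm, upstream_regions, fraction=0.8):
    """Genes whose upstream region has a hit scoring >= fraction * max possible score."""
    max_score = sum(max(col.values()) for col in pssm)
    threshold = fraction * max_score
    return sorted(gene for gene, seq in upstream_regions.items()
                  if scan(pssm, seq.upper(), threshold))


# Toy usage: made-up sites and upstream regions (geneC carries the site on the minus strand)
pssm = build_pssm(["TTGACA", "TTGACT", "TTGATA", "CTGACA"])
upstream = {
    "geneA": "GCGCTTGACAGCGC",
    "geneB": "GCGCGCGCGCGCGC",
    "geneC": "GCTGTCAAGCGCGC",
}
print(predict_regulon(pssm, upstream))  # ['geneA', 'geneC']
```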
**Phage genomics**

As part of the [SEA-PHAGES program](https://seaphages.org/), we routinely work with the UMBC Phage Hunters to isolate bacteriophages infecting *Bacillus* and *Streptomyces* species. Genome analyses of these phages reveal fundamental features of their evolution and their potential for use in biocontrol applications.

**Self-adjustable codon bias indices**

Codon usage bias (CUB) is a widespread phenomenon in which organisms depart from uniform usage of synonymous codons (mRNA nucleotide triplets that specify an amino acid, or a stop signal, in the genetic code). The genes of many organisms show highly biased codon usage that correlates well with their expression levels. Thus, indices that measure CUB are important tools for predicting gene expression, optimizing gene sequences, and assessing lateral gene transfer (LGT).
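As one concrete example of a CUB index, the sketch below computes the classic Codon Adaptation Index (CAI; Sharp and Li, 1987). Each codon's relative adaptiveness w is its count in a reference set of genes divided by the count of its most frequently used synonymous codon, and a gene's CAI is the geometric mean of w over its codons. CAI requires the reference set (typically highly expressed genes) to be chosen beforehand. The code illustrates a standard index only, not our self-adjustable indices, and the pseudocount and toy sequences are assumptions.

```python
import math
from collections import Counter

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(seq):
    """Split a coding sequence into in-frame codons, dropping any trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_genes, pseudocount=0.5):
    """w(codon) = smoothed count / smoothed count of the most used synonym."""
    counts = Counter(c for gene in reference_genes for c in codons(gene)
                     if c in GENETIC_CODE)
    families = {}
    for codon, aa in GENETIC_CODE.items():
        families.setdefault(aa, []).append(codon)
    w = {}
    for aa, synonyms in families.items():
        if aa in "*MW":  # stops and single-codon amino acids carry no choice signal
            continue
        top = max(counts[c] + pseudocount for c in synonyms)
        for c in synonyms:
            w[c] = (counts[c] + pseudocount) / top
    return w


def cai(gene, w):
    """Codon Adaptation Index: geometric mean of w over the gene's scored codons."""
    logs = [math.log(w[c]) for c in codons(gene) if c in w]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


# Toy usage: a made-up "highly expressed" reference set favoring CTG and AAA
w = relative_adaptiveness(["CTGCTGCTGCTG", "AAAAAAAAGAAA"])
print(cai("CTGAAA", w))  # 1.0
print(cai("TTAAAG", w) < 1.0)  # True
```

A CAI of 1 means the gene uses only the preferred codon for every scored amino acid; lower values indicate codon usage further from that of the reference set.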

Highlights 📰

First discovery of a virus that attaches to another virus (deCarvalho et al., 2023)

Host-phage cross-regulation (Mascolo et al., 2022)

Antibiotic resistance


Footnotes

  1. Robbins, R. J. (1992). Challenges in the human genome project. IEEE Engineering in Medicine and Biology, March 1992, 25–34.

Pinned

  1. collectf

    CollecTF database implementation

    Python · 5 stars · 3 forks

  2. cgb

    Comparative genomics of transcriptional regulation in Bacteria

    Python · 2 stars · 2 forks

  3. FLEMINGO

    A project to generate models of regulated bacterial promoters using genetic programming

    Python · 1 star

  4. LPEG_phages

    Code and data for the analysis of TF-binding site positional entropy in phage genomes, used in Mascolo et al. (2022)

    Python · 1 star

  5. SPA

    Code and data used to perform the tAI analysis in deCarvalho et al. (2023)

    Scheme · 1 star

  6. ViPhy

    Virus Phylogeny System

    Python · 1 star

Repositories

Showing 10 of 70 repositories
  • Info-Theo-of-Composite-Motifs

    Computational simulations in support of our theoretical work on the information theory of composite motifs

    Python · 1 star · Updated Dec 17, 2024
  • Markov_DNA_gen

    A Markov model DNA sequence generator to generate pseudo-replicate sequences based on an input sequence

    Python · 2 stars · GPL-3.0 · Updated Sep 24, 2024
  • FLEMINGO

    A project to generate models of regulated bacterial promoters using genetic programming

    Python · 1 star · GPL-3.0 · Updated Sep 18, 2024
  • QD-motifs

    Evolving TF-binding motifs using quality-diversity algorithms

    Python · 0 stars · GPL-3.0 · Updated Jul 1, 2024
  • fast_PSSM_search

    Lookahead implementation of Biopython's motif search function

    0 stars · GPL-3.0 · Updated Jun 25, 2024
  • .github

    Profile repository of the ErillLab organization

    0 stars · Updated Jun 21, 2024
  • CAG-MD

    Comparative genomics-assisted motif discovery

    Python · 0 stars · Updated Jun 18, 2024
  • cgb3EM

    EM framework for CGB3

    0 stars · GPL-3.0 · Updated Jun 13, 2024
  • PhageReg_CompGen

    Comparative genomics-based discovery of phage regulatory regions

    0 stars · GPL-3.0 · Updated May 30, 2024
  • SPA

    Code and data used to perform the tAI analysis in deCarvalho et al. (2023)

    Scheme · 1 star · GPL-3.0 · Updated Apr 28, 2024