Supplemental materials for a TAU thesis, including results, tables, and graphs.
- GOrilla results of GO term enrichment for various target sets in Homo sapiens, Drosophila melanogaster, C. elegans and Arabidopsis thaliana.
- Visualization of clusters of frequent short peptides in human.
- WebGestalt results for disease and pathway enrichment.
- Lists of the most frequent short peptides across 28 species, before and after dilution of similar proteins, together with the proteins that contain them.
Eden E, Navon R, Steinfeld I, Lipson D, Yakhini Z. GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics (2009), 10:48.
Zhang B, Kirov S, Snoddy J. WebGestalt: an integrated system for exploring gene sets in various biological contexts. Nucleic Acids Res (2005), 33:W741-W748.
In this work, we study Frequent Short Peptides (FSPs) in the proteomes of species from across Eukarya. Our definition of FSP captures the peptides that occur in the largest number of distinct proteins within a species. Specifically, we focus on peptides of 10 amino acids (10-mers). We show that the identities of FSPs vary considerably between species. For most species, the FSPs belong to a small set of homologous protein families, such as zinc fingers and olfactory receptors in humans.
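The FSP definition above counts proteins, not occurrences: a 10-mer repeated many times inside one protein still counts once for that protein. A minimal Python sketch of that counting step follows, assuming the proteome is given as a dict mapping protein IDs to sequences; the function names count_kmer_proteins and top_fsps are hypothetical and not taken from the thesis code.

```python
from collections import Counter


def count_kmer_proteins(proteins, k=10):
    """For each k-mer, count the number of distinct proteins that contain it."""
    counts = Counter()
    for seq in proteins.values():
        # Use a set so a k-mer repeated within one protein is counted once for it
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(kmers)
    return counts


def top_fsps(proteins, k=10, n=5):
    """Return the n k-mers found in the largest number of distinct proteins."""
    return count_kmer_proteins(proteins, k).most_common(n)


# Toy proteome (hypothetical sequences)
proteins = {
    "P1": "MAAAAAAAAAAK",  # contains the 10-mer poly-alanine once
    "P2": "GAAAAAAAAAAG",  # contains it too
    "P3": "MKTWQ",         # shorter than k, contributes no 10-mers
}
print(top_fsps(proteins, n=1))
```

On this toy input the poly-alanine 10-mer is reported with a count of 2, since it appears in two distinct proteins.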
We introduce a procedure that eliminates the over-representation of FSPs from homologous protein families by using a sequence alignment algorithm to "dilute" similar proteins during FSP counting. This dilution procedure reveals a conspicuous presence of single amino-acid repeats (SAARs) and almost-SAARs among FSPs, especially in vertebrates.
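The idea of dilution is that, among the proteins containing a given peptide, proteins that are too similar to one already counted should not add to the count. The sketch below is only an illustration under stated assumptions. It uses a greedy pass in input order, an arbitrary threshold of 0.5, and Python's difflib.SequenceMatcher.ratio() as a crude stand-in for a real sequence alignment score. The thesis uses a sequence alignment algorithm whose details are not given here, and diluted_count and similarity are hypothetical names.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Stand-in for an alignment-based similarity (assumption): a ratio in [0, 1]
    return SequenceMatcher(None, a, b).ratio()


def diluted_count(kmer, proteins, threshold=0.5):
    """Count proteins containing kmer, keeping only mutually dissimilar ones.

    Greedy: scan proteins in input order and keep a protein only if its
    similarity to every protein kept so far is below the threshold.
    """
    kept = []
    for seq in proteins.values():
        if kmer not in seq:
            continue
        if all(similarity(seq, other) < threshold for other in kept):
            kept.append(seq)
    return len(kept)


q10 = "Q" * 10
proteins = {
    "p1": "MKTAYIAKDR" + q10 + "LSTWEHFGNC",
    "p2": "VKTAYIAKDR" + q10 + "LSTWEHFGNC",  # near-duplicate of p1
    "p3": "P" * 10 + q10 + "P" * 10,          # unrelated apart from the repeat
    "p4": "MKTAYIAKDR",                       # does not contain the 10-mer
}
print(diluted_count(q10, proteins))
```

Here the naive count for the poly-glutamine 10-mer is 3, but the near-duplicate p2 is diluted away, so the diluted count is 2. Because the pass is greedy, the result can depend on protein order; a real implementation might fix an order or cluster the proteins first.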
Analysis of the diluted groups of human proteins that contain FSPs shows that many of these groups are significantly enriched for Gene Ontology (GO) terms related to regulation of RNA metabolism, regulation of DNA transcription, and nuclear components. Particularly high enrichment is observed for the poly-alanine and poly-glutamine 10-mers, which are among the most frequent peptides in humans and are also known to be associated with neurodegenerative diseases and cancer.
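For a target set compared against a background, GO enrichment is commonly scored with a one-sided hypergeometric test. The sketch below shows that calculation, assuming the usual notation: N background genes, B of which carry the GO term, and a target set of n genes, b of which carry it. It is a generic illustration of the standard test rather than a reproduction of GOrilla's internals; the function names are hypothetical.

```python
from math import comb


def hypergeom_pvalue(N, B, n, b):
    """One-sided p-value P(X >= b) for X ~ Hypergeometric(N, B, n).

    N: background genes, B: background genes annotated with the GO term,
    n: genes in the target set, b: target genes annotated with the term.
    """
    upper = min(n, B)
    hits = sum(comb(B, i) * comb(N - B, n - i) for i in range(b, upper + 1))
    return hits / comb(N, n)


def fold_enrichment(N, B, n, b):
    """Ratio of the term's frequency in the target set to the background."""
    return (b / n) / (B / N)


# Toy example: all 5 target genes carry a term held by half of 10 genes
print(hypergeom_pvalue(10, 5, 5, 5), fold_enrichment(10, 5, 5, 5))
```

In a real analysis the p-values for many GO terms would also need a multiple-testing correction (for example FDR) before calling any term significant.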
Further analysis of diluted FSPs demonstrates that vertebrates, especially mammals, share many common frequent peptides, while invertebrates differ substantially from one another. We use the diluted FSP sets to define a distance metric between species, which yields a good-quality clustering of vertebrates and other eukaryotes, even when only diluted FSPs that are not SAARs are used. Interestingly, a similar metric based on the non-diluted FSP sets does not correlate with phylogenetic proximity. The results hint at an evolutionary mechanism through which the set of diluted FSPs was consolidated in parallel with the neurological complexity of species.
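The exact distance metric is not spelled out here, so the following is only a plausible sketch. It assumes each species is represented by its set of diluted FSPs and uses the Jaccard distance (one minus shared FSPs over all FSPs in either set) as the hypothetical metric, with an option to drop SAARs first, mirroring the "non-SAAR FSPs only" variant described above. The species sets are made-up toy data, and the resulting pairwise distances could then be fed to any hierarchical clustering routine.

```python
from itertools import combinations


def is_saar(peptide):
    """True if the peptide is a single amino-acid repeat, e.g. 'QQQQQQQQQQ'."""
    return len(set(peptide)) == 1


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; defined as 0.0 for two empty sets."""
    union = a | b
    if not union:
        return 0.0
    return 1 - len(a & b) / len(union)


def distance_matrix(fsps_by_species, drop_saars=False):
    """Pairwise distances between species, keyed by sorted species-name pairs."""
    sets = {}
    for species, peptides in fsps_by_species.items():
        s = set(peptides)
        if drop_saars:
            s = {p for p in s if not is_saar(p)}
        sets[species] = s
    return {
        (x, y): jaccard_distance(sets[x], sets[y])
        for x, y in combinations(sorted(sets), 2)
    }


# Hypothetical diluted FSP sets for three species
fsps = {
    "human": ["AAAAAAAAAA", "QQQQQQQQQQ", "PPGPPGPPGP"],
    "mouse": ["AAAAAAAAAA", "QQQQQQQQQQ", "PPGPPGPPGP"],
    "fly": ["QQQQQQQQQQ", "SGSGSGSGSG"],
}
print(distance_matrix(fsps))
print(distance_matrix(fsps, drop_saars=True))
```

With SAARs included, human and fly share only the poly-glutamine repeat (distance 0.75); once SAARs are dropped they share nothing (distance 1.0), while the identical human and mouse toy sets stay at distance 0.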
Gal Ron