This software addresses the seriation problem by finding a suitable linear order for a set of proteins. The result is a list of proteins ordered in one dimension such that functionally associated proteins are placed close to each other.
Figure 1. Visual representation of the main output produced by this software. (A) Initial state of an adjacency matrix containing 4386 Saccharomyces cerevisiae proteins; the x-axis is randomly ordered. (B) Final state of the same adjacency matrix using the ordered protein list obtained. Each black dot represents an interaction between two proteins.
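To make the ordering goal concrete, the sketch below scores a candidate order by summing the distance, in list positions, between every pair of interacting proteins. This is only an illustrative cost under the assumption that "closer" means a smaller positional distance; it is not necessarily the energy that ordering1D minimizes, and the name `total_edge_distance` is hypothetical.

```python
def total_edge_distance(order, edges):
    """Sum of positional distances between interacting proteins.

    order: list of protein IDs in their one-dimensional order.
    edges: iterable of (protein_a, protein_b) interaction pairs.
    Illustrative cost only; not taken from the ordering1D source.
    """
    # Map each protein ID to its index in the ordered list.
    position = {name: index for index, name in enumerate(order)}
    return sum(abs(position[a] - position[b]) for a, b in edges)


if __name__ == "__main__":
    interactions = [("A", "B"), ("B", "C")]
    # An order that keeps interacting proteins adjacent scores lower.
    print(total_edge_distance(["A", "C", "B"], interactions))
    print(total_edge_distance(["A", "B", "C"], interactions))
```

Under this toy cost, an order that moves the black dots of the adjacency matrix toward the diagonal receives a lower score.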
The software was developed by Felipe Kuentzer, in collaboration with Douglas G. Ávila, Alexandre Pereira, Gabriel Perrone, Samoel da Silva, Alexandre Amory, and Rita de Almeida.
The version provided here was modified by Clovis Ferreira dos Reis to improve the textual feedback and to fix bugs such as:
- Duplication of identifiers on the ordering output.
- Segmentation fault while reading an input file containing many nodes.
Compilation requires GCC. To compile this software invoke the following commands on the shell:
> wget https://github.com/arthurvinx/seriation/archive/master.zip
> unzip master.zip
> cd seriation-master/
> gcc ordering1D.c -o ordering1D -lm
To execute the software invoke this command on the shell:
> ./ordering1D f=[absolute path to association file]
Parameters list:
> ./ordering1D
> An association file name is necessary! No default!
> Parameters list:
> f=Association file
> i=Number of isothermal steps
> m=Number of Monte Carlo steps
> c=Cooling factor
> a=Alpha value
> p=Percentual energy for initial temperature
> s=Random seed
Parameters default values:
i=100 m=2000 c=0.5 a=1.0 p=0.0001
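When the program is called from a script, a small helper can assemble the `key=value` arguments. The sketch below assumes that every optional parameter is passed in the same `key=value` form shown for `f`; the function name `build_command` and the file name `ppi.txt` are hypothetical.

```python
import os

# Optional parameter keys taken from the parameter list printed by the program.
ALLOWED = ("i", "m", "c", "a", "p", "s")


def build_command(assoc_file, binary="./ordering1D", **params):
    """Return the argument list for one ordering1D run.

    Assumes optional parameters use the same key=value form as f=...
    """
    unknown = set(params) - set(ALLOWED)
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    # The usage line asks for an absolute path to the association file.
    command = [binary, "f=" + os.path.abspath(assoc_file)]
    command += [f"{key}={value}" for key, value in params.items()]
    return command


if __name__ == "__main__":
    # Parameters that are not given keep the program defaults.
    print(build_command("ppi.txt", m=20000))
```

The returned list can be handed to `subprocess.run(command, check=True)` from the directory that contains the compiled binary.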
The input is a text file describing an undirected protein-protein interaction (PPI) network. This repository contains an example file from Escherichia coli. In this example, the nodes are labeled by ENSEMBL Peptide IDs.
Protein-protein interaction network data can be downloaded from STRING. You may choose the download that includes the subscores per channel and filter the interactions to suit your analysis. The input must be a file with two columns and no header, where each row contains the IDs of two proteins that interact with each other.
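As a rough sketch of the preparation step, the function below turns rows from a STRING links file into header-free ID pairs. It assumes a whitespace-separated file whose header contains `protein1`, `protein2`, and `combined_score`; the threshold of 700 is only an example, and the name `string_to_pairs` is hypothetical. Whether ordering1D expects each pair once or in both directions is not stated here, so this sketch keeps a single copy of each undirected pair.

```python
def string_to_pairs(lines, min_score=700):
    """Yield unique undirected (id_a, id_b) pairs from STRING link rows.

    lines: iterable of text rows, the first non-empty one being the header.
    min_score: illustrative cutoff on the combined_score column.
    """
    seen = set()
    score_col = None
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue
        if score_col is None:
            # First non-empty row is the header; locate the score column.
            score_col = fields.index("combined_score")
            continue
        a, b = fields[0], fields[1]
        if a == b or int(fields[score_col]) < min_score:
            continue
        # Store each undirected pair under one canonical orientation.
        key = (a, b) if a < b else (b, a)
        if key not in seen:
            seen.add(key)
            yield key


if __name__ == "__main__":
    sample = [
        "protein1 protein2 combined_score",
        "P1 P2 900",
        "P2 P1 900",
        "P1 P3 150",
    ]
    for a, b in string_to_pairs(sample):
        # Tab is an assumed separator; compare with the example file.
        print(f"{a}\t{b}")
```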
Two text files will be saved in the directory of the association file: one with the prefix "energy_", detailing the ordering process, and one with the prefix "ordering_" (this will be your ordered list). The lower the final energy, the better the ordered list. I suggest increasing the number of Monte Carlo steps to 20000 (m=20000) to improve the results.
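The output files can be located from a script by searching the directory of the association file for the two prefixes. The sketch below assumes only what is stated above, namely the prefixes and the directory; the rest of each file name is not specified, so it matches on the prefix alone. The function name `find_outputs` is hypothetical.

```python
from pathlib import Path


def find_outputs(assoc_file):
    """Return files in the association file's directory, grouped by prefix."""
    folder = Path(assoc_file).resolve().parent
    return {
        prefix: sorted(folder.glob(prefix + "*"))
        for prefix in ("energy_", "ordering_")
    }


if __name__ == "__main__":
    # "ppi.txt" is a placeholder for the association file that was used.
    for prefix, paths in find_outputs("ppi.txt").items():
        print(prefix, [path.name for path in paths])
```

If several runs were made in the same directory, more than one file may match each prefix.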
This repository contains an example of the output produced by this software for the Escherichia coli PPI network.
The source code is distributed under the terms of the GNU General Public License v3 (GPLv3).
If you use this software in your research, please cite:
- Seriation R Package, available at CRAN.