GitHub - dalmolingroup/seriation: C code to find a suitable linear order for a set of proteins

About this software

This software solves the seriation problem by finding a suitable linear order for a set of proteins. The result is a list of proteins ordered in one dimension such that functionally associated proteins lie closer together.

Figure 1. Visual representation of the main output produced by this software. (A) Initial state of an adjacency matrix containing 4386 Saccharomyces cerevisiae proteins; the x-axis is randomly ordered. (B) Final state of the same adjacency matrix after applying the ordered protein list obtained. Each interaction between two proteins is represented by a black dot.
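To make the idea behind Figure 1 concrete, the following minimal Python sketch builds an adjacency matrix whose rows and columns follow a given protein order and prints it as text. It is only an illustration: the function names `adjacency_matrix` and `render` and the protein labels `P1` to `P4` are hypothetical and are not part of this repository.

```python
def adjacency_matrix(order, edges):
    """Return a 0/1 matrix whose rows and columns follow `order`."""
    pos = {protein: i for i, protein in enumerate(order)}
    n = len(order)
    matrix = [[0] * n for _ in range(n)]
    for a, b in edges:
        i, j = pos[a], pos[b]
        matrix[i][j] = matrix[j][i] = 1  # undirected interaction
    return matrix


def render(matrix):
    """Draw interactions as '#' and empty cells as '.'."""
    return "\n".join(
        "".join("#" if cell else "." for cell in row) for row in matrix
    )


# Hypothetical toy network: a chain of four proteins.
edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4")]

print(render(adjacency_matrix(["P3", "P1", "P4", "P2"], edges)))  # arbitrary order
print()
print(render(adjacency_matrix(["P1", "P2", "P3", "P4"], edges)))  # good order
```

With the second order, every interaction sits next to the main diagonal, which is the same effect panel (B) shows at a much larger scale.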

Authors

The software was developed by Felipe Kuentzer, in collaboration with Douglas G. Ávila, Alexandre Pereira, Gabriel Perrone, Samoel da Silva, Alexandre Amory, and Rita de Almeida.

The version provided here was modified by Clovis Ferreira dos Reis to improve the textual feedback and to fix bugs such as:

Duplication of identifiers on the ordering output.
Segmentation fault while reading an input file containing many nodes.

Download and compilation

Compilation requires GCC. To download and compile the software, run the following commands in a shell:

> wget https://github.com/arthurvinx/seriation/archive/master.zip
> unzip master.zip
> cd seriation-master/
> gcc ordering1D.c -o ordering1D -lm

How to use

To run the software, invoke this command in a shell:

> ./ordering1D f=[absolute path to association file]

Running the program without arguments prints the parameter list:

> ./ordering1D

An association file name is necessary! No default!

Parameters list:
        f=Association file
        i=Number of isothermal steps
        m=Number of Monte Carlo steps
        c=Cooling factor
        a=Alpha value
        p=Percentual energy for initial temperature
        s=Random seed

Default parameter values:

i=100
m=2000
c=0.5
a=1.0
p=0.0001
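The parameter names (isothermal steps, Monte Carlo steps, cooling factor, initial temperature, random seed) suggest a simulated-annealing style search. The Python sketch below shows how such parameters commonly interact in a generic annealing loop. It is not the code in ordering1D.c. The energy function (summed distance between interacting proteins), the mapping of Monte Carlo steps to the outer loop and isothermal steps to the inner loop, and the use of `p_energy` times the initial energy as the starting temperature are all assumptions made for illustration. The alpha parameter is left out because its role is not described here.

```python
import math
import random


def edge_span(order, edges):
    """Assumed energy: summed distance in the order between interacting pairs."""
    pos = {node: i for i, node in enumerate(order)}
    return sum(abs(pos[a] - pos[b]) for a, b in edges)


def anneal(nodes, edges, mc_steps=200, iso_steps=20, cooling=0.5,
           p_energy=0.0001, seed=0):
    """Generic annealing sketch; returns the best order found and its energy."""
    rng = random.Random(seed)
    order = list(nodes)
    energy = edge_span(order, edges)
    best_order, best_energy = order[:], energy
    if len(order) < 2:
        return best_order, best_energy
    # Assumption: the initial temperature is a fraction of the initial energy.
    temperature = max(p_energy * energy, 1e-12)
    for _ in range(mc_steps):
        for _ in range(iso_steps):  # moves tried at a fixed temperature
            i, j = rng.sample(range(len(order)), 2)
            order[i], order[j] = order[j], order[i]  # propose a swap
            delta = edge_span(order, edges) - energy
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                energy += delta  # accept the move
                if energy < best_energy:
                    best_order, best_energy = order[:], energy
            else:
                order[i], order[j] = order[j], order[i]  # reject: undo the swap
        temperature = max(temperature * cooling, 1e-12)  # geometric cooling
    return best_order, best_energy


# Hypothetical toy network: a chain of six proteins in a scrambled order.
nodes = ["P4", "P1", "P6", "P3", "P5", "P2"]
edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P4", "P5"), ("P5", "P6")]
start = edge_span(nodes, edges)
order, energy = anneal(nodes, edges)
print(start, "->", energy, order)
```

In a loop of this kind, more steps mean more candidate swaps are tried, at the cost of a longer run time.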

Input

The input is a text file describing an undirected protein-protein interaction (PPI) network. This repository contains an example file from Escherichia coli. In this example, the nodes are labeled by ENSEMBL Peptide IDs.

Protein-protein interaction network data can be downloaded from STRING. You may choose to download the version with subscores per channel and apply your own filters. The input must be a file with two columns and no header, where each row contains the IDs of two proteins that interact with each other.
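As an illustration of the preparation step, the following Python sketch converts lines in the layout of a STRING links file (a header row, then two protein IDs followed by score columns, with the combined score last) into two-column rows. The function name `string_links_to_pairs`, the score threshold of 700, and the protein IDs in the example are hypothetical. The tab separator is also an assumption, so compare the result with the example file in this repository before using it.

```python
import io


def string_links_to_pairs(lines, min_score=700, sep="\t"):
    """Yield 'idA<sep>idB' rows for interactions scoring at least min_score."""
    seen = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or fields[0] == "protein1":
            continue  # skip blank lines, malformed lines and the header
        a, b, score = fields[0], fields[1], int(fields[-1])
        key = tuple(sorted((a, b)))
        if a == b or score < min_score or key in seen:
            continue  # drop self-loops, weak links and A-B / B-A duplicates
        seen.add(key)
        yield f"{a}{sep}{b}"


# Made-up rows in the STRING layout; real files are much larger.
example = io.StringIO(
    "protein1 protein2 combined_score\n"
    "511145.b0001 511145.b0002 900\n"
    "511145.b0002 511145.b0001 900\n"
    "511145.b0001 511145.b0003 150\n"
)
for row in string_links_to_pairs(example):
    print(row)
```

Because the network is undirected, the sketch keeps each pair only once, even if the source lists it in both directions.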

Outputs

Two text files are saved in the directory of the association file: one with the prefix "energy_", detailing the ordering process, and one with the prefix "ordering_", containing the ordered list. The lower the final energy, the better the ordered list. Increasing the number of Monte Carlo steps to 20000 (m=20000) is suggested to improve the results.

This repository contains an example of the output produced by this software for the Escherichia coli PPI network.

License

The source code is distributed under the terms of the GNU General Public License v3 (GPLv3).

How to cite this software

If you use this software in your research, please cite:

Kuentzer, F. A. et al. (2014). Optimization and analysis of seriation algorithm for ordering protein networks. IEEE International Conference on Bioinformatics and Bioengineering, 231-237.

Similar software

Seriation R Package, available at CRAN.
