
Commit

Merge branch 'master' of https://github.com/aqlaboratory/hsm
Fix CLI bug.
jmcunnin committed Jan 12, 2020
2 parents 579324c + e620028 commit 6aedf19
Showing 2 changed files with 16 additions and 17 deletions.
21 changes: 10 additions & 11 deletions README.md
@@ -2,15 +2,15 @@

<img align="left" src="misc/symbol_name.png" style="width: 25%; height: 25%"/>

This repository implements the hierarchical statistical mechanical (HSM) model described in the paper [Biophysical prediction of protein-peptide interactions and signaling networks using machine learning.](nature.com)
This repository implements the hierarchical statistical mechanical (HSM) model described in the paper [Biophysical prediction of protein-peptide interactions and signaling networks using machine learning.](https://doi.org/10.1038/s41592-019-0687-1)

An **associated website** is available at [proteinpeptide.io](proteinpeptide.io). The website is built to facilitate interactions with results from the model including: (1) specific domain-peptide and protein-protein predictions, (2) the resulting networks, and (3) structures colored using the inferred energy functions from the model. Code for the website is available via the parallel repo: [aqlaboratory/hsm-web](https://github.com/aqlaboratory/hsm-web).
An **associated website** is available at [proteinpeptide.io](https://proteinpeptide.io). The website is built to facilitate interactions with results from the model including: (1) specific domain-peptide and protein-protein predictions, (2) the resulting networks, and (3) structures colored using the inferred energy functions from the model. Code for the website is available via the parallel repo: [aqlaboratory/hsm-web](https://github.com/aqlaboratory/hsm-web).

This file documents how this package might be [used](#usage), the [location of associated data](#data), and [other metadata](#reference).

## Usage

The model was implemented in Python (>= 3.5) primarily using TensorFlow (>= 1.4) ([Software Requirements](#requirements)). To work with this repository, we recommend downloading pre-processed data available at [doi:](figshare.com) into "data/". Alternatively, it is possible to either re-process raw data ([doi:](figshare.com)) or include new data. The folder contains two major directories: `train/` and `predict/`. Each directory is accompanied by a `README.md` file detailing usage.
The model was implemented in Python (>= 3.5) primarily using TensorFlow (>= 1.4) ([Software Requirements](#requirements)). To work with this repository, either download pre-processed data (see below) or include new data. The folder contains two major directories: `train/` and `predict/`. Each directory is accompanied by a `README.md` file detailing usage.

To train / re-train new models, use the `train.py` script in `train/`. To make predictions using a model, use one of two scripts, `predict_domains.py` and `predict_proteins.py`, for predicting either domain-peptide interactions or protein-protein interactions. Scripts are designed with a CLI and should be used from the command line:

@@ -20,18 +20,17 @@ python [SCRIPT] [OPTIONS]

Options for any script may be listed using the `-h/--help` flag.

Pre-processed / pre-trained data and models may be downloaded from [figshare/doi:](figshare.com) and should be unpacked at `data/` in this directory. This directory may also be used as an example of how to structure input and output files / directories.
Pre-processed / pre-trained data and models may be downloaded from [figshare (doi:10.6084/m9.figshare.11520552)](https://doi.org/10.6084/m9.figshare.11520552) and should be unpacked at `data/` in this directory. This directory may also be used as an example of how to structure input and output files / directories.

An alternative use case would be to train / re-train a new model in the `train/` code and make new predictions using the `predict/` code.

## Data

As reported, domain-peptide and protein-protein interactions are available via [figshare/doi:](figshare.com). In addition, we provide pre-processed data for this repository and the website repository,
As reported, domain-peptide and protein-protein interactions are available via [figshare (doi:10.6084/m9.figshare.10084745)](https://doi.org/10.6084/m9.figshare.10084745). In addition, we provide pre-processed data for this repository and the website repository,

- Raw training data: [figshare/doi:](figshare.com). Raw domain-peptide training data used to train the core HSM models. Unpack to `data/` in this directory.
- Website data: [figshare/doi:](figshare.com). Data supporting the website at [proteinpeptide.io](proteinpeptide.io)

The data used to train the model is also provided at a separate data repository: [figshare/doi:](figshare.com).
- Raw training data: [figshare - doi:10.6084/m9.figshare.11520297](https://doi.org/10.6084/m9.figshare.11520297). Raw domain-peptide training data used to train the core HSM models. Unpack to `data/` in this directory.
- Pre-processed data: [figshare - doi:10.6084/m9.figshare.11520552](https://doi.org/10.6084/m9.figshare.11520552). Needed to work with this repo.
- Data supporting the website at [proteinpeptide.io](https://proteinpeptide.io)

## Requirements
- Python (>= 3.5)
@@ -44,9 +43,9 @@ The data used to the train the model is also provided at a separate data reposit
## Reference
Please reference the associated publication:

Cunningham, J.M., Koytiger, G., Sorger, P.K., & AlQuraishi, M. "Biophysical prediction of protein-peptide interactions and signaling networks using machine learning." *Nature Methods* (2020). [doi:](https://doi.org/). ([citation.bib](https://raw.githubusercontent.com/aqlaboratory/hsm/misc/citation.bib))
Cunningham, J.M., Koytiger, G., Sorger, P.K., & AlQuraishi, M. "Biophysical prediction of protein-peptide interactions and signaling networks using machine learning." *Nature Methods* (2020). [doi:10.1038/s41592-019-0687-1](https://doi.org/10.1038/s41592-019-0687-1). ([citation.bib](misc/citation.bib))

See also, a **website** at [proteinpeptide.io](proteinpeptide.io) for exploring the associated analyses (code: [aqlaboratory/hsm-web](https://github.com/aqlaboratory/hsm-web)).
See also, a **website** at [proteinpeptide.io](https://proteinpeptide.io) for exploring the associated analyses (code: [aqlaboratory/hsm-web](https://github.com/aqlaboratory/hsm-web)).

## Funding

12 changes: 6 additions & 6 deletions predict/README.md
@@ -8,7 +8,7 @@ Predictions are run through one of two scripts, `predict_domains.py` and `predic

## Domain-Peptide Interaction Predictions

Code used for predicting domain-peptide interactions is located in the predict/ directory in this repository. The functionality should primarily be accessed via the predict\_domains.py script.
Code used for predicting domain-peptide interactions is located in the predict/ directory in this repository. The functionality should primarily be accessed via the `predict_domains.py` script.

```python
python predict_domains.py [INPUT DOMAINS METADATA] [INPUT PEPTIDES METADATA] [OPTIONS]
@@ -18,7 +18,7 @@ Additional options for using either script may be listed using the `-h/--help` f
The basic steps for predicting a new interaction are:
### 0. Pre-process data and models.

By default, the code assumes that models are located at `predict/models/` and pre-processed data, which can be downloaded (see [Data section](#data)), are available at `data/metadata`. New data must be passed explicitly to the code (see the next section). Output model files should be the same as formatted by `output_models.py` in the `train/` directory.
By default, the code assumes that models are located at `predict/models/` and pre-processed data, which can be downloaded from [figshare (doi:10.6084/m9.figshare.11520552)](https://doi.org/10.6084/m9.figshare.11520552), should be available at `data/predict`. New data must be passed explicitly to the code (see the next section). Output model files should be the same as formatted by `output_models.py` in the `train/` directory.

Input domains files should have the format:
```
@@ -50,7 +50,7 @@ The domain and peptide alignment lengths refer to the domain / peptide alignment

## Protein-Protein Interaction Predictions

Code used for predicting protein-protein interactions is located in the predict/ directory in this repository. The functionality should primarily be accessed via the predict\_proteins.py script.
Code used for predicting protein-protein interactions is located in the predict/ directory in this repository. The functionality should primarily be accessed via the `predict_proteins.py` script.

```python
python predict_proteins.py [-p [INPUT PPI PAIRS]] [OPTIONS]
@@ -59,13 +59,13 @@ Additional options for using either script may be listed using the `-h/--help` f

## 0. Pre-process data and models.

By default, the `predict_proteins.py` script also assumes models are located at `predict/models/` and pre-processed data, which can be downloaded (see [Data section](#data)), are available at `data/metadata`. New data must be passed explicitly to the code (see the next section). The same model files may be used in both domain-peptide and protein-protein interaction prediction. To use new models, the same steps to specify the new models must be passed to `predict_proteins.py`. In addition, the models require metadata files (by default, stored in `data/metadata`) that describe either the domain or peptide composition of proteins. Metadata are formatted as Python dictionaries (stored as pickle'd files) with the format:
By default, the `predict_proteins.py` script also assumes models are located at `predict/models/` and pre-processed data, which can be downloaded via [figshare (doi:10.6084/m9.figshare.11520552)](https://doi.org/10.6084/m9.figshare.11520552), are available at `data/metadata`. New data must be passed explicitly to the code (see the next section). The same model files may be used in both domain-peptide and protein-protein interaction prediction. To use new models, the same steps to specify the new models must be passed to `predict_proteins.py`. In addition, the models require metadata files (by default, stored in `data/metadata`) that describe either the domain or peptide composition of proteins. Metadata are formatted as Python dictionaries (stored as pickle'd files) with the format:

## 1. Run predictions

Predictions can be computed using the described script:

```python
python predict_proteins.py [-p [INPUT PPI PAIRS]] [OPTIONS]
python predict_proteins.py [--ppi_pairs [INPUT PPI PAIRS]] [OPTIONS]
```
The `INPUT PPI PAIRS` option (passed using `-p / --ppi_pairs`) passed to the code denotes a csv file containing the proteins to predict. These pairs should be formatted as a csv file where each line contains a pair of protein IDs (`<ID 1>,<ID 2>`). These IDs should reference IDs in the metadata files. If no pairs are passed, all valid pairs are returned. Different metadata files may be passed in using the `--domain_metadata` and `--peptide_metadata` options.
The `INPUT PPI PAIRS` option (passed using `--ppi_pairs`) passed to the code denotes a csv file containing the proteins to predict. These pairs should be formatted as a csv file where each line contains a pair of protein IDs (`<ID 1>,<ID 2>`). These IDs should reference IDs in the metadata files. If no pairs are passed, all valid pairs are returned. Different metadata files may be passed in using the `--domain_metadata` and `--peptide_metadata` options.
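
As an illustration of the input described above, the following is a minimal sketch of how a pairs file might be prepared and how the documented invocation could be assembled. The helper names (`write_ppi_pairs`, `build_command`), the protein IDs, and the output location are hypothetical and not part of this repository. The sketch also assumes the csv has no header row, which is one reading of "each line contains a pair of protein IDs". The command is only printed, not executed.

```python
import csv
import shlex
import tempfile
from pathlib import Path


def write_ppi_pairs(pairs, path):
    """Write one `<ID 1>,<ID 2>` pair per line (assumed: no header row)."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        for id_1, id_2 in pairs:
            writer.writerow([id_1, id_2])
    return Path(path)


def build_command(pairs_path, extra_options=()):
    """Assemble the documented invocation as an argument list (hypothetical helper)."""
    return ["python", "predict_proteins.py", "--ppi_pairs", str(pairs_path)] + list(extra_options)


# Placeholder IDs for illustration only; real IDs must appear in the metadata files.
example_pairs = [("PROTEIN_A", "PROTEIN_B"), ("PROTEIN_A", "PROTEIN_C")]

# Write to a scratch directory so the sketch does not touch the repository tree.
pairs_file = write_ppi_pairs(example_pairs, Path(tempfile.mkdtemp()) / "ppi_pairs.csv")

command = build_command(pairs_file)
print(" ".join(shlex.quote(part) for part in command))
```

Other options, such as `--domain_metadata` and `--peptide_metadata`, could be appended through `extra_options`; the `-h/--help` output lists what each script actually accepts.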
