Merge branch 'master' of https://github.com/aqlaboratory/hsm

Fix CLI bug.
aqlaboratory · Jan 12, 2020 · 6aedf19
2 parents 579324c + e620028

Showing 2 changed files with 16 additions and 17 deletions.
diff --git a/README.md b/README.md
@@ -2,15 +2,15 @@
 
 <img align="left" src="misc/symbol_name.png" style="width: 25%; height: 25%"/> 
 
-This repository implements the hierarchical statistical mechanical (HSM) model described in the paper [Biophysical prediction of protein-peptide interactions and signaling networks using machine learning.](nature.com) 
+This repository implements the hierarchical statistical mechanical (HSM) model described in the paper [Biophysical prediction of protein-peptide interactions and signaling networks using machine learning.](https://doi.org/10.1038/s41592-019-0687-1) 
 
-An **associated website** is available at [proteinpeptide.io](proteinpeptide.io). The website is built to facilitate interactions with results from the model including: (1) specific domain-peptide and protein-protein predictions, (2) the resulting networks, and (3) structures colored using the inferred energy functions from the model. Code for the website is available via the parallel repo: [aqlaboratory/hsm-web](https://github.com/aqlaboratory/hsm-web).
+An **associated website** is available at [proteinpeptide.io](https://proteinpeptide.io). The website is built to facilitate interactions with results from the model including: (1) specific domain-peptide and protein-protein predictions, (2) the resulting networks, and (3) structures colored using the inferred energy functions from the model. Code for the website is available via the parallel repo: [aqlaboratory/hsm-web](https://github.com/aqlaboratory/hsm-web).
 
 This file documents how this package might be [used](#usage), the [location of associated data](#data), and [other metadata](#reference). 
 
 ## Usage
 
-The model was implemented in Python (>= 3.5) primarily using TensorFlow (>= 1.4) ([Software Requirements](#requirements)). To work with this repository, we recommend downloading pre-processed data available at [doi:](figshare.com) into "data/". Alternatively, it is possible to either re-process raw data ([doi:](figshare.com)) or include new data. The folder contains two major directories: `train/` and `predict/`. Each directory is accompanied by a `README.md` file detailing usage. 
+The model was implemented in Python (>= 3.5) primarily using TensorFlow (>= 1.4) ([Software Requirements](#requirements)). To work with this repository, either download pre-processed data (see below) or include new data. The folder contains two major directories: `train/` and `predict/`. Each directory is accompanied by a `README.md` file detailing usage. 
 
 To train / re-train new models, use the `train.py` script in `train/`. To make predictions using a model, use one of two scripts, `predict_domains.py` and `predict_proteins.py`, for predicting either domain-peptide interactions or protein-protein interactions. Scripts are designed with a CLI and should be used from the command line: 
 
@@ -20,18 +20,17 @@ python [SCRIPT] [OPTIONS]
 
 Options for any script may be listed using the `-h/--help` flag. 
 
-Pre-processed / pre-trained data and models may be downloaded from [figshare/doi:](figshare.com) and should be unpacked at `data/` in this directory. This directory may also be used as an example of how to structure input and output files / directories.
+Pre-processed / pre-trained data and models may be downloaded from [figshare (doi:10.6084/m9.figshare.11520552)](https://doi.org/10.6084/m9.figshare.11520552) and should be unpacked at `data/` in this directory. This directory may also be used as an example of how to structure input and output files / directories.
 
 An alternative use case would be to train / re-train a new model in the `train/` code and make new predictions using the `predict/` code. 
 
 ## Data
 
-As reported, domain-peptide and protein-protein interactions are available via [figshare/doi:](figshare.com). In addition, we provide pre-processed data for this repository and the website repository: 
+As reported, domain-peptide and protein-protein interactions are available via [figshare (doi:10.6084/m9.figshare.10084745)](https://doi.org/10.6084/m9.figshare.10084745). In addition, we provide pre-processed data for this repository and the website repository: 
 
-- Raw training data: [figshare/doi:](figshare.com). Raw domain-peptide training data used to train the core HSM models. Unpack to `data/` in this directory.
-- Website data: [figshare/doi:](figshare.com). Data supporting the website at [proteinpeptide.io](proteinpeptide.io)
-
-The data used to the train the model is also provided at a separate data repository: [figshare/doi:](figshare.com). 
+- Raw training data: [figshare - doi:10.6084/m9.figshare.11520297](https://doi.org/10.6084/m9.figshare.11520297). Raw domain-peptide training data used to train the core HSM models. Unpack to `data/` in this directory.
+- Pre-processed data: [figshare - doi:10.6084/m9.figshare.11520552](https://doi.org/10.6084/m9.figshare.11520552). Needed to work with this repo. 
+- Data supporting the website at [proteinpeptide.io](https://proteinpeptide.io)
 
 ## Requirements
 - Python (>= 3.5)
@@ -44,9 +43,9 @@ The data used to the train the model is also provided at a separate data reposit
 ## Reference
 Please reference the associated publication:
 
-Cunningham, J.M., Koytiger, G., Sorger, P.K., & AlQuraishi, M. "Biophysical prediction of protein-peptide interactions and signaling networks using machine learning." *Nature Methods* (2020). [doi:](https://doi.org/). ([citation.bib](https://raw.githubusercontent.com/aqlaboratory/hsm/misc/citation.bib))
+Cunningham, J.M., Koytiger, G., Sorger, P.K., & AlQuraishi, M. "Biophysical prediction of protein-peptide interactions and signaling networks using machine learning." *Nature Methods* (2020). [doi:10.1038/s41592-019-0687-1](https://doi.org/10.1038/s41592-019-0687-1). ([citation.bib](misc/citation.bib))
 
-See also, a **website** at [proteinpeptide.io](proteinpeptide.io) for exploring the associated analyses (code: [aqlaboratory/hsm-web](https://github.com/aqlaboratory/hsm-web)). 
+See also, a **website** at [proteinpeptide.io](https://proteinpeptide.io) for exploring the associated analyses (code: [aqlaboratory/hsm-web](https://github.com/aqlaboratory/hsm-web)). 
 
 ## Funding
 

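Both READMEs expect the figshare downloads to be unpacked under `data/`, with models at `predict/models/`. Below is a minimal sketch of doing that from Python and checking the default layout before calling the prediction scripts. It assumes the download is an archive format that `shutil.unpack_archive` recognizes (e.g. `.zip` or `.tar.gz`). The helper names are hypothetical, and the expected paths are taken from `predict/README.md`. Adjust them if the archive unpacks to a different layout.

```python
import shutil
from pathlib import Path


def unpack_to_data(archive_path, repo_root="."):
    """Unpack a downloaded archive into <repo_root>/data/.

    The archive format is inferred from the file extension by shutil.
    """
    data_dir = Path(repo_root) / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    shutil.unpack_archive(str(archive_path), str(data_dir))
    return data_dir


def missing_defaults(repo_root=".", expected=("predict/models", "data/metadata")):
    """Return the expected default paths that do not exist under repo_root.

    The defaults mirror the locations named in predict/README.md (an assumption;
    pass your own tuple if your layout differs).
    """
    root = Path(repo_root)
    return [p for p in expected if not (root / p).exists()]
```

For example, `unpack_to_data("downloaded_archive.zip")` (a hypothetical file name) followed by `missing_defaults()` returns an empty list once both default locations are in place.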
diff --git a/predict/README.md b/predict/README.md
@@ -8,7 +8,7 @@ Predictions are run through one of two scripts, `predict_domains.py` and `predic
 
 ## Domain-Peptide Interaction Predictions
 
-Code used for predicting domain-peptide interactions is located in the predict/ directory in this repository. The functionality should primarily be accessed via the predict\_domains.py script.
+Code used for predicting domain-peptide interactions is located in the predict/ directory in this repository. The functionality should primarily be accessed via the `predict_domains.py` script.
 
 ```bash
 python predict_domains.py [INPUT DOMAINS METADATA] [INPUT PEPTIDES METADATA] [OPTIONS] 
@@ -18,7 +18,7 @@ Additional options for using either script may be listed using the `-h/--help` f
 The basic steps for predicting a new interaction are:
 ### 0. Pre-process data and models.
 
-By default, the code assumes that models are located at `predict/models/` and pre-processed data, which can be downloaded (see [Data section](#data)), are available at `data/metadata`. New data must be passed explicitly to the code (see the next section). Output model files should be the same as formatted by `output_models.py` in the `train/` directory. 
+By default, the code assumes that models are located at `predict/models/` and pre-processed data, which can be downloaded from [figshare (doi:10.6084/m9.figshare.11520552)](https://doi.org/10.6084/m9.figshare.11520552), should be available at `data/predict`. New data must be passed explicitly to the code (see the next section). Output model files should be the same as formatted by `output_models.py` in the `train/` directory. 
 
 Input domains files should have the format:
 ```
@@ -50,7 +50,7 @@ The domain and peptide alignment lengths refer to the domain / peptide alignment
 
 ## Protein-Protein Interaction Predictions
 
-Code used for predicting protein-protein interactions is located in the predict/ directory in this repository. The functionality should primarily be accessed via the predict\_proteins.py script.
+Code used for predicting protein-protein interactions is located in the predict/ directory in this repository. The functionality should primarily be accessed via the `predict_proteins.py` script.
 
 ```bash
 python predict_proteins.py [-p [INPUT PPI PAIRS]] [OPTIONS] 
@@ -59,13 +59,13 @@ Additional options for using either script may be listed using the `-h/--help` f
 
 ## 0. Pre-process data and models.
 
-By default, the `predict_proteins.py` script also assumes models are located at `predict/models/` and pre-processed data, which can be downloaded (see [Data section](#data)), are available at `data/metadata`. New data must be passed explicitly to the code (see the next section). The same model files may be used for both domain-peptide and protein-protein interaction prediction. To use new models, they must be specified to `predict_proteins.py` in the same way. In addition, the models require metadata files (by default, stored in `data/metadata`) that describe either the domain or peptide composition of proteins. Metadata are formatted as Python dictionaries (stored as pickled files) with the format: 
+By default, the `predict_proteins.py` script also assumes models are located at `predict/models/` and pre-processed data, which can be downloaded via [figshare (doi:10.6084/m9.figshare.11520552)](https://doi.org/10.6084/m9.figshare.11520552), are available at `data/metadata`. New data must be passed explicitly to the code (see the next section). The same model files may be used for both domain-peptide and protein-protein interaction prediction. To use new models, they must be specified to `predict_proteins.py` in the same way. In addition, the models require metadata files (by default, stored in `data/metadata`) that describe either the domain or peptide composition of proteins. Metadata are formatted as Python dictionaries (stored as pickled files) with the format: 
 
 ## 1. Run predictions
 
 Predictions can be computed using the described script:
 
 ```bash
-python predict_proteins.py [-p [INPUT PPI PAIRS]] [OPTIONS] 
+python predict_proteins.py [--ppi_pairs [INPUT PPI PAIRS]] [OPTIONS] 
 ```
-The `INPUT PPI PAIRS` option (passed using `-p / --ppi_pairs`) denotes a CSV file containing the protein pairs to predict. Each line of the file should contain one pair of protein IDs (`<ID 1>,<ID 2>`), and these IDs should match IDs in the metadata files. If no pairs are passed, all valid pairs are returned. Different metadata files may be passed in using the `--domain_metadata` and `--peptide_metadata` options.  
+The `INPUT PPI PAIRS` option (passed using `--ppi_pairs`) denotes a CSV file containing the protein pairs to predict. Each line of the file should contain one pair of protein IDs (`<ID 1>,<ID 2>`), and these IDs should match IDs in the metadata files. If no pairs are passed, all valid pairs are returned. Different metadata files may be passed in using the `--domain_metadata` and `--peptide_metadata` options.
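The pairs file described above is plain CSV with one `<ID 1>,<ID 2>` pair per line, assumed here to have no header row. The following minimal sketch produces such a file with Python's standard `csv` module and can optionally drop pairs whose IDs are not known in advance. The helper names and file names are hypothetical. Treating the keys of an unpickled metadata dictionary as the valid protein IDs is also an assumption, since the metadata format itself is not shown here.

```python
import csv
import pickle
from pathlib import Path


def load_metadata(path):
    """Load a pickled metadata dictionary (e.g. a file under data/metadata/).

    Only unpickle files from a trusted source: pickle can execute arbitrary code.
    """
    with open(path, "rb") as fh:
        return pickle.load(fh)


def write_ppi_pairs(pairs, out_path, known_ids=None):
    """Write (id1, id2) pairs as '<ID 1>,<ID 2>' lines with no header row.

    If known_ids is given, pairs containing an ID outside it are not written;
    they are returned instead so they can be inspected.
    """
    skipped = []
    with Path(out_path).open("w", newline="") as fh:
        # Use "\n" rather than csv's default "\r\n" so each line is exactly "<ID 1>,<ID 2>".
        writer = csv.writer(fh, lineterminator="\n")
        for id1, id2 in pairs:
            if known_ids is not None and (id1 not in known_ids or id2 not in known_ids):
                skipped.append((id1, id2))
                continue
            writer.writerow([id1, id2])
    return skipped
```

The resulting file can then be passed to the script as `--ppi_pairs ppi_pairs.csv`. Returning the skipped pairs, rather than silently dropping them, makes it easier to spot IDs that are missing from the metadata.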