3.5. Data Preparation
This page enumerates the recipes for preparing the data (i.e., populating the `static` folder).
- Storage: ~30.2 GB (~6.2 GB for the Docker image and ~24 GB for the accompanying dataset)
- Memory: ≥ 8 GB
- Operating System: Linux, macOS, or Windows
- Start by downloading the latest version of the dataset needed for the data preparation workflow from here.
  This dataset is different from the one on the Installation page: it includes all the data (i.e., even the raw dataset files), whereas the one on the Installation page includes only the files necessary for the app to run.
  💡 If you want to verify the integrity of the downloaded dataset, compute the SHA-512 checksum of the `tgz` archive file using a hashing utility like `certutil` in Windows, `shasum` in macOS, or `sha512sum` in Linux. You should obtain the following checksum:
  `4242a9eb61338a48a6a8176c5d8add08f8febcecd1e31ce5a0ea4aff562bed83961009621d96205984ec0fe9fb2a699af812986e337ea62877eb5dfff6591d79`
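On Linux, the whole check can be scripted; the sketch below wraps `sha512sum` in a small helper (the helper name and the archive filename are ours, not part of the workflow; the checksum is the published one):

```shell
# Verify a file against an expected SHA-512 checksum.
# Usage: verify_sha512 <file> <expected-checksum>
verify_sha512() {
    actual=$(sha512sum "$1" | awk '{print $1}')
    if [ "$actual" = "$2" ]; then
        echo "Checksum OK"
    else
        echo "Checksum MISMATCH -- re-download the archive" >&2
        return 1
    fi
}

# The archive filename below is illustrative -- use the name of the file you downloaded.
archive=rice-pilaf-dataset.tgz
if [ -f "$archive" ]; then
    verify_sha512 "$archive" 4242a9eb61338a48a6a8176c5d8add08f8febcecd1e31ce5a0ea4aff562bed83961009621d96205984ec0fe9fb2a699af812986e337ea62877eb5dfff6591d79
fi
```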
- Extract the contents of the downloaded archive.
  The extraction process should result in a folder named `static`. Inside it should be two folders named `app_data` and `raw_data`.
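Since the dataset ships as a gzipped tarball, extraction is a single `tar` call; a sketch (the archive filename is illustrative):

```shell
# Extract the gzipped tarball: -x extract, -z gunzip, -f archive file.
# The archive filename is illustrative -- use the name of the file you downloaded.
archive=rice-pilaf-dataset.tgz
if [ -f "$archive" ]; then
    tar -xzf "$archive"
    # Sanity-check the expected layout: static/ with two subfolders.
    ls static/app_data static/raw_data
fi
```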
- Download and install Docker, a platform for building and running containerized apps:
- [For macOS and Windows] Install Docker Desktop.
- [For Linux] For easier installation, we recommend installing Docker Engine instead of Docker Desktop. Instructions for different Linux distributions can be found here.
- Start the Docker daemon:
- [For macOS and Windows] Open Docker Desktop to start the daemon.
- [For Linux] Follow the instructions here.
- Launch a terminal (from anywhere), and pull the latest Docker image for the workflow by running:
docker pull ghcr.io/bioinfodlsu/rice-pilaf/workflow:latest
- Spin up a container from the image by running:
docker create --name rice-pilaf-workflow -v path/to/static/in/local:/app/static ghcr.io/bioinfodlsu/rice-pilaf/workflow:latest
  Replace `path/to/static/in/local` with the path to the `static` folder generated following the steps in the previous section. It may be more convenient to use the absolute path. If you are using Windows, replace the backward slashes (`\`) in the path with forward slashes (`/`).
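The slash conversion can be done mechanically; a small sketch, assuming a made-up Windows path:

```shell
# Convert a Windows-style path to the forward-slash form expected by -v
# (the path below is made up for illustration).
win_path='C:\Users\alice\rice-pilaf\static'
docker_path=$(printf '%s' "$win_path" | tr '\\' '/')
echo "$docker_path"   # C:/Users/alice/rice-pilaf/static
# The bind-mount argument would then be:
echo "-v $docker_path:/app/static"
```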
- Launch a terminal (from anywhere), and start the RicePilaf workflow container by running:
docker start rice-pilaf-workflow
- Open a shell that will execute commands in the container by running:
docker exec -it rice-pilaf-workflow bash
  Doing so should open a shell whose working directory is `/app/prepare_data/workflow/scripts`; the prompt should look like `root@<container_id>:/app/prepare_data/workflow/scripts`.

  ⚠️ IMPORTANT: All the commands in the data preparation recipes listed on this page should be run in this shell (i.e., they should be executed inside the container).
- Once you are done using the RicePilaf workflow container, stop it by running:
docker stop rice-pilaf-workflow
- If you want to use the RicePilaf workflow container again, follow Steps 5 and 6.
- You can use the `-h` or `--help` flag to display more information about a data processing script (e.g., its arguments, output files, and their descriptions), like so:

  For Python scripts (replace `<FILENAME>` with the filename of the script):
python3 <FILENAME> --help
  For R scripts (replace `<FILENAME>` with the filename of the script):
Rscript --vanilla <FILENAME> --help
- Several output files are pickled files. You can use the Visual Studio Code extension `vscode-pydata-viewer` to display their contents without needing to write a Python script.
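If you prefer to stay in the terminal, a Python one-liner can also pretty-print a pickled file (the file path below is a placeholder):

```shell
# Pretty-print the contents of a pickled file from the command line.
# pickle_file is a placeholder -- point it at the output file you want to inspect.
pickle_file=path/to/some-output.pickle
if [ -f "$pickle_file" ]; then
    python3 -c 'import pickle, pprint, sys; pprint.pprint(pickle.load(open(sys.argv[1], "rb")))' "$pickle_file"
fi
```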
Click here to show/hide the recipes
python3 ogi_mapping/generate-ogi-dicts.py ../../../static/raw_data/gene_ID_mapping_fromRGI ../../../static/app_data/gene_id_mapping/ogi_mapping
python3 ogi_mapping/generate-nb-dicts.py ../../../static/app_data/gene_id_mapping/ogi_mapping ../../../static/app_data/gene_id_mapping/nb_mapping
python3 gene_id_mapping/msu-to-rapdb-id.py ../../../static/raw_data/enrichment_analysis/rap_db/RAP-MSU_2023-03-15.txt ../../../static/app_data/gene_id_mapping/msu_mapping
Click here to show/hide the recipes
python3 ogi_mapping/generate-nb-to-iric-dicts.py ../../../static/raw_data/gene_ID_mapping_fromRGI ../../../static/app_data/gene_id_mapping/iric_mapping
python3 iric_description/map-gene-to-interpro.py ../../../static/raw_data/iric_data/iric_data_original.pkl ../../../static/raw_data/iric_data/interpro2name.txt ../../../static/app_data/iric_data
python3 iric_description/map-gene-to-pfam.py ../../../static/raw_data/iric_data/iric_data_original.pkl ../../../static/raw_data/iric_data/pfam2name.json ../../../static/app_data/iric_data
python3 qtaro/prepare-qtaro.py ../../../static/raw_data/qtaro/Qtaro_Mar2016_convMSU_1849.csv ../../../static/app_data/qtaro
python3 gene_description/prepare_desc_uniprot_dict.py ../../../static/app_data/gene_descriptions/Nb/Nb_gene_descriptions.csv ../../../static/app_data/gene_descriptions/Nb
Click here to show/hide the recipes
python3 text_mining/get-pubmed-per-gene.py ../../../static/raw_data/text_mining/gene_index_table.csv ../../../static/app_data/text_mining/annotated_abstracts.tsv ../../../static/raw_data/text_mining/match_filtering/symbol_replacement.tsv ../../../static/raw_data/text_mining/match_filtering/symbol_exclusion.tsv ../../../static/raw_data/text_mining/pubmed_per_gene
python3 text_mining/consolidate-pubmed-dictionaries.py ../../../static/raw_data/text_mining/pubmed_per_gene ../../../static/app_data/text_mining
Note that `text_mining/get-pubmed-per-gene.py` may take several days to run. Hence, we provide the option to start and end the script's execution at user-specified genes (`<START_GENE>` and `<END_GENE>`, respectively), as in the recipe below:
python3 text_mining/get-pubmed-per-gene.py ../../../static/raw_data/text_mining/gene_index_table.csv ../../../static/app_data/text_mining/annotated_abstracts.tsv ../../../static/raw_data/text_mining/match_filtering/symbol_replacement.tsv ../../../static/raw_data/text_mining/match_filtering/symbol_exclusion.tsv ../../../static/raw_data/text_mining/pubmed_per_gene --continue_from <START_GENE> --end_at <END_GENE>
python3 text_mining/consolidate-pubmed-dictionaries.py ../../../static/raw_data/text_mining/pubmed_per_gene ../../../static/app_data/text_mining
python3 text_mining/generate-symbol-to-msu.py ../../../static/raw_data/text_mining/gene_index_table.csv ../../../static/app_data/gene_id_mapping/msu_mapping
Note that:
- `<NETWORK>` can either be `OS-CX` (RiceNet v2) or `RCRN` (Rice Combined Mutual Ranked Network).
- `<ALGO>` can be `fox`, `demon`, `coach`, or `clusterone`.
- `<PARAM>` is the name of the directory containing the module list after running the algorithm with the specified parameter (i.e., after running the module detection recipes here).
  - For example, if `<ALGO>` is `clusterone` and the parameter (minimum density) is 0.3, then `<PARAM>` is `30`.
- `<MODULE_NUM>` refers to the number of the module on which the enrichment analysis will be performed.
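To make the substitution concrete, here is how one set of illustrative values resolves into an actual module-list path (values chosen to match the `clusterone` example above):

```shell
# Illustrative placeholder values:
NETWORK=OS-CX        # RiceNet v2
ALGO=clusterone
PARAM=30             # minimum density 0.3, written without the decimal point
MODULE_NUM=1         # first module

# A recipe path such as
#   ../../../static/app_data/network_modules/<NETWORK>/MSU/<ALGO>/<PARAM>/<ALGO>-module-list.tsv
# then expands to:
echo "../../../static/app_data/network_modules/$NETWORK/MSU/$ALGO/$PARAM/$ALGO-module-list.tsv"
```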
Click here to show/hide the recipes
This recipe converts the co-expression network to the respective formats required to run the module detection algorithms and generates the required mapping dictionaries to convert across the different network representation formats:
python3 network_util/convert-to-int-edge-list.py ../../../static/app_data/networks/<NETWORK>.txt ../../../static/raw_data/network_modules/<NETWORK>/mapping
python3 module_util/generate-mapping-from-networkx-int-edge-graph.py ../../../static/raw_data/network_modules/<NETWORK>/mapping/int-edge-list.txt ../../../static/raw_data/network_modules/<NETWORK>/mapping/int-edge-list-node-mapping.pickle ../../../static/raw_data/network_modules/<NETWORK>/mapping
mkdir -p ../../../static/raw_data/network_modules/<NETWORK>/temp/fox
mkdir -p ../../../static/raw_data/network_modules/<NETWORK>/temp/clusterone
Publication: Nature Methods
Dependency: ClusterONE (Java)
java -jar module_detection/cluster_one-1.0.jar --output-format csv --min-density <MIN_DENSITY> ../../../static/app_data/networks/<NETWORK>.txt > ../../../static/raw_data/network_modules/<NETWORK>/temp/clusterone/clusterone-results-<MIN_DENSITY * 100>.csv
python3 module_util/get-modules-from-clusterone-results.py ../../../static/raw_data/network_modules/<NETWORK>/temp/clusterone/clusterone-results-<MIN_DENSITY * 100>.csv ../../../static/app_data/network_modules/<NETWORK>/MSU/clusterone/<MIN_DENSITY * 100>
Replace `<MIN_DENSITY>` with the minimum density:
- If `<MIN_DENSITY>` is 0.3, then `<MIN_DENSITY * 100>` is 30. This is just a convention in the app to avoid having decimal points in the directory and file names.
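The same convention (multiply by 100 or 1000 and drop the decimal point) applies to the other algorithms' parameters below; the conversion can be computed like so:

```shell
# Derive the directory name from a parameter value, e.g. 0.3 -> 30.
MIN_DENSITY=0.3
DIR_NAME=$(awk -v d="$MIN_DENSITY" 'BEGIN { printf "%.0f", d * 100 }')
echo "$DIR_NAME"   # 30
```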
Publication: BMC Bioinformatics
Dependency: CDlib (Python)
python3 module_detection/detect-modules-via-coach.py --affinity_threshold <AFFINITY_THRESHOLD> ../../../static/raw_data/network_modules/<NETWORK>/mapping/int-edge-list.txt ../../../static/raw_data/network_modules/<NETWORK>/temp/coach
python3 module_util/restore-node-labels-in-modules.py ../../../static/raw_data/network_modules/<NETWORK>/temp/coach/coach-int-module-list-<AFFINITY_THRESHOLD * 1000>.csv ../../../static/raw_data/network_modules/<NETWORK>/mapping/networkx-node-mapping.pickle ../../../static/app_data/network_modules/<NETWORK>/MSU/coach/<AFFINITY_THRESHOLD * 1000> coach
Replace `<AFFINITY_THRESHOLD>` with the affinity threshold:
- If `<AFFINITY_THRESHOLD>` is 0.125, then `<AFFINITY_THRESHOLD * 1000>` is 125. This is just a convention in the app to avoid having decimal points in the directory and file names.
Publication: KDD '12: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Dependency: CDlib (Python)
python3 module_detection/detect-modules-via-demon.py --epsilon <EPSILON> ../../../static/raw_data/network_modules/<NETWORK>/mapping/int-edge-list.txt ../../../static/raw_data/network_modules/<NETWORK>/temp/demon
python3 module_util/restore-node-labels-in-modules.py ../../../static/raw_data/network_modules/<NETWORK>/temp/demon/demon-int-module-list-<EPSILON * 100>.csv ../../../static/raw_data/network_modules/<NETWORK>/mapping/networkx-node-mapping.pickle ../../../static/app_data/network_modules/<NETWORK>/MSU/demon/<EPSILON * 100> demon
Replace `<EPSILON>` with the merging threshold (epsilon):
- If `<EPSILON>` is 0.25, then `<EPSILON * 100>` is 25. This is just a convention in the app to avoid having decimal points in the directory and file names.
Publication: ACM Transactions on Social Computing (FOX), PeerJ Computer Science (LazyFox, parallelized implementation of FOX)
Dependency: LazyFox (C++)
module_detection/LazyFox --input-graph ../../../static/raw_data/network_modules/<NETWORK>/mapping/int-edge-list.txt --output-dir temp --queue-size 20 --thread-count 20 --disable-dumping --wcc-threshold <WCC_THRESHOLD>
mv temp/CPP*/iterations/*.txt ../../../static/raw_data/network_modules/<NETWORK>/temp/fox/fox-int-module-list-<WCC_THRESHOLD * 100>.txt
rm -r temp
python3 module_util/restore-node-labels-in-modules.py ../../../static/raw_data/network_modules/<NETWORK>/temp/fox/fox-int-module-list-<WCC_THRESHOLD * 100>.txt ../../../static/raw_data/network_modules/<NETWORK>/mapping/int-edge-list-node-mapping.pickle ../../../static/app_data/network_modules/<NETWORK>/MSU/fox/<WCC_THRESHOLD * 100> fox
Replace `<WCC_THRESHOLD>` with the weighted community clustering (WCC) threshold:
- If `<WCC_THRESHOLD>` is 0.01, then `<WCC_THRESHOLD * 100>` is 1. This is just a convention in the app to avoid having decimal points in the directory and file names.
Click here to show/hide the recipes
Dependency: `riceidconverter` (R)
This recipe extracts the nodes (genes) from the co-expression network:
python3 network_util/get-nodes-from-network.py ../../../static/app_data/networks/<NETWORK>.txt ../../../static/raw_data/enrichment_analysis/all_genes/<NETWORK>/MSU
This recipe maps the MSU accessions used in the app to the target IDs required by the pathway enrichment analysis tools:
Rscript --vanilla enrichment_analysis/util/ricegeneid-msu-to-transcript-id.r -g ../../../static/raw_data/enrichment_analysis/all_genes/<NETWORK>/MSU/all-genes.txt -o ../../../static/raw_data/enrichment_analysis/temp/<NETWORK>
python3 enrichment_analysis/util/msu-to-transcript-id.py ../../../static/raw_data/enrichment_analysis/temp/<NETWORK>/all-transcript-id.txt ../../../static/raw_data/enrichment_analysis/temp/<NETWORK>/all-na-transcript-id.txt ../../../static/raw_data/enrichment_analysis/rap_db/RAP-MSU_2023-03-15.txt ../../../static/raw_data/enrichment_analysis/rap_db/IRGSP-1.0_representative_annotation_2023-03-15.tsv ../../../static/raw_data/enrichment_analysis/mapping/<NETWORK>
python3 enrichment_analysis/util/transcript-to-msu-id.py ../../../static/raw_data/enrichment_analysis/mapping/<NETWORK>/msu-to-transcript-id.pickle ../../../static/app_data/gene_id_mapping/msu_mapping/<NETWORK>
python3 enrichment_analysis/util/file-convert-msu.py ../../../static/raw_data/enrichment_analysis/all_genes/<NETWORK>/MSU/all-genes.txt ../../../static/raw_data/enrichment_analysis/mapping/<NETWORK>/msu-to-transcript-id.pickle ../../../static/raw_data/enrichment_analysis/all_genes/<NETWORK> transcript --skip_no_matches
python3 enrichment_analysis/util/file-convert-msu.py ../../../static/app_data/network_modules/<NETWORK>/MSU/<ALGO>/<PARAM>/<ALGO>-module-list.tsv ../../../static/raw_data/enrichment_analysis/mapping/<NETWORK>/msu-to-transcript-id.pickle ../../../static/app_data/enrichment_analysis/<NETWORK>/modules/<ALGO>/<PARAM> transcript
This recipe prepares the data needed for ontology enrichment analysis:
python3 enrichment_analysis/util/aggregate-go-annotations.py ../../../static/raw_data/enrichment_analysis/go/agrigo.tsv ../../../static/raw_data/enrichment_analysis/go/OryzabaseGeneListAll_20230322010000.txt ../../../static/raw_data/enrichment_analysis/rap_db/IRGSP-1.0_representative_annotation_2023-03-15.tsv ../../../static/raw_data/enrichment_analysis/all_genes/<NETWORK>/transcript/all-genes.tsv ../../../static/raw_data/enrichment_analysis/mapping/<NETWORK>/msu-to-transcript-id.pickle ../../../static/raw_data/enrichment_analysis/go/<NETWORK>
python3 enrichment_analysis/util/aggregate-to-annotations.py ../../../static/raw_data/enrichment_analysis/go/OryzabaseGeneListAll_20230322010000.txt ../../../static/raw_data/enrichment_analysis/to/<NETWORK>
python3 enrichment_analysis/util/aggregate-po-annotations.py ../../../static/raw_data/enrichment_analysis/go/OryzabaseGeneListAll_20230322010000.txt ../../../static/raw_data/enrichment_analysis/po/<NETWORK>
Dependencies: `GO.db` (R), `clusterProfiler` (R)
Rscript --vanilla enrichment_analysis/ontology_enrichment/go-enrichment.r -g ../../../static/app_data/network_modules/<NETWORK>/MSU/<ALGO>/<PARAM>/<ALGO>-module-list.tsv -i <MODULE_NUM> -b ../../../static/raw_data/enrichment_analysis/all_genes/<NETWORK>/MSU/all-genes.txt -m ../../../static/raw_data/enrichment_analysis/go/<NETWORK>/go-annotations.tsv -o ../../../static/app_data/enrichment_analysis/<NETWORK>/output/<ALGO>/<PARAM>/ontology_enrichment/go
Dependency: `clusterProfiler` (R)
Rscript --vanilla enrichment_analysis/ontology_enrichment/to-enrichment.r -g ../../../static/app_data/network_modules/<NETWORK>/MSU/<ALGO>/<PARAM>/<ALGO>-module-list.tsv -i <MODULE_NUM> -b ../../../static/raw_data/enrichment_analysis/all_genes/<NETWORK>/MSU/all-genes.txt -m ../../../static/raw_data/enrichment_analysis/to/<NETWORK>/to-annotations.tsv -t ../../../static/raw_data/enrichment_analysis/to/<NETWORK>/to-id-to-name.tsv -o ../../../static/app_data/enrichment_analysis/<NETWORK>/output/<ALGO>/<PARAM>/ontology_enrichment/to
Dependency: `clusterProfiler` (R)
Rscript --vanilla enrichment_analysis/ontology_enrichment/po-enrichment.r -g ../../../static/app_data/network_modules/<NETWORK>/MSU/<ALGO>/<PARAM>/<ALGO>-module-list.tsv -i <MODULE_NUM> -b ../../../static/raw_data/enrichment_analysis/all_genes/<NETWORK>/MSU/all-genes.txt -m ../../../static/raw_data/enrichment_analysis/po/<NETWORK>/po-annotations.tsv -t ../../../static/raw_data/enrichment_analysis/po/<NETWORK>/po-id-to-name.tsv -o ../../../static/app_data/enrichment_analysis/<NETWORK>/output/<ALGO>/<PARAM>/ontology_enrichment/po
Dependency: `clusterProfiler` (R)
Rscript --vanilla enrichment_analysis/pathway_enrichment/ora-enrichment.r -g ../../../static/app_data/network_modules/<NETWORK>/transcript/<ALGO>/<PARAM>/transcript/<ALGO>-module-list.tsv -i <MODULE_NUM> -b ../../../static/raw_data/enrichment_analysis/all_genes/<NETWORK>/transcript/all-genes.tsv -o ../../../static/app_data/enrichment_analysis/<NETWORK>/output/<ALGO>/<PARAM>/pathway_enrichment/ora
Paper: Genome Research
Dependency: `ROntoTools` (R)
Rscript --vanilla enrichment_analysis/pathway_enrichment/pe-enrichment.r -g ../../../static/app_data/enrichment_analysis/modules/<ALGO>/<PARAM>/transcript/<ALGO>-module-list.tsv -i <MODULE_NUM> -b ../../../static/raw_data/enrichment_analysis/all_genes/transcript/all-genes.tsv -o ../../../static/app_data/enrichment_analysis/output/<ALGO>/<PARAM>/pathway_enrichment/pe
This recipe generates additional files needed for the user-facing display of the results on the app (e.g., the list of genes in the `dosa` pathways and the names of the pathways):
Rscript enrichment_analysis/util/get-genes-in-pathway.r -o ../../../static/raw_data/enrichment_analysis/kegg_dosa/geneset
python3 enrichment_analysis/util/get-genes-in-pathway-dict.py ../../../static/raw_data/enrichment_analysis/kegg_dosa/geneset/kegg-dosa-geneset.tsv ../../../static/app_data/enrichment_analysis/mapping
wget -O ../../../static/app_data/enrichment_analysis/mapping/kegg-dosa-pathway-names.tsv https://rest.kegg.jp/list/pathway/dosa
Paper: Bioinformatics
Dependency: `SPIA` (R)
The recipe below uses the `dosaSPIA.RData` file generated by SPIA from the KGML (KEGG pathway data) files for `dosa`, i.e., Oryza sativa japonica (Japanese rice; gene model taken from RAP-DB). The KGML files were downloaded on May 11, 2023.
Rscript --vanilla enrichment_analysis/pathway_enrichment/spia-enrichment.r -g ../../../static/app_data/network_modules/<NETWORK>/transcript/<ALGO>/<PARAM>/transcript/<ALGO>-module-list.tsv -i <MODULE_NUM> -b ../../../static/raw_data/enrichment_analysis/all_genes/<NETWORK>/transcript/all-genes.tsv -s ../../../static/raw_data/enrichment_analysis/kegg_dosa/SPIA -o ../../../static/app_data/enrichment_analysis/<NETWORK>/output/<ALGO>/<PARAM>/pathway_enrichment/spia
If you would like to generate `dosaSPIA.RData` yourself, the recipe is given below. Note, however, that you have to supply the KGML files for `dosa` (save them in `../../../static/raw_data/enrichment_analysis/kegg_dosa/XML`). We do not distribute them in compliance with KEGG's licensing restrictions.
Rscript --vanilla enrichment_analysis/pathway_enrichment/spia-enrichment.r -g ../../../static/app_data/network_modules/<NETWORK>/transcript/<ALGO>/<PARAM>/transcript/<ALGO>-module-list.tsv -i <MODULE_NUM> -b ../../../static/raw_data/enrichment_analysis/all_genes/<NETWORK>/transcript/all-genes.tsv -p ../../../static/raw_data/enrichment_analysis/kegg_dosa/XML -s ../../../static/raw_data/enrichment_analysis/kegg_dosa/SPIA -o ../../../static/app_data/enrichment_analysis/<NETWORK>/output/<ALGO>/<PARAM>/pathway_enrichment/spia
Click here to show/hide the recipes
python3 network_util/map-genes-to-modules.py ../../../static/app_data/network_modules/<NETWORK>/MSU/<ALGO>/<PARAM>/<ALGO>-module-list.tsv ../../../static/app_data/network_modules/<NETWORK>/MSU_to_modules/<ALGO>/<PARAM>
python3 enrichment_analysis/util/map-genes-to-ontology.py ../../../static/app_data/enrichment_analysis/genes_to_ontology_pathway go ../../../static/raw_data/enrichment_analysis/go/OS-CX/go-annotations.tsv ../../../static/raw_data/enrichment_analysis/go/RCRN/go-annotations.tsv
python3 enrichment_analysis/util/map-genes-to-ontology.py ../../../static/app_data/enrichment_analysis/genes_to_ontology_pathway to ../../../static/raw_data/enrichment_analysis/to/OS-CX/to-annotations.tsv ../../../static/raw_data/enrichment_analysis/to/RCRN/to-annotations.tsv
python3 enrichment_analysis/util/map-genes-to-ontology.py ../../../static/app_data/enrichment_analysis/genes_to_ontology_pathway po ../../../static/raw_data/enrichment_analysis/po/OS-CX/po-annotations.tsv ../../../static/raw_data/enrichment_analysis/po/RCRN/po-annotations.tsv
Note that the last argument of `enrichment_analysis/util/map-genes-to-ontology.py` is variadic, i.e., you can add as many annotation files as needed.
python3 enrichment_analysis/util/map-genes-to-pathway.py ../../../static/app_data/enrichment_analysis/mapping/kegg-dosa-geneset.pickle ../../../static/app_data/enrichment_analysis/genes_to_ontology_pathway ../../../static/app_data/gene_id_mapping/msu_mapping/OS-CX/transcript-to-msu-id.pickle ../../../static/app_data/gene_id_mapping/msu_mapping/RCRN/transcript-to-msu-id.pickle
Note that the last argument of `enrichment_analysis/util/map-genes-to-pathway.py` is variadic, i.e., you can add as many transcript ID-to-MSU accession pickled dictionaries as needed.
Click here to show/hide the recipes
python3 tfbs/get_fam.py ../../../static/raw_data/tf_enrichment/tf_list/Osj_TF_list.txt ../../../static/app_data/tf_enrichment/annotation
Click here to show/hide the steps
- Download Docker, and start the Docker daemon (as in Steps 1 and 2 here).
- Clone the RicePilaf repository by running:
git clone https://github.com/bioinfodlsu/rice-pilaf
- Launch a terminal from the root of the cloned repository, and build the Docker image for the workflow by running:
docker build -t rice-pilaf-workflow -f Dockerfile-workflow .
- Spin up a container from the Docker image by running:
docker create --name rice-pilaf-workflow -v path/to/static/in/local:/app/static rice-pilaf-workflow
  Replace `path/to/static/in/local` with the path to the `static` folder on your local machine. It may be more convenient to use the absolute path. If you are using Windows, replace the backward slashes (`\`) in the path with forward slashes (`/`).
- Use the RicePilaf workflow container as in Steps 5 to 8 here.
Click here to show/hide the steps (best of luck!)
- Install the following first:
- Python (recommended version: 3.10.13)
- R (recommended version: 4.3.2)
- Java Development Kit (recommended version: 11)
- Git
- Clone the RicePilaf repository by running:
git clone https://github.com/bioinfodlsu/rice-pilaf
- Transfer your `static` folder to the root of the cloned repository. It should be at the same level as `callbacks`, `pages`, etc.
- Launch a terminal from the root of the cloned repository, and install the required Python libraries by running:
python3 -m pip install -r dependencies/requirements-workflow.txt
- Install the required R packages by running:
bash dependencies/r-packages-workflow.sh
- Download ClusterONE (a module detection tool) from here, and save the JAR file to `prepare_data/workflow/scripts/module_detection`.
- Launch a terminal from `prepare_data/workflow/scripts/module_detection`, and run the following commands to compile LazyFox (another module detection tool):
git clone https://github.com/TimGarrels/LazyFox
mv LazyFox lazyfoxdir
cd lazyfoxdir
git reset --hard d08f3c084df19bd2a1726159f181bbe3ad6f5bf4
mkdir build
cd build
cmake ..
make
mv LazyFox ../../LazyFox
cd ../../
rm -r lazyfoxdir
chmod +x LazyFox
- Launch a terminal from `prepare_data/workflow/scripts`, and run the recipes on this terminal:
  - Note that, if you are using Windows' native terminal (i.e., not WSL), you may have to change `python3` to `python` or `py` (depending on your Python installation).
Click here to show/hide the steps
- Download the version of the dataset that matches the release version of the code: for example, if the release version is `0.1.x`, then the dataset version should be `0.1`.
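Under this convention, the dataset version is the release version with the patch component dropped, which can be computed in the shell (the version string below is illustrative):

```shell
# Map a code release version to its dataset version by dropping the
# patch component (the version string is illustrative).
release_version=0.1.2
dataset_version=${release_version%.*}   # strip the trailing ".<patch>"
echo "$dataset_version"   # 0.1
```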
- Refer to this spreadsheet for the link to the dataset and its SHA-512 checksum. Note that this link is different from the one on the Installation page.
- Extract the contents of the downloaded dataset. Doing so should result in a folder named `static`. Inside it should be two folders named `app_data` and `raw_data`.
- Download Docker, and start the Docker daemon (as in Steps 1 and 2 here).
- Launch a terminal (from anywhere), and pull the Docker image for the workflow by running:
docker pull ghcr.io/bioinfodlsu/rice-pilaf/workflow:v<RELEASE_VERSION>
  Replace `<RELEASE_VERSION>` with the release version of the code. A complete list of all the release versions can be found here.
- Spin up a container from the image by running:
docker create --name rice-pilaf-workflow -v path/to/static/in/local:/app/static ghcr.io/bioinfodlsu/rice-pilaf/workflow:v<RELEASE_VERSION>
  Replace `path/to/static/in/local` with the path to the `static` folder. It may be more convenient to use the absolute path. If you are using Windows, replace the backward slashes (`\`) in the path with forward slashes (`/`).

  Replace `<RELEASE_VERSION>` with the release version of the code.

  Note: If you are intending to run version ≤ 0.1.1, run the command with `-p 8050:80`.
- Use the RicePilaf workflow container as in Steps 5 to 8 here.