3.4. Datasets
Mark Edward M. Gonzales edited this page Mar 1, 2024 · 1 revision
This page enumerates the datasets used and integrated by RicePilaf.
Note to Page Maintainers:
- The links under the project and source columns should point to the actual download links (not the homepage of the project or source).
- The links under the publication column should point to the DOI to ensure persistence.
Dataset Type | Project | Publication | Save Location |
---|---|---|---|
Genome sequences, annotation, gene descriptions, and orthology maps of rice varieties | Rice Gene Index | Molecular Plant | static/app_data/genomes, static/app_data/annotations, static/app_data/gene_descriptions, static/raw_data/gene_ID_mapping_fromRGI [1] |
Protein, protein domain, and protein family information from UniProt, InterPro, and Pfam | Obtained via automatic queries using PyRice | Bioinformatics | static/raw_data/iric_data/iric_data_original.pkl |
QTL from published literature | QTARO (Accessed June 2016) | Rice | static/raw_data/qtaro/Qtaro_Mar2016_convMSU_1849.csv |
Mapping | Source | Save Location |
---|---|---|
InterPro Accession to Name | InterPro | static/raw_data/iric_data/interpro2name.txt |
Pfam Accession to Name | InterPro [2] | static/raw_data/iric_data/pfam2name.json |
[1] We thank Jianwei Zhang and Zhichao Yu for providing orthology maps from the Rice Gene Index.
[2] Pfam is now hosted by InterPro. Choose the "Export to JSON" option to download the file.
Dataset Type | Project (URL) | Publication | Save Location |
---|---|---|---|
Related PubMed articles | In-house text-mined dataset | – | static/app_data/text_mining/annotated_abstracts.tsv |
Dataset Type | Project | Publication | Save Location |
---|---|---|---|
Co-expression network | RiceNet | Nucleic Acids Research | static/app_data/networks/OS-CX.txt |
Co-expression network | Rice Combined Mutual Ranked Network (RCRN) | Frontiers in Plant Science | static/app_data/networks/RCRN.txt |
Gene ontology annotations | Rice Annotation Project Database (RAP-DB) | Plant & Cell Physiology | static/raw_data/enrichment_analysis/rap_db/IRGSP-1.0_representative_annotation_2023-03-15.tsv [3] |
Gene ontology annotations | agriGO v2.0 | Nucleic Acids Research | static/raw_data/enrichment_analysis/go/agrigo.tsv |
Gene, plant, and trait ontology annotations | Oryzabase | Plant Physiology | static/raw_data/enrichment_analysis/go/OryzabaseGeneListAll_20230322010000.txt |
Pathway maps | Fetched from the Kyoto Encyclopedia of Genes and Genomes (KEGG) via KEGGREST | Nucleic Acids Research | static/raw_data/enrichment_analysis/kegg_dosa/geneset/kegg-dosa-geneset.tsv |
Mapping | Source | Save Location |
---|---|---|
MSU to RAP-DB accessions | RAP-DB | static/raw_data/enrichment_analysis/rap_db/RAP-MSU_2023-03-15.txt [4] |
RAP-DB accessions to KEGG transcript IDs | RAP-DB | static/raw_data/enrichment_analysis/rap_db/IRGSP-1.0_representative_annotation_2023-03-15.tsv [3] |
[3] The text file is obtained by running `gzip -dv IRGSP-1.0_representative_annotation_2023-03-15.tsv.gz` on the downloaded archive.
[4] The text file is obtained by running `gzip -dv RAP-MSU_2023-03-15.txt.gz` on the downloaded archive.
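
Where the `gzip` command-line tool is unavailable, the decompression step in footnotes [3] and [4] could be reproduced with Python's standard library. This is a minimal sketch, not part of RicePilaf itself: the helper name `gunzip` and its `keep_original` parameter are hypothetical, and unlike `gzip -d` it keeps the archive by default.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src, keep_original=True):
    """Decompress a .gz file next to itself and return the output path.

    Hypothetical helper for illustration; not a RicePilaf function.
    """
    src = Path(src)
    if src.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {src.name}")

    # Drop only the trailing ".gz", e.g. RAP-MSU_2023-03-15.txt.gz -> RAP-MSU_2023-03-15.txt
    dest = src.with_suffix("")

    # Stream the decompressed bytes so large annotation files are not loaded into memory
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)

    if not keep_original:
        src.unlink()  # mirrors gzip -d, which removes the archive after decompressing

    return dest
```

For example, `gunzip("static/raw_data/enrichment_analysis/rap_db/RAP-MSU_2023-03-15.txt.gz")` would write `RAP-MSU_2023-03-15.txt` into the same directory, assuming the archive was downloaded there.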
Dataset Type | Project | Publication | Save Location |
---|---|---|---|
Transcription factor binding sites | PlantRegMap | Nucleic Acids Research | static/raw_data/tf_enrichment |
Dataset Type | Project | Publication | Save Location |
---|---|---|---|
Open chromatin | RiceENCODE | Molecular Plant | static/app_data/open_chromatin |
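
After downloading the datasets, it can help to confirm that every save location listed on this page is present. The sketch below copies the paths from the tables above; the helper name `missing_paths` is hypothetical, and it assumes the paths are relative to the root of the RicePilaf working copy.

```python
from pathlib import Path

# Save locations copied from the tables on this page
SAVE_LOCATIONS = [
    "static/app_data/genomes",
    "static/app_data/annotations",
    "static/app_data/gene_descriptions",
    "static/raw_data/gene_ID_mapping_fromRGI",
    "static/raw_data/iric_data/iric_data_original.pkl",
    "static/raw_data/qtaro/Qtaro_Mar2016_convMSU_1849.csv",
    "static/raw_data/iric_data/interpro2name.txt",
    "static/raw_data/iric_data/pfam2name.json",
    "static/app_data/text_mining/annotated_abstracts.tsv",
    "static/app_data/networks/OS-CX.txt",
    "static/app_data/networks/RCRN.txt",
    "static/raw_data/enrichment_analysis/rap_db/IRGSP-1.0_representative_annotation_2023-03-15.tsv",
    "static/raw_data/enrichment_analysis/go/agrigo.tsv",
    "static/raw_data/enrichment_analysis/go/OryzabaseGeneListAll_20230322010000.txt",
    "static/raw_data/enrichment_analysis/kegg_dosa/geneset/kegg-dosa-geneset.tsv",
    "static/raw_data/enrichment_analysis/rap_db/RAP-MSU_2023-03-15.txt",
    "static/raw_data/tf_enrichment",
    "static/app_data/open_chromatin",
]


def missing_paths(root, relative_paths=SAVE_LOCATIONS):
    """Return the listed paths (files or directories) that do not exist under root.

    Hypothetical helper for illustration; assumes paths are relative to the repository root.
    """
    root = Path(root)
    return [rel for rel in relative_paths if not (root / rel).exists()]
```

Running `print(missing_paths("."))` from the repository root would list any save locations that still need to be populated; an empty list means all of them were found.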