3.4. Datasets

Mark Edward M. Gonzales edited this page Mar 1, 2024 · 1 revision

This page enumerates the datasets used and integrated by RicePilaf.

Note to Page Maintainers:

  1. The links under the Project and Source columns should point directly to the download URLs (not to the homepage of the project or source).
  2. The links under the Publication column should point to the DOI to ensure persistence.

1️⃣ Gene List and Lift-Over

| Dataset Type | Project | Publication | Save Location |
| --- | --- | --- | --- |
| Genome sequences, annotation, gene descriptions, and orthology maps of rice varieties | Rice Gene Index | Molecular Plant | static/app_data/genomes, static/app_data/annotations, static/app_data/gene_descriptions, static/raw_data/gene_ID_mapping_fromRGI [1] |
| Protein, protein domain, and protein family information from UniProt, InterPro, and Pfam | Obtained via automatic queries using PyRice | Bioinformatics | static/raw_data/iric_data/iric_data_original.pkl |
| QTL from published literature | QTARO (Accessed June 2016) | Rice | static/raw_data/qtaro/Qtaro_Mar2016_convMSU_1849.csv |

| Mapping | Source | Save Location |
| --- | --- | --- |
| InterPro Accession to Name | InterPro | static/raw_data/iric_data/interpro2name.txt |
| Pfam Accession to Name | InterPro [2] | static/raw_data/iric_data/pfam2name.json |

[1] We thank Jianwei Zhang and Zhichao Yu for providing orthology maps from the Rice Gene Index.
[2] Pfam is now hosted by InterPro. Choose the "Export to JSON" option to download the file.

2️⃣ Gene Retrieval by Text Mining

| Dataset Type | Project (URL) | Publication | Save Location |
| --- | --- | --- | --- |
| Related PubMed articles | In-house text-mined dataset | | static/app_data/text_mining/annotated_abstracts.tsv |

3️⃣ Co-Expression Network Analysis

| Dataset Type | Project | Publication | Save Location |
| --- | --- | --- | --- |
| Co-expression network | RiceNet | Nucleic Acids Research | static/app_data/networks/OS-CX.txt |
| Co-expression network | Rice Combined Mutual Ranked Network (RCRN) | Frontiers in Plant Science | static/app_data/networks/RCRN.txt |
| Gene ontology annotations | Rice Annotation Project Database (RAP-DB) | Plant & Cell Physiology | static/raw_data/enrichment_analysis/rap_db/IRGSP-1.0_representative_annotation_2023-03-15.tsv [3] |
| Gene ontology annotations | agriGO v2.0 | Nucleic Acids Research | static/raw_data/enrichment_analysis/go/agrigo.tsv |
| Gene, plant, and trait ontology annotations | Oryzabase | Plant Physiology | static/raw_data/enrichment_analysis/go/OryzabaseGeneListAll_20230322010000.txt |
| Pathway maps | Fetched from the Kyoto Encyclopedia of Genes and Genomes (KEGG) via KEGGREST | Nucleic Acids Research | static/raw_data/enrichment_analysis/kegg_dosa/geneset/kegg-dosa-geneset.tsv |

| Mapping | Source | Save Location |
| --- | --- | --- |
| MSU to RAP-DB accessions | RAP-DB | static/raw_data/enrichment_analysis/rap_db/RAP-MSU_2023-03-15.txt [4] |
| RAP-DB accessions to KEGG transcript IDs | RAP-DB | static/raw_data/enrichment_analysis/rap_db/IRGSP-1.0_representative_annotation_2023-03-15.tsv [3] |

[3] The text file is obtained by decompressing the downloaded archive: gzip -dv IRGSP-1.0_representative_annotation_2023-03-15.tsv.gz
[4] The text file is obtained by decompressing the downloaded archive: gzip -dv RAP-MSU_2023-03-15.txt.gz
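
For maintainers on systems without the gzip command-line tool, the decompression step in footnotes [3] and [4] can be approximated in Python. This is a minimal sketch, not part of RicePilaf: the helper name `decompress_gz` and its `keep_original` parameter are hypothetical, and the script assumes it is run from the repository root with the archives already downloaded into the rap_db save location.

```python
import gzip
import shutil
from pathlib import Path


def decompress_gz(gz_path, keep_original=False):
    """Decompress `name.gz` to `name` in the same directory and return the new path.

    Hypothetical helper (not part of RicePilaf) that mirrors `gzip -d`.
    """
    gz_path = Path(gz_path)
    if gz_path.suffix != ".gz":
        raise ValueError(f"Expected a .gz file, got: {gz_path}")

    # with_suffix("") strips only the trailing ".gz", so "x.tsv.gz" becomes "x.tsv"
    out_path = gz_path.with_suffix("")
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)

    if not keep_original:
        gz_path.unlink()  # like `gzip -d`, remove the archive after extraction
    return out_path


if __name__ == "__main__":
    # Assumption: the current working directory is the repository root.
    rap_db_dir = Path("static/raw_data/enrichment_analysis/rap_db")
    for name in (
        "IRGSP-1.0_representative_annotation_2023-03-15.tsv.gz",
        "RAP-MSU_2023-03-15.txt.gz",
    ):
        archive = rap_db_dir / name
        if archive.exists():
            print(f"{archive} -> {decompress_gz(archive)}")
```

Pass `keep_original=True` to leave the downloaded archive in place next to the extracted file.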

4️⃣ Regulatory Feature Enrichment

| Dataset Type | Project | Publication | Save Location |
| --- | --- | --- | --- |
| Transcription factor binding sites | PlantRegMap | Nucleic Acids Research | static/raw_data/tf_enrichment |

5️⃣ Epigenomic Information

| Dataset Type | Project | Publication | Save Location |
| --- | --- | --- | --- |
| Open chromatin | RiceENCODE | Molecular Plant | static/app_data/open_chromatin |
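
After downloading the datasets, it can help to confirm that each one landed in the save location listed on this page. The sketch below is one way to do that and is not part of RicePilaf: the function name `find_missing` and the `EXPECTED_PATHS` list are hypothetical, the list covers only a subset of the save locations above, and the paths are assumed to be relative to the repository root.

```python
from pathlib import Path

# A subset of the save locations listed on this page (extend as needed).
# Assumption: these paths are relative to the repository root.
EXPECTED_PATHS = [
    "static/app_data/genomes",
    "static/app_data/annotations",
    "static/app_data/gene_descriptions",
    "static/raw_data/iric_data/interpro2name.txt",
    "static/raw_data/iric_data/pfam2name.json",
    "static/app_data/text_mining/annotated_abstracts.tsv",
    "static/app_data/networks/OS-CX.txt",
    "static/app_data/networks/RCRN.txt",
    "static/raw_data/tf_enrichment",
    "static/app_data/open_chromatin",
]


def find_missing(paths, root="."):
    """Return the entries of `paths` that do not exist under `root`."""
    root = Path(root)
    return [p for p in paths if not (root / p).exists()]


if __name__ == "__main__":
    missing = find_missing(EXPECTED_PATHS)
    if missing:
        for path in missing:
            print(f"missing: {path}")
    else:
        print("All listed save locations are present.")
```

The check only tests that a file or directory exists at each path; it does not validate the contents or the dataset version.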