This is a place to share the knowledge graph for this project.
The method and code for creating the Knowledge Graph are originally from https://github.com/diatomsRcool/eco-kg.
For this repository we have made minor changes. These will be iteratively updated over time.
- Basic understanding of Python
- Basic understanding of Docker containers
- Familiarity with what a graph database is
- Knowledge of how to construct a property graph data model
- Basics of the Cypher query language
First, clone this GitHub repository to the location from which you intend to serve the data.
To (re)run or regenerate this model, start from a CLI terminal:
git clone https://github.com/genophenoenvo/knowledge-graph
cd knowledge-graph
Next, create an environment for running the graph using conda or mamba:
# install mamba, a faster drop-in replacement for conda
conda install -c conda-forge mamba
# create the environment from the provided environment.yml
mamba env create -f environment.yml
# initialize conda for bash, then close this shell
conda init bash
exit
Open a new terminal and activate the new environment:
conda activate genophenoenvo
# if already created, update the environment
conda env update -f environment.yml
# check Python version -- tested on v3.8.5
python --version
After the environment has been activated and verified:
# change directory
cd knowledge-graph
# create a data directory and pull the merged tsv files
mkdir -p data/merged
wget https://data.cyverse.org/dav-anon/iplant/commons/community_released/genophenoenvo/kg/merged-kg_edges.tsv -O ./data/merged/edges.tsv
wget https://data.cyverse.org/dav-anon/iplant/commons/community_released/genophenoenvo/kg/merged-kg_nodes.tsv -O ./data/merged/nodes.tsv
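Optionally, confirm that both files downloaded and are non-empty, for example:
# both line counts should be well above zero
wc -l data/merged/nodes.tsv data/merged/edges.tsv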
## (re)Generate the Graphs
# download the source data
python run.py download
# transform the downloaded sources
python run.py transform
# merge the transformed graphs into the final knowledge graph
python run.py merge
The KGX TSV file format is described in the KGX project documentation.
| .tsv files |
| --- |
| merged-kg_edges.tsv |
| merged-kg_nodes.tsv |
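For a quick look at which KGX columns are present, you can print the header row of each file; the paths below assume the data/merged/ layout used above:
# list the column names of each merged file
head -1 data/merged/nodes.tsv | tr '\t' '\n'
head -1 data/merged/edges.tsv | tr '\t' '\n'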
The screenshot shows some helpful statistics about the number of nodes and edges added from each resource.
The final merge statistics can be found at https://github.com/genophenoenvo/knowledge-graph/blob/main/merged-kg_stats.yaml
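Because the statistics file is committed to the repository, it can also be skimmed locally from the repository root, for example:
# preview the first part of the merge statistics
head -n 40 merged-kg_stats.yaml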
Download the edges and nodes data from the CyVerse Data Commons WebDAV:
cd ~/knowledge-graph/
wget https://data.cyverse.org/dav-anon/iplant/commons/community_released/genophenoenvo/kg/merged-kg_edges.tsv
wget https://data.cyverse.org/dav-anon/iplant/commons/community_released/genophenoenvo/kg/merged-kg_nodes.tsv
Rename the files to nodes.tsv and edges.tsv.
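For example, assuming the two files were downloaded into the current directory under their original names:
mv merged-kg_nodes.tsv nodes.tsv
mv merged-kg_edges.tsv edges.tsv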
Run Neo4j with Docker:
cd ~/genophenoenvo/kg
docker run -it --rm \
--publish=7474:7474 \
--publish=7687:7687 \
-e NEO4J_dbms_connector_https_advertised__address=":7473" \
-e NEO4J_dbms_connector_http_advertised__address=":7474" \
-e NEO4J_dbms_connector_bolt_advertised__address=":7687" \
--env=NEO4J_AUTH=none \
-v ${PWD}:/data \
neo4j
Open the Neo4j Browser at http://localhost:7474
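Optionally, you can confirm from another terminal that the server is up; the Neo4j HTTP endpoint answers with a small JSON discovery document:
# the Docker container must still be running
curl http://localhost:7474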
Loading the .tsv files using Aura DB Importer
// Create nodes
LOAD CSV WITH HEADERS FROM 'https://data.cyverse.org/dav-anon/iplant/commons/community_released/genophenoenvo/kg/merged-kg_nodes.tsv' AS row FIELDTERMINATOR '\t'
MERGE (g:Gene {id: row.id})
ON CREATE SET g.name = row.name;
// Attach each node to its categories (the KGX category column is pipe-delimited)
LOAD CSV WITH HEADERS FROM 'https://data.cyverse.org/dav-anon/iplant/commons/community_released/genophenoenvo/kg/merged-kg_nodes.tsv' AS row FIELDTERMINATOR '\t'
MERGE (g:Gene {id: row.id})
SET g.name = row.name
WITH g, row
UNWIND split(row.category, '|') AS category
MERGE (c:Category {name: category})
MERGE (g)-[r:has_attribute_type]->(c)
The merged nodes.tsv and edges.tsv files should be loaded into Neo4j for exploration and querying.
For example, a Cypher query can find all of the homologous genes that have also been documented to show differential gene expression in either a drought or a saline environment:
// Orthologous gene pairs that are both linked to the environment term PECO:0007404
MATCH (e {id:'PECO:0007404'})-[r]->(g), (g)-[q:`biolink:orthologous_to`]-(h), (e {id:'PECO:0007404'})-[s]->(h) RETURN *
// Orthologs of gene AT5G15850, the orthologs' phenotypes, and everything connected to AT5G15850
MATCH (g {id:'AT5G15850'})-[r:`biolink:orthologous_to`]->(h), (h)-[q:`biolink:has_phenotype`]->(p), (g)-[t]->(s) RETURN *
// Genes annotated with phenotype TO:0000207 in taxon NCBITaxon:4577
MATCH (g)-[r:`biolink:has_phenotype`]->(p {id:'TO:0000207'}), (g)-[q:`in_taxon`]->(t {id:'NCBITaxon:4577'}) RETURN *