README

Instructions for updating PubMedDB annually and for using the non-relational database.

Scripts

infotojson.py - converts the information in the baseline XML files into a single JSON document
jsontodb.py - reads the JSON document into a database
gettfidf.py - queries the database based on user input to obtain TF-IDFs and writes the results to a file
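
As a rough illustration of the TF-IDF scores that gettfidf.py reports, the sketch below computes them from per-document word counts. It is not the code in gettfidf.py: the function name tf_idf, the in-memory docs dictionary, and the weighting (term frequency as count divided by document length, inverse document frequency as log(N / document frequency)) are assumptions chosen for illustration, and the real script reads its counts from the database.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return {doc_id: {word: score}} for docs given as {doc_id: {word: count}}.

    Hypothetical helper: tf = count / words in the document,
    idf = log(number of documents / documents containing the word).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter()
    for counts in docs.values():
        doc_freq.update(counts.keys())

    scores = {}
    for doc_id, counts in docs.items():
        total = sum(counts.values())
        scores[doc_id] = {
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return scores


# Toy word counts standing in for two abstracts (made-up data).
docs = {
    "1": {"gene": 2, "cancer": 1},
    "2": {"gene": 1, "protein": 3},
}
scores = tf_idf(docs)
```

A word that occurs in every document, such as "gene" here, scores zero under this weighting, while words specific to one abstract score higher.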

Conda Environment

All required packages are listed in the YML environment file (pubmeddb.yml). A conda environment named pubmeddb can be created and activated using the following commands.

conda env create -f ./pubmeddb.yml
conda activate pubmeddb

Baseline

Please use the DATA TRANSFER node of Sockeye to download the PubMed baseline (https://ftp.ncbi.nlm.nih.gov/pubmed/baseline/) and gene2pubmed (https://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2pubmed.gz).

ssh <cwl>@dtn.sockeye.arc.ubc.ca

Run the download script:

bash ./utils/dl_pubmeddata.sh

XML to JSON

Please set the PBS -M line in pubmed_submit.sh to your email address.

##PBS -M <email>

Run the following commands on the COMPUTE node to submit the script as a job from a temporary/scratch directory (the project directory is currently read-only on the compute nodes).

ssh <cwl>@sockeye.arc.ubc.ca

cd <SCRATCH DIR>
qsub /project/st-wasserww-1/PubMed_DB/pubmed_submit.sh
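
To give a feel for the conversion step, here is a minimal sketch that turns one PubMed XML article into a record shaped like the PubMedID collection described under JSON Fields. It is not the code in infotojson.py. The sample XML is made up, the function name article_to_record is hypothetical, the word splitting is deliberately naive, "Stems" is left empty because the stemmer used by the project is not stated here, and keying QualifierName by its UI attribute is an assumption.

```python
import json
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up article using element names from the PubMed XML format.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID Version="1">1</PMID>
      <Article>
        <ArticleTitle>Example title</ArticleTitle>
        <Abstract><AbstractText>Mice and mice models.</AbstractText></Abstract>
      </Article>
      <MedlineJournalInfo><Country>Canada</Country></MedlineJournalInfo>
      <MeshHeadingList>
        <MeshHeading>
          <DescriptorName UI="D000818">Animals</DescriptorName>
        </MeshHeading>
      </MeshHeadingList>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def article_to_record(article):
    """Convert one <PubmedArticle> element into a dictionary (hypothetical helper)."""
    citation = article.find("MedlineCitation")
    title = citation.find("Article/ArticleTitle")
    # An abstract can have several AbstractText parts; join them with spaces.
    text = " ".join(
        "".join(part.itertext())
        for part in citation.iterfind("Article/Abstract/AbstractText")
    )
    # Naive tokenisation: split on whitespace, trim punctuation, lowercase.
    words = Counter(token.strip(".,;:()").lower() for token in text.split())
    words.pop("", None)

    mesh = {}
    for heading in citation.iterfind("MeshHeadingList/MeshHeading"):
        descriptor = heading.find("DescriptorName")
        mesh[descriptor.get("UI")] = {
            "DescriptorName": descriptor.text,
            # Assumption: qualifiers keyed by their UI attribute.
            "QualifierName": {
                q.get("UI"): q.text for q in heading.iterfind("QualifierName")
            },
        }

    return {
        "PMID": citation.findtext("PMID"),
        "ArticleTitle": "".join(title.itertext()) if title is not None else "",
        "Abstract": {
            "Text": text,
            # "Stems" left empty: the stemming step is not reproduced here.
            "Words": {w: {"Stems": [], "Count": c} for w, c in words.items()},
        },
        "Country": citation.findtext("MedlineJournalInfo/Country"),
        "MeshHeading": mesh,
    }


root = ET.fromstring(SAMPLE_XML)
records = [article_to_record(a) for a in root.iterfind("PubmedArticle")]
print(json.dumps(records[0], indent=2))
```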

JSON Fields

PubMedID Collection

{
    "PMID": "XX",
    "ArticleTitle": "xx",
    "Abstract": {
        "Text": "XX",
        "Words": {
            "Word1": {
                "Stems": [xx, xx, xx],
                "Count": 1
            },
            "Word2": {
                "Stems": [xx, xx, xx],
                "Count": 1
            }
        }
    },
    "Country": "XX",
    "MeshHeading": {
        "MeshIdentifier (Ex. D000818)": {
            "DescriptorName": "XX",
            "QualifierName": {}
        }
    }
}

Gene Collection

{
    "GeneID": XX,
    "Name": XX,
    "TaxonomyID": XX,
    "PubMedID": [xx, xx, xx]
}
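
The Gene collection layout can be sketched from gene2pubmed, which is a tab-separated file with the columns #tax_id, GeneID and PubMed_ID. The example below groups its rows into one record per gene. It is an illustration, not the project's code: the function name group_gene2pubmed and the sample rows are hypothetical, and the "Name" field is omitted because gene2pubmed does not contain gene names.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in the gene2pubmed column layout.
SAMPLE = (
    "#tax_id\tGeneID\tPubMed_ID\n"
    "9606\t1\t111\n"
    "9606\t1\t222\n"
    "9606\t2\t333\n"
)


def group_gene2pubmed(handle):
    """Group gene2pubmed rows into one dictionary per GeneID (hypothetical helper)."""
    grouped = defaultdict(lambda: {"TaxonomyID": None, "PubMedID": []})
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and the header line, which starts with '#'.
        if not row or row[0].startswith("#"):
            continue
        tax_id, gene_id, pmid = row[:3]
        entry = grouped[int(gene_id)]
        entry["TaxonomyID"] = int(tax_id)
        entry["PubMedID"].append(int(pmid))
    return [{"GeneID": gene_id, **entry} for gene_id, entry in grouped.items()]


gene_docs = group_gene2pubmed(io.StringIO(SAMPLE))
```

For the downloaded file, the same function could be given a handle from gzip.open("gene2pubmed.gz", "rt") in place of the in-memory sample.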
