LocusCompare2 is an interactive visualization tool for genetic association analysis of GWAS and eQTL datasets.
LocusCompare2 integrates 6 popular colocalization tools:
- SNP-level colocalization: coloc, fastEnloc, finemap
- Mendelian randomization: SMR
- Transcriptomic association: PrediXcan and TWAS
It can run all the colocalization tools above, display a summary report, and produce a Manhattan plot and a LocusCompare plot for the significant SNPs and genes.
We provide a shell script to set up the running environment for LocusCompare2.
- Install miniconda
- Clone locuscompare-v2 from GitHub
- Execute the environment setup script in the /running_env folder
cd running_env
./setup_env.sh env.yml
- The console shows 'All finished' when the setup is done. Activate the LocusCompare2 virtual environment by executing:
conda activate colotools
Build LocusCompare2 config file (Sample)
LocusCompare2 needs a config file that specifies the input data files, the output path, the GWAS and eQTL field name mappings, and so on.
Specification of config file example:
# Required. The output root dir
working_dir: '/Volumes/HD/biodata/colotools-tools'
tools:
- coloc
- smr
- ecaviar
- fusion
- predixcan
- fastenloc
input:
gwas:
# Required. Input GWAS file path, positions should be based on hg38
file: '/raw/Eczema/EAGLE_AD_GWAS_results_2015_hg38.tsv.gz'
# Required. Trait name
trait: 'eczemas'
# Required. GWAS sample size
sample_size: 116863
# Required. Study type, cc or quant
type: cc
# Optional. Separator of the GWAS input file, escape characters must be in double quotes ("\t" for tab), default sep is tab
sep: ' '
# Tell LocusCompare2 the field name in input GWAS file.
col_name_mapping:
# Required. The rs id field name in input GWAS file.
# Can be other values (like variant_id), as long as they match the values of the ID column in the vcf file and the snp column in the eQTL file,
# otherwise clumping and TWAS won't work
snp: 'rsID'
# Required. The chromosome field name in input GWAS file.
chrom: 'chr'
# Required. The snp position field name in input GWAS file.
position: 'hm_pos'
# Required. The beta field name in input GWAS file.
beta: 'beta'
# Required. The effect allele field name in input GWAS file.
effect_allele: 'reference_allele'
# Required. The other allele field name in input GWAS file.
other_allele: 'other_allele'
# Required. The p-value field name in input GWAS file.
pvalue: 'p.value'
# Required. The se field name in input GWAS file.
se: 'se'
eqtl:
# Required. Input eQTL file path, positions should be based on hg38
file: '/raw/eqtl/Spleen.tsv.gz'
# Required. Tissue name
tissue: 'Spleen'
# Required. eQTL sample size
sample_size: 147
# Optional. Separator of the eQTL input file, escape characters must be in double quotes ("\t" for tab), default sep is tab
sep: ' '
# Tell LocusCompare2 the field name in the input eQTL file.
col_name_mapping:
# Required. The rs id field name in input eQTL file.
# Can be other values (like variant_id), as long as they match the snp column in the GWAS file.
snp: 'rsid'
# Required. The chromosome field name in input eQTL file.
chrom: 'chromosome'
# Required. The position field name in input eQTL file.
position: 'position'
# Required. The beta field name in input eQTL file.
beta: 'beta'
# Required. The alternate allele field name in input eQTL file.
alt: 'alt'
# Required. The reference allele field name in input eQTL file.
ref: 'ref'
# Required. The p-value field name in input eQTL file.
pvalue: 'pvalue'
# Required. The se field name in input eQTL file.
se: 'se'
# Required. The gene id field name in input eQTL file.
gene_id: 'molecular_trait_id'
# Required. The minor allele frequency field name in input eQTL file.
maf: 'maf'
# Required. The vcf files from 1000genomes.
# hg38 https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20220422_3202_phased_SNV_INDEL_SV/
vcf: '/PATH/vcf/hg38'
# Required, must match the gene version in eQTL files
# If you want to use GTEx eQTL, download https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_26/gencode.v26.basic.annotation.gtf.gz for GTEx V8
# Then set the file's local path here.
# If you want to use your own eQTL, set the eQTL reference gencode version path here.
genecode: '/PATH/gencode.v26.basic.annotation.gtf.gz'
# Required if you want to run PrediXcan.
# How to derive data:
# * PrediXcan provides the prediction model and covariance file by their prediction strategies on GTEx v8 release data.
# You could download and unzip the data from https://zenodo.org/record/3518299/files/mashr_eqtl.tar
# * If you have your own eQTL data or other version of GTEx, please refer to predictdb-pipeline module to build the prediction
# model and covariance.
# Then set the prediction model and covariance based directory here.
# LocusCompare2 will find the model and covariance in this directory by the configured eQTL tissue name. Note that the db file name must end with .db and the covariance must end with .txt.gz
# The model and covariance file name format should be 'mashr_{tissue_name}.db' and 'mashr_{tissue_name}.txt.gz'.
# For example, if the eQTL name is 'Cells_EBV-transformed_lymphocytes', LocusCompare2 will find 'mashr_Cells_EBV-transformed_lymphocytes.db'
# and 'mashr_Cells_EBV-transformed_lymphocytes.txt.gz' in this directory.
prediction_dir: '/PATH/prediction_model_covariance'
# Required if you want to run TWAS
# TWAS weight files of GTEx v8: http://gusevlab.org/projects/fusion/#gtex-v8-multi-tissue-expression, download and unpack the files.
# If you want to compute your own weights, refer to predictive-model-pipeline module
# LocusCompare2 will find the pos file in this directory by the configured eQTL tissue name. Note that the pos file name must end with .pos
twas_model_dir: '/PATH/twas_model'
p-value_threshold:
# GWAS and eQTL significance P-value threshold. Coefficient type should be float.
# GWAS P-value threshold must be <= 1.0E-7 and eQTL P-value threshold must be <= 1.0E-5; values outside this range will be discarded
# For example, use 5.0E-8 rather than 5E-8
gwas: 5.0E-8
eqtl: 1.0E-5
# Your research population, EUR, EAS, SAS, AMR or AFR
population: 'EUR'
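The constraints stated in the config comments can be checked before a long run starts. The following is a minimal sketch, not part of LocusCompare2: the function names `check_thresholds` and `predixcan_files` are hypothetical, and the values are passed in directly rather than read from the YAML file. One assumption worth stating: if the config is read with a YAML 1.1 parser such as PyYAML (the source does not say which parser is used), `5E-8` is loaded as a string rather than a float, which may be why the comment asks for `5.0E-8`.

```python
from pathlib import PurePosixPath

# Upper bounds and population codes as stated in the config comments.
MAX_GWAS_P = 1.0e-7
MAX_EQTL_P = 1.0e-5
POPULATIONS = {"EUR", "EAS", "SAS", "AMR", "AFR"}


def check_thresholds(gwas_p, eqtl_p, population):
    """Return a list of problems; an empty list means the values look valid.

    Hypothetical helper for illustration only.
    """
    problems = []
    if not isinstance(gwas_p, float) or gwas_p > MAX_GWAS_P:
        problems.append(f"gwas threshold {gwas_p!r} must be a float <= {MAX_GWAS_P}")
    if not isinstance(eqtl_p, float) or eqtl_p > MAX_EQTL_P:
        problems.append(f"eqtl threshold {eqtl_p!r} must be a float <= {MAX_EQTL_P}")
    if population not in POPULATIONS:
        problems.append(f"population {population!r} must be one of {sorted(POPULATIONS)}")
    return problems


def predixcan_files(prediction_dir, tissue):
    """Build the model and covariance file paths expected for a tissue name,
    following the 'mashr_{tissue_name}.db' / 'mashr_{tissue_name}.txt.gz' rule."""
    base = PurePosixPath(prediction_dir)
    return base / f"mashr_{tissue}.db", base / f"mashr_{tissue}.txt.gz"


if __name__ == "__main__":
    print(check_thresholds(5.0e-8, 1.0e-5, "EUR"))
    print(predixcan_files("/PATH/prediction_model_covariance",
                          "Cells_EBV-transformed_lymphocytes"))
```

Passing the string `"5E-8"` instead of a float is reported as a problem, which mirrors the "Coefficient type should be float" requirement.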
If you need to analyze multiple GWAS and eQTL datasets, you need to provide a config file for each GWAS-eQTL pair.
The config generator is a tool to generate config files.
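To illustrate what "one config per GWAS-eQTL pair" means, here is a minimal sketch of the pairing idea. It is not the implementation of `config_generator.py`: the function `pair_configs`, the output file naming, and the shape of the `shared` settings are all assumptions made for illustration. Only the `input`, `gwas`, `eqtl`, `trait` and `tissue` keys come from the sample config.

```python
from itertools import product


def pair_configs(gwas_entries, eqtl_entries, shared):
    """Combine every GWAS entry with every eQTL entry into one config dict each.

    Hypothetical helper; returns a list of (file_name, config) tuples.
    """
    configs = []
    for gwas, eqtl in product(gwas_entries, eqtl_entries):
        config = dict(shared)  # copy the settings common to all pairs
        config["input"] = {"gwas": dict(gwas), "eqtl": dict(eqtl)}
        # File naming is an assumption, chosen only to keep pairs distinguishable.
        name = f"{gwas['trait']}_{eqtl['tissue']}.yml"
        configs.append((name, config))
    return configs


if __name__ == "__main__":
    gwas_list = [{"trait": "eczemas"}, {"trait": "asthma"}]
    eqtl_list = [{"tissue": "Spleen"}, {"tissue": "Lung"}]
    for name, _ in pair_configs(gwas_list, eqtl_list, {"population": "EUR"}):
        print(name)
```

Two GWAS entries and two eQTL entries yield four configs, so the number of runs grows with the product of the two lists.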
- Use Config Generator:
- Create a GWAS config list YAML file (Sample)
- Create an eQTL config list YAML file (Sample)
- Create a global config YAML file (Sample)
- Run the Python script
python colotools_project_path/common/config_generator.py --out output_dir
[--gb_temp global_config_path] [--gw_temp gwas_configs_path] [--eqtl_temp eqtl_configs_path]
- Config files for each GWAS-eQTL pair will be created in the output directory.
- We incorporate INTACT to output an ensemble score based on the results of the different tools. It is highly recommended to run all the tools; otherwise the INTACT score and report may not be generated.
- Run LocusCompare2 in command line
python path_to_colotools/colotools.py --config config_yml_file_or_dir [--tools_config parameter_config_file_for_each_tool] [--disable_parallel] [--log path_to_logfile] [--no_report]
Parameter | Description |
---|---|
config | Required. The config file path or the directory that contains config yml file. |
tools_config | Optional. Parameters for each tool, example is in /resource/tools_config.yml |
disable_parallel | Optional. Disable parallel mode (enabled by default), parallel run requires more resources (CPU, memory and disk IO) but saves a lot of time. |
no_report | Optional. Generate offline site |
log | Optional. The path to log file |
- Run LocusCompare2 in python project
import colotools
colotools.run(config_yml_file)
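When launching the command-line entry point from another script, the flags in the table above can be assembled programmatically. This is a minimal sketch under stated assumptions: `build_command` is a hypothetical helper, not part of LocusCompare2, and it only uses the flags documented above (`--config`, `--tools_config`, `--disable_parallel`, `--log`, `--no_report`).

```python
import sys


def build_command(script, config, tools_config=None,
                  disable_parallel=False, log=None, no_report=False):
    """Assemble the argument list for colotools.py from the documented flags.

    Hypothetical helper for illustration only.
    """
    cmd = [sys.executable, script, "--config", config]
    if tools_config:
        cmd += ["--tools_config", tools_config]
    if disable_parallel:
        cmd.append("--disable_parallel")
    if log:
        cmd += ["--log", log]
    if no_report:
        cmd.append("--no_report")
    return cmd


if __name__ == "__main__":
    cmd = build_command("path_to_colotools/colotools.py", "config.yml",
                        log="run.log", disable_parallel=True)
    print(" ".join(cmd))
    # To actually launch the run, pass the list to subprocess:
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.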
After LocusCompare2 finishes, a summary report and plots are generated in the working directory.
TODO Specification of the report and plot
working_dir, trait, tissue, population are specified in config.yml.
- Report
  - Path: [working_dir]/processed/[study (value of --config if it's a directory else default)]/figures/index.html
  - Open index.html via Chrome.
- Report data
  - Path: [working_dir]/processed/[study (value of --config if it's a directory else default)]/[trait]/[tissue]/[population]/[tool_name]/analyzed/
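The two path templates above can be expanded with a small helper when post-processing results. This is a minimal sketch, not part of LocusCompare2: the function names are hypothetical, and `study` is taken as a plain parameter that defaults to the literal `default`, since the exact value used when `--config` is a directory is not spelled out here.

```python
from pathlib import PurePosixPath


def report_index(working_dir, study="default"):
    """Path of the HTML report: [working_dir]/processed/[study]/figures/index.html"""
    return PurePosixPath(working_dir) / "processed" / study / "figures" / "index.html"


def report_data_dir(working_dir, trait, tissue, population, tool_name, study="default"):
    """Directory holding one tool's analyzed output for a trait/tissue/population."""
    return (PurePosixPath(working_dir) / "processed" / study
            / trait / tissue / population / tool_name / "analyzed")


if __name__ == "__main__":
    wd = "/Volumes/HD/biodata/colotools-tools"
    print(report_index(wd))
    print(report_data_dir(wd, "eczemas", "Spleen", "EUR", "coloc"))
```

The example values are taken from the sample config (`working_dir`, `trait`, `tissue`, `population`) and the `tools` list.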
This project is released under the Apache 2.0 license.