juliaapolonio/Causeway is a pipeline for Mendelian Randomization (MR) and sensitivity analysis between phenotype GWAS summary statistics and QTL data.
Previous MR tools have been used to analyze a small number of exposure-outcome combinations. They are not optimized for a large number of combinations, such as those in a genome-wide QTL screen. Causeway was built to enable MR and sensitivity analysis at this scale in a user-friendly and computationally efficient way. The pipeline is built using Nextflow, a workflow tool that runs tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers, which makes installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process, which makes it much easier to maintain and update software dependencies. As a future improvement, the local modules will, where possible, be submitted to and installed from nf-core/modules to make them available to all nf-core pipelines and to everyone within the Nextflow community!
The GSMR step is the core of the pipeline. It runs GSMR for every Exposure against the Outcomes and returns the number of IVs, the beta, the SE and the p-value for each exposure.
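The sketch below illustrates the per-exposure quantities listed above. It computes SNP-level Wald ratios and combines them by inverse-variance weighting. It assumes independent (LD-pruned) instruments and non-overlapping exposure and outcome samples. GSMR itself uses generalized least squares with an LD correlation matrix from the reference panel and applies HEIDI-outlier filtering, and neither is shown here. The function names and dictionary keys are hypothetical.

```python
import math

# Hypothetical, simplified illustration; not GSMR's or the pipeline's code.


def wald_ratio(bzx, se_zx, bzy, se_zy):
    """Ratio estimate of the exposure -> outcome effect from a single SNP."""
    ratio = bzy / bzx
    # Delta-method variance, assuming exposure and outcome samples do not overlap
    var = se_zy**2 / bzx**2 + bzy**2 * se_zx**2 / bzx**4
    return ratio, var


def simplified_gsmr(bzx, se_zx, bzy, se_zy):
    """Inverse-variance weighted combination of per-SNP Wald ratios.

    Assuming uncorrelated SNPs, the SNP-level covariance matrix is diagonal,
    so a generalized least-squares estimate reduces to this weighted average.
    """
    estimates = [wald_ratio(*snp) for snp in zip(bzx, se_zx, bzy, se_zy)]
    weights = [1.0 / var for _, var in estimates]
    bxy = sum(w * r for w, (r, _) in zip(weights, estimates)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    # Two-sided p-value from the standard normal distribution
    p = math.erfc(abs(bxy / se) / math.sqrt(2))
    return {"nsnp": len(estimates), "bxy": bxy, "se": se, "p": p}
```

In a genome-wide screen, a function like this would be called once per gene-level exposure file. That is why the number of tasks grows with the number of exposures.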
Using the GSMR results, this module computes an FDR-adjusted p-value for each gene. It keeps only the genes that pass both the FDR threshold and a minimum number of IVs. This step substantially reduces the number of tasks for the subsequent processes and, therefore, the execution time of the pipeline.
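The following is a minimal sketch of this filter. It assumes the Benjamini-Hochberg procedure, which is the method behind R's p.adjust(method = "BH"). The thresholds of FDR < 0.05 and at least 3 IVs are illustrative only. The pipeline's actual FDR method and cut-offs may differ, and the field names are hypothetical.

```python
# Hypothetical sketch; thresholds and field names are assumptions.


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    # Walk from the largest p-value to the smallest, keeping a running minimum
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, i in enumerate(order):
        rank = n - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def filter_gsmr_results(results, fdr_threshold=0.05, min_nsnp=3):
    """Keep genes passing both the FDR cut-off and the minimum number of IVs."""
    adjusted = bh_adjust([row["p"] for row in results])
    return [
        {**row, "p_fdr": q}
        for row, q in zip(results, adjusted)
        if q < fdr_threshold and row["nsnp"] >= min_nsnp
    ]
```

Both conditions are applied together. A gene with a very small p-value but too few IVs is therefore dropped before the more expensive downstream steps.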
TwoSampleMR is an R package that performs Mendelian Randomization and sensitivity analysis. The workflow is configured to run the following 2SMR tests:
- Inverse Variance Weighted regression;
- Simple Median regression;
- Simple Mode regression;
- MR Egger regression;
- Heterogeneity Egger;
- Heterogeneity Inverse Variance Weighted;
- Steiger direction test;
- Pleiotropy Egger intercept;
- MR-PRESSO outlier analysis.
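The sketch below makes the first few tests concrete. It computes the IVW estimate with Cochran's Q (the IVW heterogeneity statistic), and the MR-Egger slope and intercept, using first-order weights (1/SE² of the SNP-outcome effect). This is not TwoSampleMR's code. The package fits these as weighted linear models in R and may scale standard errors differently. The median, mode, Steiger and MR-PRESSO tests are not shown, and the function names are hypothetical.

```python
import math

# Hypothetical re-implementation for illustration; not TwoSampleMR's code.


def ivw(bx, by, se_y):
    """Fixed-effect IVW estimate, its standard error and Cochran's Q."""
    w = [1.0 / s**2 for s in se_y]
    sxx = sum(wi * x * x for wi, x in zip(w, bx))
    beta = sum(wi * x * y for wi, x, y in zip(w, bx, by)) / sxx
    se = math.sqrt(1.0 / sxx)
    # Heterogeneity: weighted squared residuals around the IVW line
    q = sum(wi * (y - beta * x) ** 2 for wi, x, y in zip(w, bx, by))
    return beta, se, q


def mr_egger(bx, by, se_y):
    """Weighted regression of SNP-outcome on SNP-exposure effects with an intercept."""
    # Orient every SNP so its effect on the exposure is positive
    by = [y if x >= 0 else -y for x, y in zip(bx, by)]
    bx = [abs(x) for x in bx]
    w = [1.0 / s**2 for s in se_y]
    sw = sum(w)
    x_mean = sum(wi * x for wi, x in zip(w, bx)) / sw
    y_mean = sum(wi * y for wi, y in zip(w, by)) / sw
    sxy = sum(wi * (x - x_mean) * (y - y_mean) for wi, x, y in zip(w, bx, by))
    sxx = sum(wi * (x - x_mean) ** 2 for wi, x in zip(w, bx))
    slope = sxy / sxx
    # A non-zero intercept suggests directional (unbalanced) pleiotropy
    intercept = y_mean - slope * x_mean
    return slope, intercept
```

IVW forces the regression line through the origin, while MR-Egger estimates an intercept. This is why the Egger intercept serves as the pleiotropy test listed above.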
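Coloc is an R package for colocalization analysis. For this workflow, the following information is retrieved from Coloc:
- H3 (both traits are associated, but with distinct causal variants);
- H4 (both traits are associated and share a single causal variant);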
- Most probable causal variant.
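The sketch below shows where H3, H4 and the most probable causal variant come from. It re-implements the single-causal-variant approximate Bayes factor calculation behind coloc's coloc.abf, using per-SNP betas and standard errors. The priors p1 = p2 = 1e-4 and p12 = 1e-5 are coloc's defaults. The prior effect-size SDs are assumptions here: 0.15 for a standardized quantitative trait such as gene expression, and 0.2 for a case-control log-odds ratio. The function names are hypothetical. The pipeline calls the R package itself, whose exact inputs and settings may differ.

```python
import math

# Hypothetical re-implementation for illustration; not coloc's code.


def log_abf(beta, se, prior_sd):
    """Wakefield's approximate Bayes factor for one SNP, on the log scale."""
    r = prior_sd**2 / (se**2 + prior_sd**2)
    z = beta / se
    return 0.5 * math.log(1 - r) + 0.5 * r * z * z


def logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def logdiff(a, b):
    """log(exp(a) - exp(b)); returns -inf when the difference is not positive."""
    if b >= a:
        return -math.inf
    return a + math.log(-math.expm1(b - a))


def coloc_abf(snps, beta1, se1, beta2, se2, prior_sd1=0.15, prior_sd2=0.2,
              p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H3/H4, assuming one causal variant per trait."""
    l1 = [log_abf(b, s, prior_sd1) for b, s in zip(beta1, se1)]
    l2 = [log_abf(b, s, prior_sd2) for b, s in zip(beta2, se2)]
    l12 = [a + b for a, b in zip(l1, l2)]
    sum1, sum2, sum12 = logsumexp(l1), logsumexp(l2), logsumexp(l12)
    log_h = [
        0.0,                                                        # H0: no association
        math.log(p1) + sum1,                                        # H1: trait 1 only
        math.log(p2) + sum2,                                        # H2: trait 2 only
        math.log(p1) + math.log(p2) + logdiff(sum1 + sum2, sum12),  # H3: distinct variants
        math.log(p12) + sum12,                                      # H4: one shared variant
    ]
    total = logsumexp(log_h)
    pp = [math.exp(h - total) for h in log_h]
    # SNP with the highest posterior of being the shared causal variant under H4
    top = max(range(len(snps)), key=lambda i: l12[i])
    return {"PP.H3": pp[3], "PP.H4": pp[4], "top_snp": snps[top]}
```

A high PP.H4 supports a shared causal variant between the QTL and the outcome. A high PP.H3 points to distinct variants, which would argue against a direct causal link through that gene.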
This set of processes collects the results of all analyses, merges them into a single .csv file and filters them down to a list of candidate drug targets. It also generates an HTML report with the analysis highlights.
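The merging step can be pictured as an outer join of the per-gene result tables on a shared gene identifier. The sketch below uses only Python's csv module and writes NA where a gene is missing from a table. The column names are hypothetical. The drug-target filtering criteria and the HTML report are not reproduced.

```python
import csv
import io

# Hypothetical sketch of a per-gene merge; not the pipeline's code.


def merge_results(*tables, key="gene"):
    """Outer-join lists of row dicts on a shared key, keeping first-seen order."""
    merged = {}
    for table in tables:
        for row in table:
            merged.setdefault(row[key], {key: row[key]}).update(row)
    return list(merged.values())


def to_csv(rows, key="gene"):
    """Serialize merged rows; genes missing from a table get NA in its columns."""
    fieldnames = list(dict.fromkeys([key] + [col for row in rows for col in row]))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="NA", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

An outer join keeps every gene that passed GSMR, even if a later step such as Coloc produced no result for it.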
- Install Nextflow (>=22.10.1).
- Install any of Docker, Singularity (you can follow this tutorial), Podman, Shifter or Charliecloud for full pipeline reproducibility (you can use Conda both to install Nextflow itself and also to manage software within pipelines. Please only use it within pipelines as a last resort; see docs).
- Download the pipeline and test it on a minimal dataset with a single command:
nextflow run juliaapolonio/Causeway -profile test,YOURPROFILE --outdir <OUTDIR>
This runs the pipeline on 4 genes from the eQTLGen cis-eQTL data. It uses the 1000 Genomes phase 3 (GRCh37) genotype p-file as reference and custom Strict Depression summary statistics retrieved from MTAG.
Note that some form of configuration will be needed so that Nextflow knows how to fetch the required software. This is usually done in the form of a config profile (YOURPROFILE in the example command above).
- The pipeline comes with config profiles called docker, singularity, podman, shifter, charliecloud and conda, which instruct the pipeline to use the named tool for software management. For example, -profile test,docker.
- Please check nf-core/configs to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use -profile <institute> in your command. This will enable either docker or singularity and set the appropriate execution settings for your local compute environment.
- If you are using singularity, please use the nf-core download command to download images first, before running the pipeline. Setting the NXF_SINGULARITY_CACHEDIR or singularity.cacheDir Nextflow options enables you to store and re-use the images from a central location for future pipeline runs.
- If you are using conda, it is highly recommended to use the NXF_CONDA_CACHEDIR or conda.cacheDir settings to store the environments in a central location for future pipeline runs.
- Start running your own analysis!
nextflow run juliaapolonio/Causeway \
--exposure <EXPOSURE_SAMPLESHEET> \
--outdir <OUTDIR> \
--ref <REFERENCE_FOLDER> \
--outcome <OUTCOME_SAMPLESHEET> \
-profile <docker/singularity/podman/shifter/charliecloud/conda/institute>
MR_workflow needs 3 inputs to run:
- A reference folder;
- An Exposure sample sheet;
- An Outcome file.
Both the Exposure and Outcome files should follow the GCTA-COJO format, with one Exposure file per gene. The reference files should be in PLINK bfile format. Neither the Exposure nor the Outcome files should contain multi-allelic SNPs, and the freq column should hold the Minor Allele Frequency (MAF). If the Outcome data contain relatively few SNPs (fewer than 2M), expect a substantial fraction of tasks to fail because the Exposure and Outcome data share too few (or no) IVs. Even when the Outcome data contain many SNPs (more than 10M), around 10% of GSMR tasks are still expected to fail.
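Malformed inputs can cause tasks to fail, so a quick pre-flight check of each file can help. The sketch below assumes the standard GCTA-COJO columns (SNP A1 A2 freq b se p N). It treats a repeated SNP ID as a sign of a multi-allelic or duplicated variant. It also flags freq values above 0.5, because that column should hold the MAF. This is a hypothetical helper, not part of the pipeline.

```python
# Hypothetical pre-flight check; not part of the pipeline.
COJO_COLUMNS = ["SNP", "A1", "A2", "freq", "b", "se", "p", "N"]


def check_cojo(lines, max_freq=0.5):
    """Return a list of problems found in a GCTA-COJO formatted file.

    `lines` can be an open file handle or any iterable of text lines.
    """
    lines = iter(lines)
    header = next(lines, "").split()
    if header != COJO_COLUMNS:
        return [f"unexpected header: {header}"]
    problems = []
    seen = set()
    for lineno, line in enumerate(lines, start=2):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != len(COJO_COLUMNS):
            problems.append(f"line {lineno}: expected {len(COJO_COLUMNS)} columns, got {len(fields)}")
            continue
        snp, freq = fields[0], float(fields[3])
        # Repeated IDs usually mean a multi-allelic or duplicated variant
        if snp in seen:
            problems.append(f"line {lineno}: SNP {snp} appears more than once (multi-allelic?)")
        seen.add(snp)
        # freq must be a minor allele frequency, i.e. at most 0.5
        if not 0.0 <= freq <= max_freq:
            problems.append(f"line {lineno}: freq {freq} is not a minor allele frequency")
    return problems
```

For example, running check_cojo(open(path)) on each Exposure file and on the Outcome file before launching the pipeline surfaces these issues early. Otherwise they would only appear as failed GSMR tasks.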
If the run succeeds, the workflow produces three main output files:
- summary_report.html: an HTML report with all analysis highlights;
- mr_merged_results.csv: all analysis results for each gene that is significant in GSMR;
- significant_genes.txt: a list of all genes that meet the criteria defined in the associated paper.
Other intermediate outputs are stored in a folder with the corresponding process name and are described in the output section.
juliaapolonio/Causeway was authored by Julia Apolonio, with assistance from João Cavalcante and Diego Coelho, under the supervision of Dr. Vasiliki Lagou.
Causal associations between risk factors and common diseases inferred from GWAS summary data.
Zhihong Zhu, Zhili Zheng, Futao Zhang, Yang Wu, Maciej Trzaskowski, Robert Maier, Matthew R. Robinson, John J. McGrath, Peter M. Visscher, Naomi R. Wray & Jian Yang.
Nature Communications 2018 Jan 15. doi: 10.1038/s41467-017-02317-2
The MR-Base platform supports systematic causal inference across the human phenome.
Hemani G, Zheng J, Elsworth B, Wade KH, Baird D, Haberland V, Laurin C, Burgess S, Bowden J, Langdon R, Tan VY, Yarmolinsky J, Shihab HA, Timpson NJ, Evans DM, Relton C, Martin RM, Davey Smith G, Gaunt TR, Haycock PC, The MR-Base Collaboration.
eLife 2018 Jul. doi: 10.7554/eLife.34408
Eliciting priors and relaxing the single causal variant assumption in colocalisation analyses
Chris Wallace
PLOS Genetics 2020 Apr 20. doi: 10.1371/journal.pgen.1008720
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nature Biotechnology 2020 Feb 13. doi: 10.1038/s41587-020-0439-x