-
Notifications
You must be signed in to change notification settings - Fork 55
Tutorial 3: maxFDR Calculation
In this tutorial, we will walk through an example of calculating the approximate upper bound on the FDR of MTAG results. The theoretical framework and more detailed discussion can be found in Supplement Information Section 1.4 of Turley et al. (2017). Note that MTAG does not automatically filter the traits based on FDR.
We will demonstrate using a subset of the original GWAS results on neuroticisim and subjective well-being from Okbay et al. (2016). These sumstats have been preprocessed and formatted for running mtag
. Please refer to the Sample GWAS Results and Data Format section in in the first tutorial for initial cleaning of the data inputs.
To run the maxFDR calculation, invoke --fdr
when running MTAG in command line. With minimal options:
python ../mtag/mtag.py \
--sumstats ../data/1_OA2016_hm3samp_NEUR.txt,../data/1_OA2016_hm3samp_SWB.txt \
--out ../output/3_maxFDR/3.1_mtag_maxFDR_basic_NS \
--stream_stdout \
--fdr \
--stream_stdout
The following options are available for running maxFDR calculations. It's also helpful to review the comprehensive list of options via python mtag.py -h
and the section "Max FDR calculation":
-
--n_approx
: Depending on the number of traits and the fineness of the grid that are used to make the approximation, the computing time can be substantial. In this case, we recommend using the option--n_approx
to speed up the FDR calculation by replacing the sample size of a SNP by the average sample size across SNPs (for each trait). Updates: We will make this a default option. Turning it off requiresnumpy
version to be greater than 1.13.0. -
--grid_file
: Customized set of grid points to calculate FDR. -
--intervals
: Number of intervals to partition [0,1]. Used to adjust the coarseness of grid. -
--cors
: Number of threads/cores used to parallelize the computation. -
--fit_ss
: Another option to speed up the grid search process - If specified,mtag
will estimate the prior probability that a SNP is null for each trait and then restrict the grid search to the set of probability vectors ("sum") that sum to the prior null for each trait. This is useful for restricting the search space of large-dimensional traits. Notes: This option is still under testing.
Updates: Alternatively, you can specify the path to the MTAG result files and skip the MTAG estimation step. This allows you to only perform maxFDR calculation without going through MTAG again. To do this, you need to repull the MTAG repository (updated since July.xx, 2018), and use the fdr_only.py
as the following:
python ../mtag/fdr_only.py \
--mtag_path ../data/1_OA2016_hm3samp_NEUR.txt,../data/1_OA2016_hm3samp_SWB.txt
The same set of "Max FDR Calculation" options are available in this setting. The output files will have a new prefix [MTAG_PREFIX].FDR
to avoid confusion with the previously created [MTAG_PREFIX].log
file.
In addition to the standard mtag
output as explained in Tutorial 1, specifying --fdr
will make MTAG output two more files:
-
[PREFIX]_fdr_mat.txt
records the maximum false discovery rate for each trait in each state.0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 2.412599223162292111e-03 0.000000000000000000e+00 3.846085678952126680e-03 0.000000000000000000e+00 4.369596250455918054e-03 0.000000000000000000e+00 4.120088494741054597e-03 6.910217361619816305e-04 0.000000000000000000e+00 6.836514978170410255e-04 2.410876601205827474e-03 6.724179301230901167e-04 3.776698888290617052e-03 ... 1.606344506405588525e-04 8.960390758686683941e-04 1.636829652551818101e-04 1.066110509474288925e-04 1.029613181802874888e-04 8.910495939743934735e-05 7.577909678908985259e-05 7.238905719667969871e-04 7.679184008810661379e-05 4.346938850577793643e-05 4.120803680373065888e-05 3.306298969392538757e-05 1.266147392770030122e-05 8.980852269155806025e-06
-
[PREFIX]_prob_grid.txt
records the probability grid or the mixture weights of each state. # rows: number of gridpoints to search, which is a function of the--intervals
specified and the number of traits. # columns: number of states which always equals 2^T.0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 1.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 1.000000000000000056e-01 9.000000000000000222e-01 0.000000000000000000e+00 0.000000000000000000e+00 2.000000000000000111e-01 8.000000000000000444e-01 0.000000000000000000e+00 0.000000000000000000e+00 2.999999999999999889e-01 6.999999999999999556e-01 0.000000000000000000e+00 0.000000000000000000e+00 4.000000000000000222e-01 5.999999999999999778e-01 0.000000000000000000e+00 1.000000000000000056e-01 0.000000000000000000e+00 9.000000000000000222e-01 0.000000000000000000e+00 1.000000000000000056e-01 1.000000000000000056e-01 8.000000000000000444e-01 ... 5.999999999999999778e-01 1.000000000000000056e-01 0.000000000000000000e+00 2.999999999999999889e-01 6.999999999999999556e-01 0.000000000000000000e+00 0.000000000000000000e+00 2.999999999999999889e-01 6.999999999999999556e-01 0.000000000000000000e+00 1.000000000000000056e-01 2.000000000000000111e-01 6.999999999999999556e-01 1.000000000000000056e-01 0.000000000000000000e+00 2.000000000000000111e-01 8.000000000000000444e-01 0.000000000000000000e+00 0.000000000000000000e+00 2.000000000000000111e-01 9.000000000000000222e-01 0.000000000000000000e+00 0.000000000000000000e+00 1.000000000000000056e-01
The summary of maxFDR results are reported at the last section of the MTAG log file. If fdr_only.py
is used, the FDR log file will look like the following:
T=2
Number of gridpoints to search: 65
Performing grid search using 1 cores.
Grid search: 10.0 percent finished for . Time: 0.000 min
Grid search: 20.0 percent finished for . Time: 0.001 min
Grid search: 30.0 percent finished for . Time: 0.001 min
Grid search: 40.0 percent finished for . Time: 0.002 min
Grid search: 50.0 percent finished for . Time: 0.003 min
Grid search: 60.0 percent finished for . Time: 0.003 min
Grid search: 70.0 percent finished for . Time: 0.004 min
Grid search: 80.0 percent finished for . Time: 0.004 min
Grid search: 90.0 percent finished for . Time: 0.005 min
Grid search: 100.0 percent finished for . Time: 0.005 min
Saved calculations of fdr over grid points in ../output/3_maxFDR/3.2_mtag_maxFDR_approx_NS_fdr_mat.txt
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
grid point indices for max FDR for each trait: [12 3]
Maximum FDR
Max FDR of Trait 1: 0.00109655427181 at probs = [ 0. 0.3 0. 0.7]
Max FDR of Trait 2: 0.00436959625046 at probs = [ 0. 0. 0.3 0.7]
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Completed FDR calculations.