Home
Framework for Interpretable Neural Networks for genetics
Install GenNet according to the readme. Tip: if you are using GenNet on a cluster, precompiled modules are often available. Create a virtual environment and load the precompiled modules (for example: module load TensorFlow/2.2.0-fosscuda-2019b-Python-3.7.4) before running pip3 install -r requirements_GenNet.txt.
To test GenNet you can run the example study. To run the classification example:
- Activate your virtual environment and navigate to the GenNet folder.
- Train the network on the example data:
  python GenNet.py train -path ./examples/example_classification/ -ID 1
  The first argument is the path to the example_classification folder. The second argument is the job ID, a unique number for each experiment. If you have already run an experiment successfully and reuse its job ID, GenNet loads the trained network from that experiment and uses it to evaluate performance on the validation and test sets. More information about the required and optional arguments is available with:
  python GenNet.py train --help
  After you run the train command, GenNet first prints information about your GPU, followed by an overview of the network and the progress of the training. Training the example network should take a couple of minutes.
- Use the built-in plot functions to visualize your results. To see your options, use:
  python GenNet.py plot --help
  or see the plot section in Modules. To visualize the example study:
  python GenNet.py plot -ID 1 -type manhattan_relative_importance
  This creates a Manhattan plot of the relative importance (the product of all the weights on the path from the output to the input).
  python GenNet.py plot -ID 1 -type sunburst
  The relative importances are summed over genes, pathways or tissues and displayed in a sunburst plot.
  Or plot the weights of the network per layer:
  python GenNet.py plot -ID 1 -type layer_weight -layer_n 0
  python GenNet.py plot -ID 1 -type layer_weight -layer_n 1
  python GenNet.py plot -ID 1 -type layer_weight -layer_n 2
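The relative importance used by the Manhattan and sunburst plots can be illustrated with a small sketch. It multiplies the weights along each path from input to output and then sums the results per gene. The weights and the two-layer layout below are made up for illustration; this is not GenNet's plotting code, and details such as sign handling or normalization are not covered here.

```python
from collections import defaultdict

# Hypothetical weights for a tiny SNP -> gene -> output network.
# All numbers are invented for illustration only.
snp_to_gene = {
    "SNP0": ("HERC2", 0.5),
    "SNP5": ("BRCA2", 2.0),
    "SNP76": ("EGFR", 1.0),
    "SNP77": ("EGFR", 0.25),
}
gene_to_output = {"HERC2": 4.0, "BRCA2": 0.5, "EGFR": 2.0}


def path_importance(weights):
    """Multiply the weights along one path from input to output."""
    product = 1.0
    for weight in weights:
        product *= weight
    return product


# Relative importance per SNP: product of the weights on its path.
snp_importance = {
    snp: path_importance([weight, gene_to_output[gene]])
    for snp, (gene, weight) in snp_to_gene.items()
}

# Sunburst-style aggregation: sum the SNP importances per gene.
gene_importance = defaultdict(float)
for snp, (gene, _) in snp_to_gene.items():
    gene_importance[gene] += snp_importance[snp]

print(snp_importance)
print(dict(gene_importance))
```

With more layers, each path simply contributes more weights to the product.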
The Manhattan plot shows the relative importance of all input SNPs. All plots for the classification example can be found here: https://github.com/ArnovanHilten/GenNet/tree/master/figures/classification_example
As seen in the overview, the command line takes 3 inputs:
- genotype.h5 - a genotype matrix; each row is an example (subject) and each column is a feature (e.g. a genetic variant).
- subject.csv - a .csv file with the following columns:
  - patient_id: an ID for each patient
  - labels: phenotype (zeros and ones for classification, continuous values for regression)
  - genotype_row: the row of the genotype.h5 file that holds this subject
  - set: the set the patient belongs to (1 = training set, 2 = validation set, 3 = test set, other values = ignored)
- topology - each row is a "path" of the network, from input to output node.
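As a minimal sketch of the subject file described above, the following writes the four columns with Python's csv module and then groups the genotype rows by set. The patient IDs and labels are hypothetical, and the data is written to an in-memory buffer rather than to disk.

```python
import csv
import io

# Hypothetical subjects; the column names follow the list above.
rows = [
    {"patient_id": "P001", "labels": 1, "genotype_row": 0, "set": 1},
    {"patient_id": "P002", "labels": 0, "genotype_row": 1, "set": 1},
    {"patient_id": "P003", "labels": 1, "genotype_row": 2, "set": 2},
    {"patient_id": "P004", "labels": 0, "genotype_row": 3, "set": 3},
    {"patient_id": "P005", "labels": 1, "genotype_row": 4, "set": 0},  # ignored
]

# Swap the buffer for open("subject.csv", "w", newline="") to write a file.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["patient_id", "labels", "genotype_row", "set"]
)
writer.writeheader()
writer.writerows(rows)

# Read the file back and collect the genotype rows of each set.
SET_NAMES = {"1": "train", "2": "validation", "3": "test"}
split = {name: [] for name in SET_NAMES.values()}
buffer.seek(0)
for record in csv.DictReader(buffer):
    name = SET_NAMES.get(record["set"])
    if name is not None:  # any other set value is ignored
        split[name].append(int(record["genotype_row"]))

print(split)
```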
Topology example (from GenNet/processed_data/example_study) :
layer0_node | layer0_name | layer1_node | layer1_name | layer2_node | layer2_name |
---|---|---|---|---|---|
0 | SNP0 | 0 | HERC2 | 0 | Causal_path |
5 | SNP5 | 1 | BRCA2 | 0 | Causal_path |
76 | SNP76 | 6 | EGFR | 1 | Control_path |
NOTE: It is important to name the column headers as shown in the table. In the second row, input 5 is connected to node 1 in layer 1. That node is connected to node 0 in layer 2. This is the last layer given, so this node is also connected to the output. The network will have as many layers as there are columns with the name layer.._node. Creating 10 columns with the names layer0_node, layer1_node, ..., layer9_node will result in 10 layers.
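To make the column convention concrete, here is a small sketch that reads the example table, counts the layer.._node columns, and lists the connections between consecutive layers. It assumes a comma-separated file and is only an illustration of how the table can be read, not GenNet's own parser.

```python
import csv
import io
import re

# The topology example from the table above, assumed to be comma-separated.
topology_csv = """layer0_node,layer0_name,layer1_node,layer1_name,layer2_node,layer2_name
0,SNP0,0,HERC2,0,Causal_path
5,SNP5,1,BRCA2,0,Causal_path
76,SNP76,6,EGFR,1,Control_path
"""

reader = csv.DictReader(io.StringIO(topology_csv))
rows = list(reader)

# One layer per column named layer<N>_node, ordered by N.
node_columns = sorted(
    (col for col in reader.fieldnames if re.fullmatch(r"layer\d+_node", col)),
    key=lambda col: int(re.search(r"\d+", col).group()),
)
n_layers = len(node_columns)

# Unique (source node, target node) pairs between consecutive layers.
connections = []
for src_col, dst_col in zip(node_columns, node_columns[1:]):
    edges = sorted({(int(row[src_col]), int(row[dst_col])) for row in rows})
    connections.append(edges)

print("layers:", n_layers)
for index, edges in enumerate(connections):
    print(f"layer {index} -> layer {index + 1}:", edges)
```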
Tip: Use the example study in the processed_data folder as a template.
usage: GenNet.py [-h] {convert,train,plot,topology} ...
GenNet: Interpretable neural networks for phenotype prediction.
positional arguments:
{convert,train,plot,topology}
GenNet main options
convert Convert genotype data to hdf5
train Trains the network
plot Generate plots from a trained network
topology Create standard topology files
optional arguments:
-h, --help show this help message and exit
The current pipeline works well for small (< 100 GB) datasets. For larger datasets, please contact [email protected]
example: python GenNet.py convert -g /media/charlesdarwin/plink/ -o /media/charlesdarwin/processed_data/ -study_name name_of_plink_files -step all
usage: GenNet.py convert [-h] [-g GENOTYPE [GENOTYPE ...]] -study_name
STUDY_NAME [STUDY_NAME ...] [-variants VARIANTS]
[-o OUT] [-ID] [-vcf] [-tcm TCM]
[-step {all,hase_convert,merge,impute,exclude,transpose,merge_transpose,checksum}]
[-n_jobs N_JOBS]
optional arguments:
-h, --help show this help message and exit
-g GENOTYPE [GENOTYPE ...], --genotype GENOTYPE [GENOTYPE ...]
path/paths to genotype data folder
-study_name STUDY_NAME [STUDY_NAME ...]
Name for saved genotype data, without ext
-variants VARIANTS Path to file with row numbers of variants to include,
if none is given all variants will be used
-o OUT, --out OUT path for saving the results, default ./processed_data
-ID Flag to convert minimac data to genotype per subject
files first (default False)
-vcf Flag for VCF data to convert
-tcm TCM Modifier for chunk size during TRANSPOSING make it
lower if you run out of memory during transposing
-step {all,hase_convert,merge,impute,exclude,transpose,merge_transpose,checksum}
Modifier to choose step to do
-n_jobs N_JOBS Choose jobs > 1 for multiple job submission on a
cluster
Trains the neural network. The first argument is the path to the folder with the three required files. The second argument is the experiment identifier.
Example: python GenNet.py train ./processed_data/example_study/ 1
Usage: GenNet.py train [-h] [-problem_type {classification,regression}] [-wpc weight positive class] [-lr learning rate] [-bs batch size] [-epochs number of epochs] [-L1] path ID
Positional arguments:
path path to the data
ID ID of the experiment
optional arguments:
-h, --help show this help message and exit
-problem_type {classification,regression}
Type of problem, choices are: classification or
regression
-wpc weight positive class
Hyperparameter:weight of the positive class
-lr learning rate, --learning_rate learning rate
Hyperparameter: learning rate of the optimizer
-bs batch size, --batch_size batch size
Hyperparameter: batch size
-epochs number of epochs
Hyperparameter: number of epochs
-L1 Hyperparameter: value for the L1 regularization
penalty, similar to lasso; enforces sparsity
Generate plots from results
For the latest information, run: python GenNet.py plot --help
Example: python GenNet.py plot 1 -type layer_weight -layer_n 0
Example: python GenNet.py plot 1 -type sunburst
Example: python GenNet.py plot 1 -type manhattan_relative_importance
Usage: GenNet.py plot [-h] [-type {layer_weight,sunburst,manhattan_relative_importance}] [-layer_n Layer_number:] ID
positional arguments:
ID ID of the experiment
optional arguments:
-h, --help show this help message and exit
-type {layer_weight,sunburst,manhattan_relative_importance}
-layer_n Layer_number:
Only for layer_weight: number of the layer to be plotted
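The plot examples above can be generated in one go with a short script. The helper below is hypothetical and only builds the command lines; it follows the positional ID form shown in this usage text, while the walkthrough earlier uses -ID, so check python GenNet.py plot --help for the form your version expects.

```python
def build_plot_command(experiment_id, plot_type, layer_n=None):
    """Build the argument list for one GenNet plot call (hypothetical helper)."""
    command = ["python", "GenNet.py", "plot", str(experiment_id), "-type", plot_type]
    if layer_n is not None:
        # -layer_n only applies to the layer_weight plot type.
        command += ["-layer_n", str(layer_n)]
    return command


experiment_id = 1
commands = [
    build_plot_command(experiment_id, "manhattan_relative_importance"),
    build_plot_command(experiment_id, "sunburst"),
]
# The example network has layers 0, 1 and 2.
commands += [
    build_plot_command(experiment_id, "layer_weight", layer_n=n) for n in range(3)
]

for command in commands:
    # To execute instead of print: subprocess.run(command, check=True)
    print(" ".join(command))
```

Printing first makes it easy to inspect the commands before running them from the GenNet folder.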