Temporal and Non-temporal Dynamics Embedded Model (TANDEM)
TANDEM introduces a new modeling architecture that uses the temporal dynamics of a patient's clinical trajectory for disease prediction. This is achieved by embedding patients' EHR data on a biomedical knowledge graph called SPOKE (Nelson et al. 2019, Nelson et al. 2022). The embedding yields a knowledge graph representation of each patient called SPOKEsig (short for SPOKE signature), which can be used in downstream machine learning (ML) pipelines.
In this project, we introduce a concept called temporal SPOKEsig: we create patient embeddings at multiple time points along each patient's timeline, thereby capturing the temporal dynamics of the disease. The following figure shows the schema of the TANDEM architecture.
Note: This work has been accepted for publication (and for oral presentation) in the proceedings of PSB 2023.
This repo contains the implementation of the TANDEM architecture for disease prediction. Here, we consider the prediction of Parkinson's Disease (PD).
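The idea of building one patient embedding per time point can be illustrated with a toy sketch. This is not the SPOKEsig computation: as a loudly labeled stand-in, each "embedding" below is just a count vector of the concepts seen up to a given time point. The function name `snapshot_embeddings`, the concept codes, and the time points are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def snapshot_embeddings(events, time_points, vocabulary):
    """Build one stand-in embedding per time point.

    events      -- list of (time, concept) pairs from a patient timeline
    time_points -- times at which a snapshot is taken
    vocabulary  -- fixed concept order defining the vector dimensions

    Each vector counts the concepts observed up to and including the
    snapshot time. The real SPOKEsig is derived from the SPOKE knowledge
    graph; a count vector is used here only as a placeholder.
    """
    vectors = []
    for t in time_points:
        counts = Counter(concept for when, concept in events if when <= t)
        vectors.append([counts[c] for c in vocabulary])
    return vectors


# Hypothetical timeline: (time, concept code)
events = [(1, "A"), (3, "B"), (4, "A"), (7, "C")]
vocabulary = ["A", "B", "C"]

temporal = snapshot_embeddings(events, [2, 5, 8], vocabulary)
non_temporal = snapshot_embeddings(events, [8], vocabulary)[0]

print(temporal)      # [[1, 0, 0], [2, 1, 0], [2, 1, 1]]
print(non_temporal)  # [2, 1, 1]
```

In this toy setting, the non-temporal representation is a single vector over the whole timeline, while the temporal one keeps the sequence of snapshots, so the order in which concepts appear is preserved.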
- Download the "data" folder (~24 GB) from the Box folder.
- Unzip the downloaded zip file.
- Copy the unzipped "data" folder to the root of this repo.
The "data" folder has the following contents:
- train data - temporal and non-temporal knowledge graph representations of patients, used for training the models
- train metadata - row indices and labels of the patients in the train data
- test data - temporal and non-temporal knowledge graph representations of patients, used for evaluating the models
- test metadata - row indices and labels of the patients in the test data
- pre-trained models - temporal, non-temporal, and TANDEM models, each trained on its respective train data
Note: As per the protocol, we cannot share patients' EHR data, even in de-identified form. Hence, we are sharing their graph representations (obtained from their EHR data) in a Box folder, which can be used in an ML pipeline.
Note (Optional): We have implemented a separate REST API service that creates graph representations of patients from their EHR data in real time. Check out this repo for that.
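After copying, it can be useful to confirm that the "data" folder sits at the repo root and that the download was not truncated. The following is a minimal sketch: the helper name `summarize_data_folder` is hypothetical, and because the exact file names inside the folder are not listed here, it only reports what it finds rather than checking for specific files.

```python
from pathlib import Path


def summarize_data_folder(repo_root="."):
    """Report the size and top-level entries of <repo_root>/data.

    Raises FileNotFoundError if the folder is missing, which usually
    means it was not copied to the root of the repo.
    """
    data_dir = Path(repo_root) / "data"
    if not data_dir.is_dir():
        raise FileNotFoundError(f"expected a 'data' folder at {data_dir}")

    # Walk the whole tree and add up the sizes of regular files.
    files = [p for p in data_dir.rglob("*") if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)

    return {
        "n_files": len(files),
        "total_bytes": total_bytes,
        "total_gb": total_bytes / 1e9,
        "top_level": sorted(p.name for p in data_dir.iterdir()),
    }
```

Calling `summarize_data_folder(".")` from the repo root should report a total size in the neighborhood of the ~24 GB mentioned above. A much smaller number suggests an incomplete download or unzip.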
- Create a virtual environment:
  virtualenv -p $(which python3) venv
- Activate the virtual environment:
  source venv/bin/activate
- Install all the required modules:
  pip install -r requirements.txt
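To confirm that the environment is active and the install succeeded, a quick check can be run from Python. This is a sketch under assumptions: the helper names `in_virtualenv` and `missing_requirements` are hypothetical, and the requirement parsing is deliberately simple (it reads the leading package name on each line and skips comments and option lines), so it may not handle every valid requirements.txt syntax.

```python
import re
import sys
from importlib import metadata


def in_virtualenv():
    """Return True if the interpreter is running inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases make sys.prefix differ from sys.base_prefix.
    return hasattr(sys, "real_prefix") or sys.prefix != getattr(
        sys, "base_prefix", sys.prefix
    )


def missing_requirements(lines):
    """Return names from requirements-style lines that are not installed."""
    missing = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match is None:
            continue
        name = match.group(0)
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

For example, `missing_requirements(open("requirements.txt"))` run from the repo root returns an empty list when every listed package is installed in the active environment.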
- Start a Jupyter notebook instance on your machine.
  Note: Running the code requires more than 24 GB of RAM and 8 CPU cores.
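Before launching the notebook, the machine can be checked against these requirements. The sketch below assumes the requirement means strictly more than 24 GB of RAM and at least 8 CPU cores. The helper names are hypothetical, and the RAM lookup relies on `os.sysconf`, which is available on Unix-like systems only, so on other platforms the RAM is reported as unknown.

```python
import os


def total_ram_gb():
    """Return physical RAM in GiB, or None if it cannot be determined."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        n_pages = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing (e.g. on Windows) or the names are unsupported.
        return None
    return page_size * n_pages / 1024**3


def check_resources(ram_gb, cpu_cores, min_ram_gb=24, min_cores=8):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if ram_gb is None:
        problems.append("could not determine the amount of RAM")
    elif ram_gb <= min_ram_gb:
        problems.append(f"RAM is {ram_gb:.1f} GB, need more than {min_ram_gb} GB")
    if cpu_cores is None:
        problems.append("could not determine the number of CPU cores")
    elif cpu_cores < min_cores:
        problems.append(f"found {cpu_cores} CPU cores, need at least {min_cores}")
    return problems


if __name__ == "__main__":
    for problem in check_resources(total_ram_gb(), os.cpu_count()):
        print("WARNING:", problem)
```

Running the script prints one warning per unmet requirement and nothing when the machine meets both.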
- Run the notebook TANDEM.ipynb.