This repository contains the notebooks and code for the 2023 Data Science Challenge (DSC) at Lawrence Livermore National Laboratory (LLNL).
- Link : https://data-science.llnl.gov/dsc
- Authors :
  - Mikel Landajuela ([email protected])
  - Cindy Gonzales ([email protected])
The electrocardiogram (ECG) provides a non-invasive and cost-effective tool for the diagnosis of heart conditions. However, the standard 12-lead ECG is inadequate for mapping out the electrical activity of the heart in sufficient detail for many clinical applications (e.g., identifying the origins of an arrhythmia). To construct a more detailed map of the heart, current techniques require not only ECG readings from dozens of locations on a patient’s body, but also patient-specific anatomical models built from expensive medical imaging procedures. For this Data Science Challenge problem, we consider an alternative, data-driven approach to reconstructing electroanatomical maps of the heart at clinically relevant resolutions, which combines input from the standard 12-lead ECG with advanced machine learning techniques. We begin with the clearly defined task of identifying heart conditions from ECG profiles and then consider a range of more open-ended challenges, including the reconstruction of a complete spatio-temporal activation map of the human heart.
- tutorials
  - tutorials/image_classifier_tutorial_v1.2 : Tutorial on image classification
  - tutorials/DSC_regression-tutorial : Tutorial on regression
- notebooks
  - task_1_getting_started.ipynb : Task 1 notebook
  - task_2_getting_started.ipynb : Task 2 notebook
  - task_3_getting_started.ipynb : Task 3 notebook
  - task_4_getting_started.ipynb : Task 4 notebook
- If you are unfamiliar with the field of machine learning, have a look at the tutorials folder, which contains a set of notebooks to get you started with machine learning.
- The challenge is divided into 4 tasks, each of which is described in detail in the corresponding notebook.
- Start by reading the task_#_getting_started.ipynb notebook for each task to get familiar with the data and the task.
Task 1: Get familiar with ECG data by using the ECG Heartbeat Categorization Dataset to perform binary classification of healthy vs. irregular heartbeats.
Start by reading the task_1_getting_started.ipynb notebook.
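As a rough illustration of the binary setup, the sketch below splits a table of heartbeats into signals and labels, then collapses the class labels into healthy vs. irregular. It assumes, without confirmation from this README, that each row holds one heartbeat with its class label in the last column and that class 0 denotes a healthy beat. The helper names and the synthetic array are hypothetical stand-ins for the real files, so check the Task 1 notebook for the actual layout.

```python
import numpy as np


def split_signals_labels(table):
    """Split a (n_samples, n_columns) array into signals and integer labels.

    Assumption: the class label is stored in the last column.
    """
    table = np.asarray(table, dtype=float)
    signals = table[:, :-1]
    labels = table[:, -1].astype(int)
    return signals, labels


def to_binary(labels, healthy_class=0):
    """Map labels to 0 (healthy) or 1 (irregular).

    Assumption: `healthy_class` is the label used for healthy heartbeats.
    """
    return (np.asarray(labels) != healthy_class).astype(int)


# Synthetic stand-in for the real data: 6 heartbeats of 187 samples each,
# followed by a hypothetical class label column.
rng = np.random.default_rng(0)
fake_labels = np.array([[0], [1], [0], [4], [2], [0]], dtype=float)
fake_table = np.hstack([rng.random((6, 187)), fake_labels])

signals, labels = split_signals_labels(fake_table)
binary_labels = to_binary(labels)
print(signals.shape, binary_labels)
```

Keeping the multiclass labels around and deriving the binary ones from them means the same loading code could be reused for Task 2.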
Task 2: Diagnose irregular heartbeats by using the ECG Heartbeat Categorization Dataset to perform multiclass classification.
Start by reading the task_2_getting_started.ipynb notebook.
Task 3: Sequence-to-vector prediction using the Dataset of Simulated Intracardiac Transmembrane Voltage Recordings and ECG Signals to perform activation map reconstruction (i.e., transform a 12x500 input sequence into a 75x1 output using a neural network).
Start by reading the task_3_getting_started.ipynb notebook.
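To make the input and output shapes concrete, here is a minimal sketch of a sequence-to-vector map. It uses a single linear layer written in NumPy as a stand-in for a neural network, which is not the model used in the challenge. Reading 12 as the number of ECG leads, 500 as the number of time steps, and 75 as the number of output values per sample is an assumption, as are all function names, constants, and the random data.

```python
import numpy as np

# Assumed meaning of the shapes quoted in the task description.
N_LEADS, N_STEPS, N_OUTPUTS = 12, 500, 75


def linear_seq2vec(ecg, weights, bias):
    """Map a batch of ECG sequences to one vector per sample.

    ecg:     (batch, N_LEADS, N_STEPS)
    weights: (N_LEADS * N_STEPS, N_OUTPUTS)
    bias:    (N_OUTPUTS,)
    returns: (batch, N_OUTPUTS)
    """
    ecg = np.asarray(ecg)
    flat = ecg.reshape(ecg.shape[0], N_LEADS * N_STEPS)  # flatten leads and time
    return flat @ weights + bias


# Random placeholders; a real model would learn weights and bias from data.
rng = np.random.default_rng(0)
ecg = rng.standard_normal((4, N_LEADS, N_STEPS))
weights = 0.01 * rng.standard_normal((N_LEADS * N_STEPS, N_OUTPUTS))
bias = np.zeros(N_OUTPUTS)

activation = linear_seq2vec(ecg, weights, bias)
print(activation.shape)
```

A linear map like this can serve as a baseline against which a neural network is compared.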
Task 4: Sequence-to-sequence prediction using the Dataset of Simulated Intracardiac Transmembrane Voltage Recordings and ECG Signals to perform transmembrane potential reconstruction (i.e., transform a 12x500 input sequence into a 75x500 output sequence using a neural network).
Start by reading the task_4_getting_started.ipynb notebook.
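The sequence-to-sequence shapes can be sketched in the same spirit. The example below applies one linear map independently at every time step, which is about the simplest possible baseline and only a placeholder for a neural network. It assumes the input is laid out as (batch, 12, 500) and the output as (batch, 75, 500); the names and random data are hypothetical.

```python
import numpy as np

# Assumed meaning of the shapes quoted in the task description.
N_LEADS, N_STEPS, N_OUTPUTS = 12, 500, 75


def per_step_linear(ecg, weights):
    """Apply the same linear map to the lead values at each time step.

    ecg:     (batch, N_LEADS, N_STEPS)
    weights: (N_LEADS, N_OUTPUTS)
    returns: (batch, N_OUTPUTS, N_STEPS)
    """
    # Sum over the lead axis l; keep batch b, output s, and time t.
    return np.einsum("blt,ls->bst", ecg, weights)


# Random placeholders; a real model would learn the weights from data.
rng = np.random.default_rng(0)
ecg = rng.standard_normal((2, N_LEADS, N_STEPS))
weights = rng.standard_normal((N_LEADS, N_OUTPUTS))

potentials = per_step_linear(ecg, weights)
print(potentials.shape)
```

Because this map ignores temporal context entirely, it mainly shows what a model that looks across time steps has to improve on.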
Download dataset (Tasks 1 and 2)
- Download the dataset from the ECG Heartbeat Categorization Dataset page
- Unzip the archive.zip file
- Rename the folder archive as ecg_dataset and place it in the root of the git repository
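The unzip and rename steps above can also be scripted. The sketch below extracts the archive directly into an ecg_dataset folder, which should be equivalent to unzipping and renaming provided the zip has no top-level folder of its own (an assumption worth checking on the real download). The function name is hypothetical, and the self-check uses a throwaway zip with a placeholder file so the code runs without the real data.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_dataset(zip_path, repo_root, folder_name="ecg_dataset"):
    """Extract zip_path into repo_root/folder_name and return that path."""
    target = Path(repo_root) / folder_name
    if target.exists():
        # Refuse to overwrite an existing dataset folder.
        raise FileExistsError(f"{target} already exists; remove it first")
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target


# Self-check on a throwaway archive in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    fake_zip = Path(tmp) / "archive.zip"
    with zipfile.ZipFile(fake_zip, "w") as archive:
        archive.writestr("example.csv", "0.1,0.2,0\n")  # placeholder file
    dataset_dir = extract_dataset(fake_zip, tmp)
    extracted_names = sorted(p.name for p in dataset_dir.iterdir())

print(extracted_names)
```

For the real dataset, the call would look like extract_dataset("archive.zip", "."), run from the root of the repository.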
Download dataset (Tasks 3 and 4)
- Download the dataset from the Dataset of Simulated Intracardiac Transmembrane Voltage Recordings and ECG Signals
- You will need to download all the components of the dataset one by one
- Unzip the dataset
Note : For convenience, we have included a bash script to perform the above steps. To use the script, run the following command from the root of the repository:
source download_intracardiac_dataset.sh
Further details
For further details, navigate to the intracardiac_dataset folder and read the README.md file.
- Look in documentation/documentation.pdf for a detailed description of the dataset, including the simulation process
- Look at the files documentation/dataset_description.png and documentation/dataset_description.csv for details on each simulation study
- Jupyter Notebook: Inspect the data using notebooks/dataset_inspect.ipynb
- Mathematica Notebook: Inspect the data using notebooks/dataset_inspect.nb
- The license documents can be found in license
- (Task 1) Paper : https://arxiv.org/pdf/1805.00794.pdf
  Mohammad Kachuee, Shayan Fazeli, and Majid Sarrafzadeh. "ECG Heartbeat Classification: A Deep Transferable Representation." arXiv preprint arXiv:1805.00794 (2018).
- (Task 3 & 4) Paper : https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10081783
  M. Landajuela, R. Anirudh, J. Loscazo and R. Blake, "Intracardiac Electrical Imaging Using the 12-Lead ECG: A Machine Learning Approach Using Synthetic Data," 2022 Computing in Cardiology (CinC), Tampere, Finland, 2022, pp. 1-4, doi: 10.22489/CinC.2022.026.
- (Task 3 & 4) Dataset : https://library.ucsd.edu/dc/object/bb29449106
  Landajuela, Mikel; Anirudh, Rushil; Blake, Robert (2022). Dataset of Simulated Intracardiac Transmembrane Voltage Recordings and ECG Signals. In Lawrence Livermore National Laboratory (LLNL) Open Data Initiative. UC San Diego Library Digital Collections. https://doi.org/10.6075/J0SN094N
- (Task 3 & 4) Medium Blog post : https://medium.com/p/a20661669937
Data Science Challenge 2023 is distributed under the terms of the MIT license. All new contributions must be made under this license.
See LICENSE and NOTICE for details.
SPDX-License-Identifier: MIT
LLNL-CODE-849487