Repository to generate MIMIC-IV-ICD-10-N3 dataset

This repository contains the steps to generate the MIMIC-IV-ICD-10-N3 dataset used in the paper "KAMEL: Knowledge Aware Medical Entity Linkage to Automate Health Insurance Claims Processing". To cite the original article:

@article{Lui_Xiang_Krishnaswamy_2024, 
title={KAMEL: Knowledge Aware Medical Entity Linkage to Automate Health Insurance Claims Processing}, 
volume={38}, 
url={https://ojs.aaai.org/index.php/AAAI/article/view/30314}, 
DOI={10.1609/aaai.v38i21.30314}, 
number={21}, 
journal={Proceedings of the AAAI Conference on Artificial Intelligence},
author={Lui, Sheng Jie and Xiang, Cheng and Krishnaswamy, Shonali},
year={2024},
month={Mar.},
pages={22797-22805} }

The MIMIC-IV files can be obtained from the PhysioNet website. Download them to the directory mimicdata/physionet.org

Steps to generate MIMIC-IV-ICD-10-N3 dataset

The original script used to generate the MIMIC-IV-ICD-10-N3 dataset uses Python 3.11. We set all random seeds to 2023.
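
As a minimal illustration of what fixing the seed buys, the sketch below reseeds Python's built-in generator with 2023 and shows that the same draws come back. Which libraries the original script seeded is not stated here, so the use of the standard `random` module and the helper name `seeded_draws` are assumptions for illustration only.

```python
import random

SEED = 2023  # seed value stated in this README


def seeded_draws(n: int) -> list[float]:
    """Reseed the generator, then draw n pseudo-random numbers."""
    random.seed(SEED)
    return [random.random() for _ in range(n)]


first = seeded_draws(3)
second = seeded_draws(3)
print(first == second)  # identical sequences because the seed is the same
```

pandas sampling methods such as `DataFrame.sample` accept a `random_state` argument, which plays the same role when rows are drawn from a dataframe.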

1. Download the MIMIC-IV dataset.
2. Load diagnoses_icd.csv.gz as a pandas DataFrame:
- Apply the filter icd_version==10.
- Perform a groupby on subject_id and hadm_id, then aggregate the icd_code values into lists.
3. Load discharge.csv.gz as a pandas DataFrame.
4. Merge (inner join) the grouped dataframe from step 2 with the discharge summaries from step 3.
5. Generate negative samples:
- Randomly select one-third of the rows from the dataframe generated in step 4.
- For each selected row, generate dummy ICD codes that do not belong to the same chapter as the original ICD codes.
6. Concatenate the dataframes generated in steps 4 and 5 to obtain the complete dataset.
7. Obtain the train and test datasets by performing a random split where test_size=0.3.
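
The steps above can be sketched roughly as follows. This is not the original script: it runs on small in-memory dataframes instead of the downloaded files, and several details are assumptions made for illustration. In particular, the first letter of a code is used as a stand-in for its ICD-10 chapter (real chapters are code ranges, so this is only an approximation), dummy codes are drawn from the codes already present in the data, the `label` column and the helper names `group_icd10` and `make_negatives` are hypothetical, and the split uses `DataFrame.sample` rather than whatever function the original `test_size=0.3` argument was passed to.

```python
import random

import pandas as pd

SEED = 2023  # seed value stated in this README


def group_icd10(diagnoses: pd.DataFrame) -> pd.DataFrame:
    """Step 2: keep ICD-10 rows and collect each admission's codes into a list."""
    icd10 = diagnoses[diagnoses["icd_version"] == 10]
    return (
        icd10.groupby(["subject_id", "hadm_id"])["icd_code"]
        .agg(list)
        .reset_index()
    )


def make_negatives(positives, code_pool, rng):
    """Step 5: give one-third of the rows dummy codes from other chapters.

    ASSUMPTION: a code's first letter approximates its ICD-10 chapter.
    """
    negatives = positives.sample(frac=1 / 3, random_state=SEED).copy()

    def dummy_codes(codes):
        used = {code[0] for code in codes}
        candidates = [code for code in code_pool if code[0] not in used]
        return rng.sample(candidates, k=min(len(codes), len(candidates)))

    negatives["icd_code"] = negatives["icd_code"].apply(dummy_codes)
    negatives["label"] = 0  # hypothetical column marking negative samples
    return negatives


# Toy stand-ins for diagnoses_icd.csv.gz and discharge.csv.gz; with the real
# files these frames would come from pd.read_csv on the downloaded data.
diagnoses = pd.DataFrame({
    "subject_id": [1, 1, 2, 3, 3, 4, 5, 6, 7],
    "hadm_id": [10, 10, 20, 30, 30, 40, 50, 60, 70],
    "icd_code": ["I10", "E119", "4019", "J189", "N179",
                 "K219", "F329", "I509", "E785"],
    "icd_version": [10, 10, 9, 10, 10, 10, 10, 10, 10],
})
discharge = pd.DataFrame({
    "subject_id": [1, 2, 3, 4, 5, 6, 7],
    "hadm_id": [10, 20, 30, 40, 50, 60, 70],
    "text": [f"discharge summary {i}" for i in range(7)],
})

grouped = group_icd10(diagnoses)                      # step 2
merged = grouped.merge(                               # step 4
    discharge, on=["subject_id", "hadm_id"], how="inner"
)
merged["label"] = 1  # hypothetical column marking original samples

icd10_rows = diagnoses[diagnoses["icd_version"] == 10]
code_pool = sorted(set(icd10_rows["icd_code"]))
negatives = make_negatives(merged, code_pool, random.Random(SEED))  # step 5

dataset = pd.concat([merged, negatives], ignore_index=True)         # step 6

test = dataset.sample(frac=0.3, random_state=SEED)                  # step 7
train = dataset.drop(test.index)
print(len(train), len(test))
```

The negative rows keep the original discharge text but carry codes from other chapters, so each one pairs a summary with codes that should not match it.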
