UCLA DATA Predicates file #23

Merged on Nov 7, 2024 (9 commits)
197 changes: 197 additions & 0 deletions .gitignore
@@ -160,3 +160,200 @@ cython_debug/
.idea/

.editorconfig


# Created by https://www.toptal.com/developers/gitignore/api/macos,pycharm,visualstudiocode,windows
# Edit at https://www.toptal.com/developers/gitignore?templates=macos,pycharm,visualstudiocode,windows

### macOS ###
# General
.DS_Store
.AppleDouble
.LSOverride

# Icon must end with two \r
Icon


# Thumbnails
._*

# Files that might appear in the root of a volume
.DocumentRevisions-V100
.fseventsd
.Spotlight-V100
.TemporaryItems
.Trashes
.VolumeIcon.icns
.com.apple.timemachine.donotpresent

# Directories potentially created on remote AFP share
.AppleDB
.AppleDesktop
Network Trash Folder
Temporary Items
.apdisk

### macOS Patch ###
# iCloud generated files
*.icloud

### PyCharm ###
# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio, WebStorm and Rider
# Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839

# User-specific stuff
.idea/**/workspace.xml
.idea/**/tasks.xml
.idea/**/usage.statistics.xml
.idea/**/dictionaries
.idea/**/shelf

# AWS User-specific
.idea/**/aws.xml

# Generated files
.idea/**/contentModel.xml

# Sensitive or high-churn files
.idea/**/dataSources/
.idea/**/dataSources.ids
.idea/**/dataSources.local.xml
.idea/**/sqlDataSources.xml
.idea/**/dynamic.xml
.idea/**/uiDesigner.xml
.idea/**/dbnavigator.xml

# Gradle
.idea/**/gradle.xml
.idea/**/libraries

# Gradle and Maven with auto-import
# When using Gradle or Maven with auto-import, you should exclude module files,
# since they will be recreated, and may cause churn. Uncomment if using
# auto-import.
# .idea/artifacts
# .idea/compiler.xml
# .idea/jarRepositories.xml
# .idea/modules.xml
# .idea/*.iml
# .idea/modules
# *.iml
# *.ipr

# CMake
cmake-build-*/

# Mongo Explorer plugin
.idea/**/mongoSettings.xml

# File-based project format
*.iws

# IntelliJ
out/

# mpeltonen/sbt-idea plugin
.idea_modules/

# JIRA plugin
atlassian-ide-plugin.xml

# Cursive Clojure plugin
.idea/replstate.xml

# SonarLint plugin
.idea/sonarlint/

# Crashlytics plugin (for Android Studio and IntelliJ)
com_crashlytics_export_strings.xml
crashlytics.properties
crashlytics-build.properties
fabric.properties

# Editor-based Rest Client
.idea/httpRequests

# Android studio 3.1+ serialized cache file
.idea/caches/build_file_checksums.ser

### PyCharm Patch ###
# Comment Reason: https://github.com/joeblau/gitignore.io/issues/186#issuecomment-215987721

# *.iml
# modules.xml
# .idea/misc.xml
# *.ipr

# Sonarlint plugin
# https://plugins.jetbrains.com/plugin/7973-sonarlint
.idea/**/sonarlint/

# SonarQube Plugin
# https://plugins.jetbrains.com/plugin/7238-sonarqube-community-plugin
.idea/**/sonarIssues.xml

# Markdown Navigator plugin
# https://plugins.jetbrains.com/plugin/7896-markdown-navigator-enhanced
.idea/**/markdown-navigator.xml
.idea/**/markdown-navigator-enh.xml
.idea/**/markdown-navigator/

# Cache file creation bug
# See https://youtrack.jetbrains.com/issue/JBR-2257
.idea/$CACHE_FILE$

# CodeStream plugin
# https://plugins.jetbrains.com/plugin/12206-codestream
.idea/codestream.xml

# Azure Toolkit for IntelliJ plugin
# https://plugins.jetbrains.com/plugin/8053-azure-toolkit-for-intellij
.idea/**/azureSettings.xml

### VisualStudioCode ###
.vscode/*
!.vscode/settings.json
!.vscode/tasks.json
!.vscode/launch.json
!.vscode/extensions.json
!.vscode/*.code-snippets

# Local History for Visual Studio Code
.history/

# Built Visual Studio Code Extensions
*.vsix

### VisualStudioCode Patch ###
# Ignore all local history of files
.history
.ionide

### Windows ###
# Windows thumbnail cache files
Thumbs.db
Thumbs.db:encryptable
ehthumbs.db
ehthumbs_vista.db

# Dump file
*.stackdump

# Folder config file
[Dd]esktop.ini

# Recycle Bin used on file shares
$RECYCLE.BIN/

# Windows Installer files
*.cab
*.msi
*.msix
*.msm
*.msp

# Windows shortcuts
*.lnk

# End of https://www.toptal.com/developers/gitignore/api/macos,pycharm,visualstudiocode,windows
33 changes: 33 additions & 0 deletions src/MEDS_DEV/datasets/UCLA/README.md
@@ -0,0 +1,33 @@
# UCLA DDR MEDS Dataset

UCLA Health serves well over half a million patients annually and, as a result, maintains a vast database of electronic health records (EHR). The resulting Discovery Data Repository (DDR) is an invaluable resource for researchers. Together, the DDR's organization of medical record concepts and the UCLA ATLAS Precision Health Biobank, which integrates genetic information with de-identified medical records, enable precision health research.

## Access Requirements

To gain access, you will need to obtain HIPAA compliance clearance. *HIPAA compliance is the practice of following the Health Insurance Portability and Accountability Act (HIPAA) to protect the privacy and security of health information.* The dataset is not publicly available, but researchers within UCLA can run analyses for your study if you send them your **source code** and, for a pre-trained architecture, the **model weights**.

- **License (for files)**: Pending.
- **Data Use Agreement**: Pending.

## Supported Tasks

The tasks currently supported by the UCLA-MEDS format are as follows:

- `tasks/mortality/in_icu/first_24h.yaml`

## MEDS-transformation

Researchers at UCLA have transformed the DDR database into the MEDS format, making it quick and easy to run analyses at our institution.
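As a rough illustration of what MEDS-formatted data looks like, the sketch below stores a patient's record as a long table of timestamped events, one row per measurement. It assumes the core MEDS columns `subject_id`, `time`, `code`, and `numeric_value`; the subjects, timestamps, and codes (such as `LAB//EXAMPLE_TEST`) are hypothetical and do not come from the DDR, and the `timeline` helper is likewise hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class MEDSEvent:
    """One row of a MEDS-style event table (assumed core columns)."""
    subject_id: int
    time: datetime
    code: str
    numeric_value: Optional[float] = None


# Hypothetical events for two fictional subjects, deliberately listed out of order.
# Codes follow the "PREFIX//VALUE" style matched by predicates.yaml; the values are made up.
EVENTS: List[MEDSEvent] = [
    MEDSEvent(1, datetime(2020, 1, 3, 12, 0), "ICU_DISCHARGE//EXAMPLE"),
    MEDSEvent(1, datetime(2020, 1, 1, 8, 0), "HOSPITAL_ADMISSION//EXAMPLE"),
    MEDSEvent(2, datetime(2020, 2, 1, 10, 0), "ED_REGISTRATION//EXAMPLE"),
    MEDSEvent(1, datetime(2020, 1, 2, 7, 0), "LAB//EXAMPLE_TEST", 4.2),
    MEDSEvent(1, datetime(2020, 1, 1, 9, 30), "ICU_ADMISSION//EXAMPLE"),
]


def timeline(events: List[MEDSEvent], subject_id: int) -> List[MEDSEvent]:
    """Hypothetical helper: one subject's events in chronological order."""
    return sorted((e for e in events if e.subject_id == subject_id), key=lambda e: e.time)


if __name__ == "__main__":
    for event in timeline(EVENTS, 1):
        print(event.time.isoformat(), event.code, event.numeric_value)
```

Because every measurement shares one schema, code-based predicates such as those in `predicates.yaml` can select events by looking at the `code` column alone.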

## Sources

Summarize the sources of the dataset. If the dataset is a combination of multiple sources, list them here.

1. https://link-to-dataset.org

## Contact

For questions, feel free to email:

- Simon A. Lee ([email protected])
- Jeffrey N. Chiang ([email protected])
18 changes: 18 additions & 0 deletions src/MEDS_DEV/datasets/UCLA/predicates.yaml
@@ -0,0 +1,18 @@
predicates:
  hospital_admission:
    code: { regex: "^HOSPITAL_ADMISSION//.*" }
  hospital_discharge:
    code: { regex: "^HOSPITAL_DISCHARGE//.*" }
Comment on lines +2 to +5
💡 Codebase verification

Documentation needed for predicate pattern format

The regex patterns for hospital events are consistently implemented across multiple datasets (UCLA, MIMIC-IV), but there's no documentation explaining:

  • The expected format after the "//" delimiter
  • Example valid strings
  • Purpose of this pattern structure

This documentation would be valuable since:

  • The README.md is currently minimal with just TODOs
  • The pattern is used across multiple datasets
  • No examples or explanations exist in the codebase

  ED_registration:
    code: { regex: "^ED_REGISTRATION//.*" }
  ED_discharge:
    code: { regex: "^ED_DISCHARGE//.*" }

  icu_admission:
    code: { regex: "^ICU_ADMISSION//.*" }
  icu_discharge:
    code: { regex: "^ICU_DISCHARGE//.*" }

  death:
    code: MORTALITY
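To give the example strings the review asks for, here is a minimal Python sketch of how these predicates could be evaluated against individual `code` values. It assumes that a plain string value (as in `death`) means an exact match and that `regex` means a Python `re.search`-style match; the exact matching rules of the downstream tooling are not stated in this PR. The suffixes after `//` (for example `ICU_ADMISSION//MICU`) and the `matches` and `predicates_for` helpers are hypothetical.

```python
import re
from typing import Dict, List, Union

# Predicates transcribed from predicates.yaml as Python data (avoids a YAML dependency).
# Assumption: a str value means an exact code match; a {"regex": ...} dict means a regex match.
PredicateSpec = Union[str, Dict[str, str]]

PREDICATES: Dict[str, PredicateSpec] = {
    "hospital_admission": {"regex": r"^HOSPITAL_ADMISSION//.*"},
    "hospital_discharge": {"regex": r"^HOSPITAL_DISCHARGE//.*"},
    "ED_registration": {"regex": r"^ED_REGISTRATION//.*"},
    "ED_discharge": {"regex": r"^ED_DISCHARGE//.*"},
    "icu_admission": {"regex": r"^ICU_ADMISSION//.*"},
    "icu_discharge": {"regex": r"^ICU_DISCHARGE//.*"},
    "death": "MORTALITY",
}


def matches(spec: PredicateSpec, code: str) -> bool:
    """Return True if a single code satisfies one predicate spec."""
    if isinstance(spec, str):
        return code == spec
    # Every pattern starts with "^", so re.search only matches at the start of the code.
    return re.search(spec["regex"], code) is not None


def predicates_for(code: str) -> List[str]:
    """Hypothetical helper: names of all predicates that a code satisfies."""
    return sorted(name for name, spec in PREDICATES.items() if matches(spec, code))


if __name__ == "__main__":
    for code in ["ICU_ADMISSION//MICU", "HOSPITAL_DISCHARGE//HOME", "MORTALITY", "ICU_ADMISSION"]:
        print(code, "->", predicates_for(code))
```

Under these patterns, a code with no `//` suffix (such as a bare `ICU_ADMISSION`) matches no predicate, while `death` requires the code to be exactly `MORTALITY`.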