Added task criteria for abnormal lab prediction tasks #25

Merged
merged 6 commits into from
Nov 1, 2024

Contributor

@Jwoo5 Jwoo5 commented Oct 17, 2024

All the tasks are defined as predicting whether the patients will have abnormal values for the target lab items within the next 24 hours after the first 24 hours of their ICU admission. It includes ICU patients 1) whose length of stay is at least 24 hours and 2) who have at least one measurement for the target lab items within the target window.

The added tasks are as follows:

  • blood_chemistry
    • elevated_creatinine
      • creatinine > 1.3mg/dL
    • hyponatremia
      • sodium < 135mEq/L
    • metabolic_acidosis
      • bicarbonate < 22mEq/L
  • cbc
    • anemia
      • hemoglobin < 13g/dL (we should revisit this definition as this criterion yields too many positive samples in MIMIC-IV)
    • leukocytosis
      • WBC (White Blood Cells) > 11K/uL
    • thrombocytopenia
      • platelets < 150K/uL
  • vital
    • hypotension
      • MAP (Mean Arterial Pressure) < 65mmHg
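
As a rough illustration of the criteria listed above, the thresholds can be written as a small lookup table. This is a minimal Python sketch, not the ACES configuration added in this PR; the names `ABNORMAL_CRITERIA` and `is_abnormal` are hypothetical, and the sketch assumes values are already in the units shown in the list.

```python
import operator

# Thresholds copied from the task list above. The dictionary layout and the
# lab item keys are illustrative assumptions, not identifiers from the PR.
ABNORMAL_CRITERIA = {
    "elevated_creatinine": ("creatinine", operator.gt, 1.3),   # mg/dL
    "hyponatremia": ("sodium", operator.lt, 135.0),            # mEq/L
    "metabolic_acidosis": ("bicarbonate", operator.lt, 22.0),  # mEq/L
    "anemia": ("hemoglobin", operator.lt, 13.0),               # g/dL
    "leukocytosis": ("wbc", operator.gt, 11.0),                # K/uL
    "thrombocytopenia": ("platelets", operator.lt, 150.0),     # K/uL
    "hypotension": ("map", operator.lt, 65.0),                 # mmHg
}


def is_abnormal(task: str, value: float) -> bool:
    """Return True when `value` meets the abnormal criterion for `task`."""
    _item, compare, threshold = ABNORMAL_CRITERIA[task]
    return compare(value, threshold)
```

All comparisons are strict, matching the `>` and `<` signs in the list, so a creatinine of exactly 1.3 mg/dL is not counted as elevated.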

There are two questions:

  1. Should we write READMEs to describe high-level definitions for each of these tasks?
  2. Should we also add the configs realized for MIMIC-IV database for these tasks?

Summary by CodeRabbit

  • New Features

    • Introduced multiple predicates for laboratory tests and vital signs, enhancing the dataset for medical data analysis.
    • Added new tasks for predicting various conditions (elevated creatinine, hyponatremia, metabolic acidosis, anemia, leukocytosis, thrombocytopenia, and hypotension) within the 24 hours that follow the first 24 hours of ICU admission, each with specific criteria and measurement requirements.
  • Bug Fixes

    • Improved logical expressions for predicates to ensure accurate evaluations of laboratory results.

Contributor

coderabbitai bot commented Oct 17, 2024

Walkthrough

The pull request introduces new predicates and task definitions for evaluating laboratory tests and vital signs in the MIMIC-IV dataset. It includes predicates for various laboratory measurements such as creatinine, sodium, bicarbonate, hemoglobin, white blood cells, platelets, and mean arterial pressure, each with defined conditions for abnormal values. Additionally, new YAML files define tasks for predicting conditions like elevated creatinine, hyponatremia, metabolic acidosis, anemia, leukocytosis, thrombocytopenia, and hypotension within the 24 hours that follow the first 24 hours of ICU admission.

Changes

| File Path | Change Summary |
|---|---|
| src/MEDS_DEV/datasets/MIMIC-IV/predicates.yaml | Added predicates for birth, creatinine, sodium, bicarbonate, hemoglobin, wbc, platelets, and map, including conditions for abnormal values. |
| src/MEDS_DEV/tasks/criteria/abnormal_lab/blood_chemistry/elevated_creatinine/first_24h.yaml | New task for predicting elevated creatinine levels with specific criteria and predicates. |
| src/MEDS_DEV/tasks/criteria/abnormal_lab/blood_chemistry/hyponatremia/first_24h.yaml | New task for predicting hyponatremia with defined inclusion criteria and predicates. |
| src/MEDS_DEV/tasks/criteria/abnormal_lab/blood_chemistry/metabolic_acidosis/first_24h.yaml | New task for predicting metabolic acidosis with specific thresholds and predicates. |
| src/MEDS_DEV/tasks/criteria/abnormal_lab/cbc/anemia/first_24h.yaml | New task for predicting anemia with defined criteria and predicates. |
| src/MEDS_DEV/tasks/criteria/abnormal_lab/cbc/leukocytosis/first_24h.yaml | New task for predicting leukocytosis with specific inclusion criteria and predicates. |
| src/MEDS_DEV/tasks/criteria/abnormal_lab/cbc/thrombocytopenia/first_24h.yaml | New task for predicting thrombocytopenia with defined criteria and predicates. |
| src/MEDS_DEV/tasks/criteria/abnormal_lab/vital/hypotension/first_24h.yaml | New task for predicting hypotension with specific thresholds and predicates. |

Possibly related PRs

  • Adding AUMCdb to MEDS-DEV #27: The changes in this PR introduce new predicates related to ICU admission and discharge, which are relevant to the predicates added for laboratory tests and vital signs in the main PR.
  • added OMOP predicates #29: This PR adds predicates for hospital admission and discharge, which are also relevant to the predicates for laboratory tests and vital signs introduced in the main PR.

Suggested reviewers

  • mmcdermott
  • justin13601

🐰 In the meadow, where data flows,
New predicates sprout, as knowledge grows.
With tasks defined for each vital sign,
We hop along, in a world so fine!
From creatinine highs to sodium lows,
Our medical insights will surely glow! 🌼


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

🧹 Outside diff range and nitpick comments (13)
src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/vital/hypotension/first_24h.yaml (2)

1-5: Consider adding the specific threshold for low MAP values in the description.

The description provides a clear overview of the task and inclusion criteria. However, it would be beneficial to include the specific threshold for what constitutes "low MAP values" (hypotension) in the description. This addition would provide immediate context for users reviewing the task definition.


48-48: Add a new line character at the end of the file.

To comply with YAML best practices and improve consistency, please add a new line character at the end of the file.

🧰 Tools
🪛 yamllint

[error] 48-48: no new line character at the end of file

(new-line-at-end-of-file)
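
The yamllint `new-line-at-end-of-file` rule only asks that the file's last character be a newline. A minimal Python sketch of that fix follows; the helper name `ensure_trailing_newline` is hypothetical, and it operates on file contents already read into a string.

```python
def ensure_trailing_newline(text: str) -> str:
    """Append a single newline if non-empty text does not already end with one."""
    if text and not text.endswith("\n"):
        return text + "\n"
    return text
```

The function is idempotent, so running it over files that already end with a newline leaves them unchanged.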

src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/cbc/leukocytosis/first_24h.yaml (1)

48-48: Add a newline character at the end of the file.

To comply with YAML best practices and resolve the yamllint error, please add a newline character at the end of the file.

Apply this change to the end of the file:

    label: abnormally_high_wbc
+
🧰 Tools
🪛 yamllint

[error] 48-48: no new line character at the end of file

(new-line-at-end-of-file)

src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/blood_chemistry/hyponatremia/first_24h.yaml (2)

1-5: Consider adding the specific threshold for hyponatremia in the description.

The description provides a clear overview of the task. However, to align with the PR objectives and improve clarity, consider including the specific threshold for hyponatremia (sodium levels below 135 mEq/L) in the description.

Here's a suggested addition to the description:

 This task predicts whether the patient will have abnormal _sodium_ values within the next 24
 hours, given the first 24 hours of their admission to the ICU. To be included, patients must have
 at least one measurement for _sodium_ within the next 24 hours after their admission to ICU,
-and their length of stay in ICU should be at least 24 hours.
+and their length of stay in ICU should be at least 24 hours. Hyponatremia is defined as sodium
+levels below 135 mEq/L.

48-48: Add a newline character at the end of the file.

To comply with YAML best practices and resolve the yamllint error, add a newline character at the end of the file.

Apply this change:

    has:
      sodium: (1, None)
    label: abnormally_low_sodium
+
🧰 Tools
🪛 yamllint

[error] 48-48: no new line character at the end of file

(new-line-at-end-of-file)

src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/blood_chemistry/elevated_creatinine/first_24h.yaml (3)

1-5: Consider adding the specific threshold for elevated creatinine.

The description clearly states the task's purpose and inclusion criteria. However, it would be beneficial to include the specific threshold for what constitutes "elevated" creatinine levels (e.g., > 1.3 mg/dL) to align with the PR objectives and provide more clarity.


25-48: Consider the following improvements to the windows section:

  1. Input window: The start time is set to null. Consider setting it explicitly to the trigger time for clarity:

    start: trigger
  2. No_discharge window: The start time is exclusive (False), which might allow for immediate discharges. Consider making it inclusive:

    start_inclusive: True
  3. Target window: The label "abnormally_high_creatinine" is correct, but could be more specific. Consider renaming it to "elevated_creatinine" to align with the task name:

    label: elevated_creatinine

These changes will improve the clarity and consistency of the task definition.

🧰 Tools
🪛 yamllint

[error] 48-48: no new line character at the end of file

(new-line-at-end-of-file)


48-48: Add a new line character at the end of the file.

To comply with YAML best practices and improve consistency, please add a new line character at the end of the file.

🧰 Tools
🪛 yamllint

[error] 48-48: no new line character at the end of file

(new-line-at-end-of-file)

src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/cbc/anemia/first_24h.yaml (2)

1-5: Consider adding the specific threshold for anemia in the description.

The description clearly outlines the task's purpose and inclusion criteria. However, it would be beneficial to include the specific threshold for "low hemoglobin values" (anemia) in the description. This would provide immediate context for users reviewing the task definition.

Consider updating the description to include the threshold, for example:
"This task predicts whether the patient will have low hemoglobin values (anemia, defined as hemoglobin levels below 13 g/dL) within the next 24 hours, ..."


25-48: Consider adding a comment to explain the "trigger" reference.

The windows are correctly defined and align with the task description. However, it would be helpful to add a comment explaining that "trigger" refers to the "icu_admission" event defined in the trigger section. This would improve clarity for readers who might not be familiar with the YAML structure.

Consider adding a comment like this at the beginning of the windows section:

windows:
  # Note: "trigger" refers to the "icu_admission" event defined in the trigger section
  input:
    ...
🧰 Tools
🪛 yamllint

[error] 48-48: no new line character at the end of file

(new-line-at-end-of-file)

src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/cbc/thrombocytopenia/first_24h.yaml (1)

25-48: LGTM: Well-defined windows with a minor formatting issue.

The windows (input, no_discharge, and target) are correctly defined and align with the task description. They appropriately capture the required timeframes for predicting thrombocytopenia within 24 hours of ICU admission.

However, there's a minor formatting issue:

Add a newline character at the end of the file to comply with YAML best practices and resolve the yamllint error.

🧰 Tools
🪛 yamllint

[error] 48-48: no new line character at the end of file

(new-line-at-end-of-file)

src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/blood_chemistry/metabolic_acidosis/first_24h.yaml (2)

26-31: Clarify the 'input' window's start time.

The 'input' window's start time is set to null. While this might be intentional to include all available data up to the trigger point, it could benefit from a brief comment explaining this choice. Consider adding a comment to clarify the reasoning behind setting the start time to null.

Example:

input:
  start: null  # Include all available data up to the trigger point
  end: trigger + 24h
  # ... rest of the input window configuration

48-48: Add a newline character at the end of the file.

The file is missing a newline character at the end. While this doesn't affect functionality, it's a common convention to end files with a newline character. This can help with text processing and version control systems.

Please add a blank line at the end of the file.

🧰 Tools
🪛 yamllint

[error] 48-48: no new line character at the end of file

(new-line-at-end-of-file)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 30d5631 and daadf9d.

📒 Files selected for processing (7)
  • src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/blood_chemistry/elevated_creatinine/first_24h.yaml (1 hunks)
  • src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/blood_chemistry/hyponatremia/first_24h.yaml (1 hunks)
  • src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/blood_chemistry/metabolic_acidosis/first_24h.yaml (1 hunks)
  • src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/cbc/anemia/first_24h.yaml (1 hunks)
  • src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/cbc/leukocytosis/first_24h.yaml (1 hunks)
  • src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/cbc/thrombocytopenia/first_24h.yaml (1 hunks)
  • src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/vital/hypotension/first_24h.yaml (1 hunks)
🧰 Additional context used
🪛 yamllint
src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/blood_chemistry/elevated_creatinine/first_24h.yaml

[error] 48-48: no new line character at the end of file

(new-line-at-end-of-file)

src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/blood_chemistry/hyponatremia/first_24h.yaml

[error] 48-48: no new line character at the end of file

(new-line-at-end-of-file)

src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/blood_chemistry/metabolic_acidosis/first_24h.yaml

[error] 48-48: no new line character at the end of file

(new-line-at-end-of-file)

src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/cbc/anemia/first_24h.yaml

[error] 48-48: no new line character at the end of file

(new-line-at-end-of-file)

src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/cbc/leukocytosis/first_24h.yaml

[error] 48-48: no new line character at the end of file

(new-line-at-end-of-file)

src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/cbc/thrombocytopenia/first_24h.yaml

[error] 48-48: no new line character at the end of file

(new-line-at-end-of-file)

src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/vital/hypotension/first_24h.yaml

[error] 48-48: no new line character at the end of file

(new-line-at-end-of-file)

🔇 Additional comments (15)
src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/vital/hypotension/first_24h.yaml (2)

23-23: LGTM: Trigger correctly set to ICU admission.

The trigger is appropriately set to "icu_admission", which aligns with the task description of predicting hypotension within 24 hours of ICU admission.


26-40: LGTM: Input and no_discharge windows are correctly defined.

The input and no_discharge windows are well-defined and align with the task description:

  • The input window correctly covers the first 24 hours after ICU admission.
  • The no_discharge window ensures that no discharge events occur within the first 24 hours of ICU admission.
src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/cbc/leukocytosis/first_24h.yaml (3)

1-5: LGTM: Clear and comprehensive description.

The description accurately outlines the task's purpose, prediction window, and inclusion criteria, aligning well with the PR objectives.


23-23: LGTM: Appropriate trigger defined.

The trigger is correctly set to "icu_admission", which aligns with the task description and PR objectives.


25-48: LGTM: Well-defined windows section.

The windows section accurately defines the input, no_discharge, and target windows, aligning with the task description and PR objectives. The constraints and labeling criteria are appropriately set.

🧰 Tools
🪛 yamllint

[error] 48-48: no new line character at the end of file

(new-line-at-end-of-file)

src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/blood_chemistry/hyponatremia/first_24h.yaml (2)

23-23: LGTM: Trigger is correctly set.

The trigger is appropriately set to icu_admission, which aligns with the task description and PR objectives.


25-48: LGTM: Windows section is well-defined.

The windows section is correctly defined and aligns with the task description and PR objectives:

  • The input window covers the first 24 hours after ICU admission.
  • The no_discharge window ensures no discharge occurs within the first 24 hours.
  • The target window is set to the 24 hours following the input window.

This structure accurately captures the requirements for predicting hyponatremia within the specified timeframe.

🧰 Tools
🪛 yamllint

[error] 48-48: no new line character at the end of file

(new-line-at-end-of-file)

src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/blood_chemistry/elevated_creatinine/first_24h.yaml (3)

23-23: LGTM: Trigger correctly set to ICU admission.

The trigger is appropriately set to "icu_admission", which aligns with the task's purpose of predicting elevated creatinine levels after ICU admission.


1-48: Overall assessment: Well-structured task definition with minor improvements needed.

The task definition for predicting elevated creatinine levels in ICU patients is well-structured and aligns with the PR objectives. The suggested improvements, including adding specific thresholds, implementing undefined predicates, and refining the windows section, will enhance the clarity and consistency of the task definition. Once these minor issues are addressed, the file will provide a robust foundation for the abnormal lab prediction task.

🧰 Tools
🪛 yamllint

[error] 48-48: no new line character at the end of file

(new-line-at-end-of-file)


7-21: ⚠️ Potential issue

Implement or provide references for undefined predicates.

Several predicates (icu_admission, icu_discharge, hospital_discharge, death, creatinine, abnormally_high_creatinine) are currently undefined (marked with ???). To ensure the task functions correctly:

  1. Implement these predicates directly in this file, or
  2. Provide references to where these predicates are defined if they are implemented elsewhere in the codebase.

Additionally, for the abnormally_high_creatinine predicate, consider setting the value_min to 1.3 mg/dL to align with the PR objectives.

To verify the existence of predicate definitions elsewhere in the codebase, you can run the following script:

✅ Verification successful

Undefined predicates are defined in predicates.yaml.

All predicates (icu_admission, icu_discharge, hospital_discharge, death, creatinine, abnormally_high_creatinine) are defined in src/MEDS_DEV/datasets/MIMIC-IV/predicates.yaml. Ensure that first_24h.yaml properly references these predicates from predicates.yaml to maintain clarity and functionality.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for predicate definitions in the codebase

echo "Searching for predicate definitions..."
rg --type yaml -g '!src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/blood_chemistry/elevated_creatinine/first_24h.yaml' \
   -e 'icu_admission:' -e 'icu_discharge:' -e 'hospital_discharge:' -e 'death:' -e 'creatinine:' -e 'abnormally_high_creatinine:'

echo "If no results are found, the predicates may need to be implemented in this file."

Length of output: 6008

src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/cbc/anemia/first_24h.yaml (1)

23-23: LGTM: Trigger correctly set.

The trigger is appropriately set to "icu_admission", which aligns with the task description and PR objectives.

src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/cbc/thrombocytopenia/first_24h.yaml (2)

1-5: LGTM: Clear and comprehensive task description.

The description accurately defines the task for predicting thrombocytopenia, including the timeframe and inclusion criteria. It aligns well with the PR objectives.


23-23: LGTM: Appropriate trigger defined.

The trigger is correctly set to 'icu_admission', which aligns with the task description and PR objectives.

src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/blood_chemistry/metabolic_acidosis/first_24h.yaml (2)

1-5: LGTM: Clear and comprehensive task description.

The description effectively outlines the task's purpose, prediction target, time frames, and inclusion criteria. It provides a solid foundation for understanding the task's objectives and constraints.


23-23: LGTM: Appropriate trigger defined.

The trigger is correctly set to 'icu_admission', which aligns with the task description and ensures that the prediction process starts at the right moment - when a patient is admitted to the ICU.

Owner

@mmcdermott mmcdermott left a comment

Can the ranges not be set in the task configs for particular units?

Also, do you want to add the predicates you've defined here to the MIMIC predicates.yaml file too?

Also, should we have something different than cbc and blood_chemistry? blood_chemistry seems very broad relative to cbc, but maybe that is wrong?

Owner

I feel like we should have the ICU stay be at least 48 hours, right? We want the lab test to happen while they are still in the ICU, not after they were discharged, don't we?

Owner

This applies here and throughout.

Contributor Author

I think so; I need to modify the no_discharge window to cover 48h.

Collaborator

Correct, we would expect the labs to be recorded during patient stay at ICU.
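
To make the 48-hour requirement discussed in this thread concrete, here is a minimal Python sketch of the inclusion and labeling logic for one ICU stay. It is not the ACES config itself: the function name `label_stay`, its arguments, and the choice of a half-open target window `(admission + 24h, admission + 48h]` are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def label_stay(icu_in, icu_out, measurements, threshold=1.3):
    """Label one ICU stay for the elevated-creatinine task.

    measurements: list of (timestamp, value) pairs for the target lab item.
    Returns None when the stay is excluded, otherwise True or False.
    """
    input_end = icu_in + timedelta(hours=24)   # end of the input window
    target_end = icu_in + timedelta(hours=48)  # end of the target window

    # The patient must still be in the ICU for the whole target window.
    if icu_out < target_end:
        return None

    # Window boundaries are an assumption: exclusive start, inclusive end.
    in_target = [v for t, v in measurements if input_end < t <= target_end]
    if not in_target:
        return None  # no measurement in the target window

    return any(v > threshold for v in in_target)
```

With this check, a patient discharged 30 hours after admission is excluded even if a lab was drawn afterwards, which is the behavior the reviewers ask for here.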

Owner

Do we want to predict abnormal creatinine levels on all patients? Or only on patients who have not yet had an abnormal creatinine in the ICU? And/or who haven't had an abnormal creatinine in this hospital stay? Or something else guided by clinical ideas? @Gallifantjack, any suggestions?

Owner

This applies here and throughout.

Contributor Author

I just defined it as the first case where we predict on all patients, but I agree that this needs an opinion from clinicians to figure out which scenario has the best clinical utility.

Collaborator

As @Jwoo5 mentioned, we were considering detecting any event of elevated creatinine, independent of previous instances of creatinine elevation. What you mentioned, @mmcdermott, concerns the problem of selecting the cohort we want to use (to condition the task on), which should be orthogonal to the task definition.

Owner

@mmcdermott mmcdermott Oct 22, 2024

Actually, @mandiehyewon, I think that the task cohort is very much a key part of the task definition, at least as I'm using the term. We'll care about how models perform differently on different cohorts even for the same "label", and I think our goal in specifying these ACES configs should not just be to say "here is the label" but also to say "this task is interested in this label over patients who satisfy these criteria".
So if we think it would be relevant to get more specific here, we should, in my opinion. Does that make sense?

Collaborator

All good points; I agree with Matthew's suggestion to make it more restricted. A good start might be to predict abnormal creatinine levels in all patients who haven't previously shown elevated levels, without restricting to ICU patients.

Collaborator

It makes sense @mmcdermott as we are using ACES format and defining cohort w.r.t. each task. About your point @Gallifantjack, just to clarify, you meant those who haven't previously shown elevated levels during current admission, correct? I think it makes sense to detect the first event in each admission but don't think it's necessary to extend to historic admissions.
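
The "first event in each admission" cohort proposed in this thread can be sketched as one extra exclusion rule. This is a minimal Python illustration under stated assumptions: `first_event_label` is a hypothetical helper, `values_before` holds the lab values from the current admission up to the prediction time, and `values_target` holds the values in the target window.

```python
def first_event_label(values_before, values_target, threshold=1.3):
    """Label a sample only if no elevated value was seen earlier in the admission.

    Returns None when the sample is excluded from the cohort,
    otherwise True or False.
    """
    # Exclude patients who already showed an elevated value in this admission.
    if any(v > threshold for v in values_before):
        return None
    # As in the original task, require a measurement in the target window.
    if not values_target:
        return None
    return any(v > threshold for v in values_target)
```

Values from earlier admissions are simply left out of `values_before`, which matches the suggestion not to extend the exclusion to historic admissions.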

    code: ???
  abnormally_high_creatinine:
    code: ???
    value_min: ???
Owner

Is this reference range different for men vs. women, or children vs. adults? Do we need more qualifiers and/or different ranges for different patients here? Tagging @Gallifantjack for the clinical questions here.

Contributor Author

All the reference ranges were based on @mandiehyewon's suggestions, and only the anemia prediction task, which is based on hemoglobin, has a different range for men vs. women. I'm also not sure whether ACES has the capacity to define predicates conditioned on other information (e.g., age, gender, pregnancy).

Collaborator

@mandiehyewon mandiehyewon Oct 22, 2024

Good point about this @mmcdermott. While we can define cutoffs for elevated SCr, these values can be somewhat arbitrary and may vary between institutions. It's important to refer to established literature for guidance. According to this source, the normal SCr levels for individuals with normal kidney function are:

  • Men: 0.6 to 1.2 mg/dL
  • Women: 0.5 to 1.1 mg/dL

The source also provides approximate measures of glomerular filtration rate (GFR) based on creatinine levels:

  • Creatinine 1 mg/dL: Baseline for a patient with normal GFR
  • Creatinine 2 mg/dL: Indicates about a 50% reduction in GFR
  • Creatinine 4 mg/dL: Indicates about a 70% to 85% reduction in GFR
  • Creatinine 8 mg/dL: Indicates about a 90% to 95% reduction in GFR

From this information, we can define multiple cutoffs for elevated creatinine levels. For instance, we might use 1.3 mg/dL—the higher end of the normal range for both men and women—as a threshold. The choice of cutoff depends on the specific condition we aim to detect:

  • Critical Conditions (e.g., Creatinine ≥ 2.0 mg/dL): Levels that may require immediate medical intervention due to significant kidney function reduction.

  • Mild Elevations (e.g., Creatinine ≥ 1.3 mg/dL): Levels that warrant further evaluation but may not indicate critical kidney failure.

Ultimately, the selected cutoff should align with the clinical context and the goals of the evaluation. Do we want to identify patients needing urgent care, or are we focusing on early detection of potential kidney issues that require monitoring?

For pediatric patients, different reference ranges are necessary for all laboratory values. Therefore, it's best to focus on adult patients (age > 18) for the tasks defined here.
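
For illustration only, the sex-specific ranges quoted above could be turned into a check like the following. This Python sketch assumes the upper end of each quoted normal range is used as the cutoff, which is one possible choice and not the 1.3 mg/dL threshold used in the PR; the names `SCR_UPPER_LIMIT_MG_DL` and `elevated_scr` are hypothetical.

```python
# Upper ends of the normal serum creatinine ranges quoted in this comment.
# Using them directly as cutoffs is an assumption made for this sketch.
SCR_UPPER_LIMIT_MG_DL = {"male": 1.2, "female": 1.1}


def elevated_scr(value_mg_dl, sex, age_years):
    """Return True/False for adults, or None for patients outside the adult cohort."""
    if age_years <= 18:
        # Pediatric patients need different reference ranges, so skip them.
        return None
    return value_mg_dl > SCR_UPPER_LIMIT_MG_DL[sex]
```

The same value can therefore be labeled differently by sex: 1.15 mg/dL is above the quoted range for women but within it for men.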

Collaborator

@mandiehyewon mandiehyewon Oct 22, 2024

This is an example of a patient with a mild SCr increase (with the 1.3 mg/dL cutoff).

Owner

Thanks @mandiehyewon -- this is super helpful. @justin13601, what is the status of our ability to define different reference ranges for different patients based on, e.g., static data? I think it is currently not possible; is that right?

Collaborator

I think it is currently unsupported, but it is possibly doable by allowing users to create derived predicates that join normal plain predicates (i.e., reference ranges) with static predicates (i.e., sex, which are also plain predicates under the hood). Perhaps you would specify both male and female in patient_demographics so that both are included, and then join them with different plain predicates. I'm not sure whether this works, but I can look into it if it is a priority.

Owner

@justin13601, can users currently create derived predicates with normal plain and static? Or would we need to add that? It seems passably simple to do so, if it isn't supported already.

Collaborator

@justin13601 justin13601 Oct 23, 2024

@justin13601, can users currently create derived predicates with normal plain and static? Or would we need to add that? It seems passably simple to do so, if it isn't supported already.

I just tested this: ACES can parse derived predicates defined between static and normal plain predicates, but it doesn't yet evaluate them properly. Our current derived predicate logic simply checks whether all of the referenced predicates are realized in a given row, but static predicates are defined in a separate row with a null timestamp. I suppose we could forward fill the value within each subject_id (since they're currently all 0 anyway). Or we could update the logic so that, whenever a static variable is involved in a derived predicate, we always look to the row with the null timestamp?
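The forward-fill idea can be sketched in plain Python as below. This is not the ACES implementation: the row layout, the column names (`male`, `high_creatinine_male`), and both helper functions are assumptions made for illustration only.

```python
from collections import defaultdict


def forward_fill_static(rows, static_cols):
    """Copy static predicate values from each subject's null-timestamp row
    onto that subject's timestamped rows."""
    static_by_subject = defaultdict(dict)
    for row in rows:
        if row["time"] is None:
            for col in static_cols:
                static_by_subject[row["subject_id"]][col] = row.get(col, 0)
    filled = []
    for row in rows:
        new_row = dict(row)
        if row["time"] is not None:
            for col in static_cols:
                new_row[col] = static_by_subject[row["subject_id"]].get(col, 0)
        filled.append(new_row)
    return filled


def derived_and(row, predicates):
    """Derived 'and': 1 only if every referenced predicate is realized in the row."""
    return int(all(row.get(p, 0) > 0 for p in predicates))


# Hypothetical data: the static row (time=None) carries sex, the event row carries the lab.
rows = [
    {"subject_id": 1, "time": None, "male": 1, "high_creatinine_male": 0},
    {"subject_id": 1, "time": "2024-01-01T08:00", "male": 0, "high_creatinine_male": 1},
    {"subject_id": 2, "time": None, "male": 0, "high_creatinine_male": 0},
    {"subject_id": 2, "time": "2024-01-01T09:00", "male": 0, "high_creatinine_male": 1},
]
preds = ["male", "high_creatinine_male"]
before = [derived_and(r, preds) for r in rows]
after = [derived_and(r, preds) for r in forward_fill_static(rows, ["male"])]
print(before, after)
```

Without the fill, the row-wise check never fires because the static flag and the lab value sit in different rows; after the fill, only subject 1's lab row satisfies the derived predicate.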

Owner

Nope, you're right. Can you file two GitHub issues on ACES, 1) to disable the ability to create such derived predicates until it works properly and 2) to actually make it work properly, and link them here? Then, if we can implement them in ACES fast enough, we can use them for any labs where it is appropriate here.

  abnormally_low_sodium:
    code: ???
    value_min: null
    value_max: ???

Owner

Is this reference range different for men vs. women, or children vs. adults? Do we need more qualifiers and/or different ranges for different patients here? Tagging @Gallifantjack for the clinical questions here.

Contributor Author

Same as above.

Collaborator

Hyponatremia has a standardized cutoff (< 135 mEq/L) that applies equally to men, women, and neonatal/pediatric patients. As mentioned above, it's best to focus on adult patients (age > 18) for the tasks defined here for now.

Owner

@mandiehyewon, I agree with the focus on adult patients -- but we should therefore encode that specifically in the inclusion criteria here, right? @justin13601, do we have any ability to filter by age or by duration of time of a window?

Collaborator

How would age be represented in MEDS? Would it simply be the timestamp of a given event minus the DOB? I wonder if one could also create a window in the task that requires an event as an adult (i.e., 18 years after DOB or something like that)?

Contributor Author

At least in MEDS_Extract_v0.0.7, a patient's birth date is represented as an event with the code "MEDS_BIRTH" and a timestamp equal to the DOB, so we should calculate age dynamically for every triggered sample and filter the sample out if the age is less than 18 years, using a window or something similar as @justin13601 mentioned.

Owner

I think we can just do a time-derived window of the target age as @justin13601 suggests, and we can make it more elegant later. @Jwoo5, do you have a good sense of how to do that? We can just do a "no_children" window that ends at trigger and starts at end - 18y and has no birth events which are defined using the MEDS_BIRTH code. That should work, right?

Contributor Author

I just tested it on the in-icu mortality task, but it turns out that it makes no difference to the cohort numbers:

  no_children:
    start: end - 6570d # 18 years = 365 * 18 days
    end: trigger
    start_inclusive: True
    end_inclusive: True
    has:
      birth: (None, 0)

and I checked that all the subjects in the in-icu mortality (based on ICU admission) and long-los (based on hospital admission) cohorts are older than 18.
It seems like MEDS_Extract_v0.0.7 has already filtered out pediatric patients. Could you confirm, @mmcdermott?
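For reference, the check that the `no_children` window is meant to express can be sketched as follows. This is a plain-Python illustration, not ACES code; the function name `passes_no_children` is hypothetical, and the inclusive bounds mirror `start_inclusive: True` and `end_inclusive: True` in the config above.

```python
from datetime import datetime, timedelta

ADULT_WINDOW = timedelta(days=6570)  # 18 * 365 days, as in the config above


def passes_no_children(trigger: datetime, birth: datetime) -> bool:
    """True if no birth event lies in [trigger - 6570d, trigger] (both ends inclusive)."""
    window_start = trigger - ADULT_WINDOW
    return not (window_start <= birth <= trigger)


trigger = datetime(2024, 1, 1)
print(passes_no_children(trigger, datetime(1990, 5, 1)))  # birth precedes the window
print(passes_no_children(trigger, datetime(2010, 5, 1)))  # birth falls inside the window
```

If every subject in a cohort is already an adult at trigger time, this check passes for all of them, which would be consistent with the unchanged cohort numbers reported above.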

Collaborator

@mandiehyewon mandiehyewon Nov 5, 2024

Sorry for the delayed confirmation, @mmcdermott. That's correct, we would like to specify the cohort to be adults if we want to use this cutoff!

@mmcdermott
Owner

Also @Jwoo5, can you get the PR check to pass? If you want, I can help you set things up locally to run those and auto-fix issues every commit.

@Jwoo5
Contributor Author

Jwoo5 commented Oct 22, 2024

Regarding your questions above:

  1. Can the ranges not be set in the task configs for particular units?
    Did you mean to ask if we can set the ranges to exclude specific items that have particular units? Please correct me if I misunderstood your question.

  2. Also, do you want to add the predicates you've defined here to the MIMIC predicates.yaml file too?
    Yes, I think we need to add the associated predicates to realize these tasks on MIMIC-IV, as I shared in the task channel via GH gists.

  3. Also, should we have something different than cbc and blood_chemistry? blood_chemistry seems very broad relative to cbc, but maybe that is wrong?
    They were also defined based on @mandiehyewon's suggestions. Maybe this also needs an opinion from @Gallifantjack or @mandiehyewon.

Also, actually I am not that familiar with these PR checks, so it would be great if you can help to resolve it. I don't know why the PR check failed with these simple additions of .yaml files :( I updated the .yaml files to contain a new line at the end of file and confirmed it passes the PR check!

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

🧹 Outside diff range and nitpick comments (6)
src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/vital/hypotension/first_24h.yaml (1)

1-5: Add the specific threshold for hypotension in the description.

Consider adding the specific MAP threshold (< 65 mmHg) in the description to align with standard medical definitions of hypotension. This would make the task criteria immediately clear to readers.

src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/cbc/leukocytosis/first_24h.yaml (1)

1-6: Enhance description with specific thresholds and units.

The description should include the specific threshold (>11,000/uL) that defines leukocytosis, as mentioned in the PR objectives. This helps readers understand the exact criteria without having to look elsewhere.

 description: >-
   This task predicts whether the patient will have high _WBC_ values (leukocytosis) within the next 24
   hours, given the first 24 hours of their admission to the ICU. To be included, patients must have
   at least one measurement for _WBC_ within the next 24 hours after their admission to ICU,
-  and their length of stay in ICU should be at least 24 hours.
+  and their length of stay in ICU should be at least 24 hours. Leukocytosis is defined as
+  WBC count exceeding 11,000/uL.
src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/cbc/anemia/first_24h.yaml (1)

1-6: Enhance description with specific threshold value.

Consider adding the specific hemoglobin threshold (13 g/dL) to the description for immediate clarity about what constitutes "low hemoglobin values".

 description: >-
   This task predicts whether the patient will have low _hemoglobin_ values (anemia) within the next 24
   hours, given the first 24 hours of their admission to the ICU. To be included, patients must have
   at least one measurement for _hemoglobin_ within the next 24 hours after their admission to ICU,
-  and their length of stay in ICU should be at least 24 hours.
+  and their length of stay in ICU should be at least 24 hours. Anemia is defined as hemoglobin
+  levels below 13 g/dL.
src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/cbc/thrombocytopenia/first_24h.yaml (2)

1-5: Add the specific threshold for thrombocytopenia in the description.

The description should include the specific threshold value (platelet counts below 150,000/uL) to make it immediately clear what constitutes "low platelets values".

 description: >-
   This task predicts whether the patient will have low _platelets_ values (thrombocytopenia) within
   the next 24 hours, given the first 24 hours of their admission to the ICU. To be included,
   patients must have at least one measurement for _platelets_ within the next 24 hours after their
-  admission to ICU, and their length of stay in ICU should be at least 24 hours.
+  admission to ICU, and their length of stay in ICU should be at least 24 hours. Thrombocytopenia
+  is defined as platelet counts below 150,000/uL.

33-40: Clarify the no_discharge window end time calculation.

The end time calculation uses start + 48h which could be ambiguous. For clarity and consistency with other window definitions, consider using trigger + 48h.

   no_discharge:
     start: trigger
-    end: start + 48h
+    end: trigger + 48h
     start_inclusive: False
     end_inclusive: True
     has:
       icu_admission: (None, 0)
       discharge_or_death: (None, 0)
src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/blood_chemistry/metabolic_acidosis/first_24h.yaml (1)

1-5: Add the specific threshold value in the description.

Consider adding the specific threshold value (22 mEq/L) that defines metabolic acidosis to make the task criteria immediately clear to readers.

 description: >-
   This task predicts whether the patient will have low _bicarbonate_ values (metabolic acidosis)
   within the next 24 hours, given the first 24 hours of their admission to the ICU. To be included,
   patients must have at least one measurement for _bicarbonate_ within the next 24 hours after
-  their admission to ICU, and their length of stay in ICU should be at least 24 hours.
+  their admission to ICU, and their length of stay in ICU should be at least 24 hours. Metabolic
+  acidosis is defined as bicarbonate levels below 22 mEq/L.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between daadf9d and ab993e8.

📒 Files selected for processing (7)
  • src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/blood_chemistry/elevated_creatinine/first_24h.yaml (1 hunks)
  • src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/blood_chemistry/hyponatremia/first_24h.yaml (1 hunks)
  • src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/blood_chemistry/metabolic_acidosis/first_24h.yaml (1 hunks)
  • src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/cbc/anemia/first_24h.yaml (1 hunks)
  • src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/cbc/leukocytosis/first_24h.yaml (1 hunks)
  • src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/cbc/thrombocytopenia/first_24h.yaml (1 hunks)
  • src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/vital/hypotension/first_24h.yaml (1 hunks)
🧰 Additional context used
🔇 Additional comments (17)
src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/vital/hypotension/first_24h.yaml (2)

33-40: Verify the no_discharge window constraints.

The current configuration prevents any ICU admission or discharge/death events during the entire 48-hour period. This might be too restrictive for patients who:

  1. Are transferred between ICUs
  2. Have temporary ICU stays recorded during the period

Consider whether these scenarios should be handled differently.

Let's check for similar constraints in other tasks:

✅ Verification successful

Let me fix the regex pattern and try again to search for similar window definitions.


Based on the search results, I can now provide a final response:

The no_discharge window constraints are consistently implemented across similar tasks

The search reveals that this exact window configuration is used consistently across multiple ICU-related tasks in the codebase, including:

  • Anemia detection
  • Metabolic acidosis
  • Thrombocytopenia
  • Hypotension
  • Elevated creatinine
  • Hyponatremia
  • Leukocytosis

All these tasks use identical window constraints:

  • 48-hour period
  • No ICU admission events: icu_admission: (None, 0)
  • No discharge/death events: discharge_or_death: (None, 0)

This consistent pattern across multiple critical care scenarios suggests this is an intentional design choice for first 24-hour assessments, likely to ensure clean observation periods without administrative transitions that could affect data quality.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for no_discharge window definitions in other tasks
echo "Searching for no_discharge window definitions..."
rg -t yaml "no_discharge:.*(?:\n.*)+?(icu_admission|discharge_or_death):"

Length of output: 515


Script:

#!/bin/bash
# Search for no_discharge window definitions in other tasks
echo "Searching for no_discharge window definitions..."
rg -U "no_discharge:\s*\n.*(?:\n.*)*?(?:icu_admission|discharge_or_death):" -g "*.yaml"

# Also search for other ICU-related window definitions to understand the patterns
echo -e "\nSearching for other ICU-related window definitions..."
rg -U "window:.*\n.*(?:\n.*)*?icu_admission:" -g "*.yaml"

Length of output: 6524


7-21: Consider using shared predicate definitions.

The predicates like icu_admission, icu_discharge, etc., are likely to be reused across multiple tasks. Consider moving these common predicates to a shared configuration file to maintain consistency and reduce duplication.

Let's check if these predicates are defined elsewhere:

src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/blood_chemistry/hyponatremia/first_24h.yaml (4)

1-6: LGTM! Clear and comprehensive description.

The description effectively communicates the task's purpose, prediction target, and inclusion criteria.


23-23: LGTM! Appropriate trigger definition.

The trigger correctly uses ICU admission as the starting point for prediction.


25-48: LGTM! Well-structured window definitions.

The windows configuration effectively implements:

  • 24-hour input window from ICU admission
  • Discharge prevention during the entire prediction period
  • Appropriate target window with required measurements
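A rough plain-Python sketch of the input/target window logic summarized above, under stated assumptions: the function `label_target_window` is hypothetical, the target window is taken as (trigger + 24h, trigger + 48h], and a sample is labeled positive if any measurement in that window is below the threshold. The `no_discharge` check is omitted, and none of this is taken from the ACES implementation.

```python
from datetime import datetime, timedelta


def label_target_window(trigger, measurements, threshold, input_hours=24, target_hours=24):
    """Return None if the sample is excluded (no measurement in the target
    window); otherwise return 1 if any in-window value is below threshold, else 0.

    measurements: list of (timestamp, value) pairs for the target lab.
    """
    start = trigger + timedelta(hours=input_hours)
    end = start + timedelta(hours=target_hours)
    in_window = [value for time, value in measurements if start < time <= end]
    if not in_window:
        return None
    return int(any(value < threshold for value in in_window))


trigger = datetime(2024, 1, 1)
sodium = [
    (trigger + timedelta(hours=30), 136.0),
    (trigger + timedelta(hours=40), 133.0),
]
print(label_target_window(trigger, sodium, threshold=135.0))
```

Treating any single abnormal value as a positive label is only one possible convention; confirmatory readings or other aggregation rules would need a different rule here.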

15-21: Awaiting clinical input for sodium reference ranges.

The threshold for hyponatremia needs clinical validation:

  1. Potential variations based on patient demographics (age, gender)
  2. Consideration of different clinical contexts
  3. Validation of the 135 mEq/L threshold mentioned in PR objectives

Let's check if there are any existing reference range definitions in the codebase:

src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/cbc/leukocytosis/first_24h.yaml (2)

12-13: LGTM! Well-defined composite predicate.

The discharge_or_death predicate correctly combines all relevant discharge and death events.


41-48: LGTM! Well-structured target window configuration.

The target window is correctly configured with:

  • Proper timing (24h after input window)
  • Required WBC measurement constraint
  • Appropriate labeling using the abnormally_high_wbc predicate
src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/blood_chemistry/elevated_creatinine/first_24h.yaml (4)

32-41: Verify window configuration for discharge edge cases.

The no_discharge window ends at 48 hours, but the target window extends up to 48-72 hours from admission. This creates a potential edge case where a patient could be discharged after the no_discharge window ends but before the target window completes, potentially affecting measurement availability.

Consider extending the no_discharge window to cover the entire target window period.

#!/bin/bash
# Check if other similar tasks handle this edge case differently
rg -g '*.yaml' -B 5 -A 5 'no_discharge:'

16-21: Consider gender and age-specific reference ranges for creatinine.

The current implementation uses a single threshold for abnormal creatinine levels. However, normal creatinine ranges can vary by:

  • Gender (typically higher in males)
  • Age (different ranges for children vs. adults)
  • Other factors like muscle mass and diet

Consider implementing conditional thresholds based on these factors if the system supports such functionality.

#!/bin/bash
# Check if any other tasks implement gender/age-specific ranges
rg -g '*.yaml' 'value_min.*gender|value_min.*age'

41-48: Clarify handling of multiple creatinine measurements.

The target window configuration requires at least one creatinine measurement but doesn't specify how multiple measurements should be handled. Consider documenting:

  • Whether the highest value should be used
  • If multiple confirmatory readings are required
  • How to handle conflicting measurements (normal vs. abnormal) within the window
#!/bin/bash
# Check how other tasks handle multiple measurements
rg -g '*.yaml' -B 5 -A 5 'label:.*creatinine'

7-14: ⚠️ Potential issue

Define the core predicates marked with ???.

The following critical predicates are undefined:

  • icu_admission
  • icu_discharge
  • hospital_discharge
  • death

These predicates are essential for the task's functionality and should be properly defined.

✅ Verification successful

Let me try another search strategy to verify if these predicates are defined elsewhere.

The previous search might have failed due to the specific pattern. Let's try a broader search to understand the predicate definitions in the codebase and particularly look in common/shared directories where such core predicates might be defined.


Let me try one final search to look for any predicate definitions in potential base or template files.

This should help us find if there are any base templates or actual implementations of these predicates in the codebase.


Predicates are defined in MIMIC-IV dataset configuration

The predicates marked with ??? in your task configuration are actually defined in src/MEDS_DEV/datasets/MIMIC-IV/predicates.yaml with the following implementations:

  • icu_admission: code: { regex: "^ICU_ADMISSION//.*" }
  • icu_discharge: code: { regex: "^ICU_DISCHARGE//.*" }
  • hospital_discharge: code: { regex: "^HOSPITAL_DISCHARGE//.*" }
  • death: code: MEDS_DEATH

These predicates are likely meant to be inherited from the dataset configuration, which explains why they appear as ??? in your task-specific YAML. This is a common pattern where core predicates are defined at the dataset level and reused across multiple tasks.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if these predicates are defined elsewhere in the codebase
rg -g '*.yaml' -A 5 'icu_admission:|icu_discharge:|hospital_discharge:|death:'

Length of output: 983


Script:

#!/bin/bash
# Search for any YAML files that might contain predicate definitions
fd -e yaml -x cat {} | grep -A 5 "predicates:"

# Also search specifically in the common/shared directories for predicate definitions
fd -e yaml -g '**/common/**' -g '**/shared/**' -x cat {}

Length of output: 1706


Script:

#!/bin/bash
# Search for base/template files that might contain predicate definitions
fd -e yaml -g '*base*' -g '*template*' -x cat {}

# Also search for files that define these specific predicates with their actual implementations
rg -g '*.yaml' 'code:.*HOSPITAL_|code:.*ICU_|code:.*DEATH' -A 2

Length of output: 1004

src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/cbc/anemia/first_24h.yaml (2)

25-48: Verify window references and temporal constraints.

The window configurations look correct, but please ensure:

  1. The reference input.end in the target window is properly resolved
  2. The expression start + 48h in no_discharge window correctly spans both input and target windows
  3. The constraint (None, 0) for discharge events effectively prevents early discharges
✅ Verification successful

Based on the search results, I can now provide a final response since I have sufficient evidence to verify the window configurations:

Window configuration is correctly implemented and consistent with other ICU-based tasks

The window configuration follows the established pattern used across multiple ICU-based tasks in the codebase, specifically:

  1. The reference input.end in the target window is valid and consistently used across similar tasks
  2. The start + 48h duration in the no_discharge window is correct as it matches the pattern used in other ICU tasks (e.g., hyponatremia, hypotension, elevated_creatinine) to cover both input and target windows
  3. The constraint (None, 0) for discharge events is properly implemented and consistent with other ICU tasks to prevent early discharges

The implementation aligns with similar tasks in:

  • src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/blood_chemistry/hyponatremia/first_24h.yaml
  • src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/vital/hypotension/first_24h.yaml
  • And other ICU-based tasks following the same pattern
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for similar window configurations in other tasks

# Test: Search for similar window configurations to validate consistency
echo "Searching for similar window configurations..."
rg -t yaml "windows:.*(?:\n.*)+?input:.*(?:\n.*)+?target:" --multiline

Length of output: 21356


7-22: Verify predicate definitions in the codebase.

The composite predicate discharge_or_death looks correct, but let's verify if the required predicates are defined elsewhere in the codebase.

✅ Verification successful

Based on the search results, I can now generate the final response:

Required predicates are defined in MIMIC-IV dataset configuration

The base predicates (icu_admission, icu_discharge, hospital_discharge, death) are defined in src/MEDS_DEV/datasets/MIMIC-IV/predicates.yaml with proper code regex patterns:

  • icu_admission: ^ICU_ADMISSION//.*
  • icu_discharge: ^ICU_DISCHARGE//.*
  • hospital_discharge: ^HOSPITAL_DISCHARGE//.*
  • death: (defined but pattern not shown in search results)

The composite predicate discharge_or_death correctly uses these base predicates in its expression.

The lab-specific predicates (hemoglobin, abnormally_low_hemoglobin) are intentionally marked with ??? as they need to be configured per dataset, which is the expected pattern seen across other similar lab criteria files (e.g., platelets, wbc, sodium, etc.).

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if predicates are defined in other YAML files

# Test 1: Search for predicate definitions in YAML files
echo "Searching for predicate definitions..."
rg -t yaml "predicates:.*(?:\n.*)+?(icu_admission|icu_discharge|hospital_discharge|death|hemoglobin):" --multiline

# Test 2: Search for hemoglobin-related configurations
echo "Searching for hemoglobin configurations..."
rg -t yaml "hemoglobin:.*(?:\n.*)+?code:" --multiline

Length of output: 30564

src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/cbc/thrombocytopenia/first_24h.yaml (1)

12-13: Verify the discharge_or_death predicate logic.

The composite predicate includes hospital_discharge which might be redundant since an ICU discharge would typically precede or coincide with a hospital discharge. This could lead to duplicate event handling.

✅ Verification successful

Let me analyze the results and generate a final response.

The discharge_or_death predicate implementation is consistent across the codebase

The search results show that the same discharge_or_death predicate expression or(icu_discharge, death, hospital_discharge) is consistently used across multiple task criteria files in the ICU context. This consistent implementation across different medical conditions (thrombocytopenia, leukocytosis, metabolic acidosis, etc.) suggests this is an intentional design pattern rather than an oversight.

While it might seem redundant at first glance, including both icu_discharge and hospital_discharge likely serves specific use cases:

  • Capturing direct transfers from ICU to other facilities
  • Handling cases where hospital discharge data might be available even when ICU discharge data is missing
  • Ensuring completeness in different data collection scenarios
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for other discharge_or_death predicates to verify consistency
rg -l "discharge_or_death" | xargs rg "expr: or\(.*discharge.*\)"

Length of output: 1173

src/MEDS_DEV/tasks/criteria/abnormal_lab/in_icu/blood_chemistry/metabolic_acidosis/first_24h.yaml (2)

12-13: LGTM! Well-structured composite predicate.

The discharge_or_death predicate correctly combines all relevant discharge/death events.


26-32: LGTM! Well-structured input and target windows.

The window configurations correctly implement the task requirements:

  • Input window captures first 24h of ICU stay
  • Target window requires at least one bicarbonate measurement
  • Proper inclusive/exclusive bounds are set

Also applies to: 41-48

@mandiehyewon
Collaborator

I have left some comments with clinical insights.

@mmcdermott
Owner

  1. Can the ranges not be set in the task configs for particular units?
    Did you mean to ask if we can set the ranges to exclude specific items that have particular units? Please correct me if I misunderstood your question.

Yes, but I meant more of, can we define the value_min or value_max as needed in the config file and just indicate what unit it is from, so that we don't risk different users defining different ranges?

  2. Also, do you want to add the predicates you've defined here to the MIMIC predicates.yaml file too?
    Yes, I think we need to add the associated predicates to realize these tasks on MIMIC-IV, as I shared in the task channel via GH gists.

sounds perfect

  3. Also, should we have something different than cbc and blood_chemistry? blood_chemistry seems very broad relative to cbc, but maybe that is wrong?
    They were also defined based on @mandiehyewon's suggestions. Maybe this also needs an opinion from @Gallifantjack or @mandiehyewon.

@mandiehyewon, what do you think? Should we group them by typical lab panel name? By organ system? By just the name of the test? I'm not sure what would be best.

Also, actually I am not that familiar with these PR checks, so it would be great if you can help to resolve it. I don't know why the PR check failed with these simple additions of .yaml files :( I updated the .yaml files to contain a new line at the end of file and confirmed it passes the PR check!

These checks help ensure all the code is in a consistent style; if you install pre-commit in your local environment, it will automatically run and correct issues on each commit. But it looks like in this case you got them all passing, so it is not critical. Thanks!

@Jwoo5
Contributor Author

Jwoo5 commented Oct 23, 2024

Yes, but I meant more of, can we define the value_min or value_max as needed in the config file and just indicate what unit it is from, so that we don't risk different users defining different ranges?

I am not sure if we can do it explicitly. Maybe we can provide value_min or value_max with a comment to indicate units like:

  abnormally_low_sodium:
    code: ???
    value_min: null
    value_max: 135 # mEq/L

@mmcdermott
Owner

@Jwoo5 and @mandiehyewon -- what is the status of this PR; do we want to make other changes to the ranges, etc.? Should we close this PR temporarily until those are made, or is it ready to go in now?

@Jwoo5
Contributor Author

Jwoo5 commented Oct 27, 2024

To-dos for this PR would be:

  • Re-define task definitions to be more restricted as @Gallifantjack suggested: predict abnormal lab levels in "all patients" (including non-ICU patients) who haven't previously shown abnormal levels on the target lab.
    • Restrict to the patients whose hospital stay is at least 48 hours as well.
  • Include only adult patients (age > 18) by explicitly defining a "no_children" window. This will not affect the cohorts from MEDS_Extract_v0.0.7, though, since this ETL already seems to filter out pediatric patients.
  • Provide reference ranges along with units in the criteria configs.
  • Add the associated predicates to realize these tasks on MIMIC-IV.

If this list looks fine, I'll get to update this PR soon.
Do these to-dos seem reasonable, or is there anything that I've missed from the discussions above, @mmcdermott?

@justin13601
Collaborator

  • Provide reference ranges along with units in the criteria configs.

@Jwoo5 If this is where different reference ranges are needed based on static variables (like sex etc.), it should be supported on this branch: https://github.com/justin13601/ACES/tree/25_medsdev_static_derived_predicates (see justin13601/ACES#145)

@Jwoo5
Contributor Author

Jwoo5 commented Oct 28, 2024

@justin13601 Thanks!
I think we can use standardized ranges focusing on adult patients for the first pass, though. What do you think, @mandiehyewon? If we should define different ranges based on gender or age for some tasks, could you provide the reference cutoffs so that I can reflect them in the task configs?

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

🧹 Outside diff range and nitpick comments (12)
src/MEDS_DEV/tasks/criteria/abnormal_lab/vital/hypotension/first_24h.yaml (2)

1-6: Consider enhancing the description with the MAP threshold.

The description clearly outlines the task purpose and inclusion criteria. Consider adding the specific MAP threshold (< 65 mmHg) to make the definition more explicit and self-contained.

 description: >-
   This task predicts whether the patient will have low _MAP_ values (hypotension) within the next
   24 hours, given the first 24 hours of their admission to the hospital. To be included,
   1) patients must have at least one measurement for _MAP_ within the next 24 hours after their
   admission to the hospital, 2) patients must not have shown low _MAP_ values previously, and
-  3) their length of stay in the hospital must be at least 48 hours.
+  3) their length of stay in the hospital must be at least 48 hours. Low _MAP_ is defined as
+  values below 65 mmHg.

47-53: Consider more precise age calculation.

The 18-year calculation using 6570 days (365 * 18) doesn't account for leap years. For more precision, consider using 6574 days (accounts for ~4.5 leap years in 18 years).

-    start: end - 6570d # 18 years = 365 * 18 days
+    start: end - 6574d # 18 years including leap years
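To see how far a fixed day count can drift, the exact number of days between a birth date and the 18th birthday can be computed with the standard library. This is an illustrative sketch only; the helper `days_in_18_years` is hypothetical, and the Mar 1 handling of Feb 29 birthdays is a convention chosen here, not something stated in the review.

```python
from datetime import date


def days_in_18_years(birth: date) -> int:
    """Exact number of days between a birth date and the 18th birthday."""
    try:
        eighteenth = birth.replace(year=birth.year + 18)
    except ValueError:
        # Feb 29 birthday: the year 18 years later is never a leap year,
        # so fall back to Mar 1 (a convention assumed for this sketch).
        eighteenth = date(birth.year + 18, 3, 1)
    return (eighteenth - birth).days


print(days_in_18_years(date(2000, 1, 1)))
print(days_in_18_years(date(2001, 3, 1)))
```

The exact span depends on how many leap days fall between the two dates, so any single constant (6570d or 6574d) is an approximation.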
src/MEDS_DEV/tasks/criteria/abnormal_lab/cbc/leukocytosis/first_24h.yaml (1)

1-6: Consider enhancing the description with specific threshold values.

The description could be more precise by including the actual threshold for high WBC values (>11 K/uL) that's defined in the predicates section.

 description: >-
   This task predicts whether the patient will have high _WBC_ values (leukocytosis) within the next
   24 hours, given the first 24 hours of their admission to the _hospital_. To be included,
   1) patients must have at least one measurement for _WBC_ within the next 24 hours after their
-  admission to the hospital, 2) patients must not have shown high _WBC_ values previously, and
+  admission to the hospital, 2) patients must not have shown high _WBC_ values (>11 K/uL) previously, and
   3) their length of stay in the hospital must be at least 48 hours.
src/MEDS_DEV/tasks/criteria/abnormal_lab/blood_chemistry/hyponatremia/first_24h.yaml (1)

1-6: Remove markdown-style emphasis from the description.

The description uses markdown-style emphasis with underscores (e.g., _sodium_, _hospital_), which is unnecessary in YAML strings and might not render as expected.

-  This task predicts whether the patient will have low _sodium_ values (hyponatremia) within the
-  next 24 hours, given the first 24 hours of their admission to the _hospital_. To be included,
-  1) patients must have at least one measurement for _sodium_ within the next 24 hours after their
-  admission to the hospital, 2) patients must not have shown low _sodium_ values previously, and
+  This task predicts whether the patient will have low sodium values (hyponatremia) within the
+  next 24 hours, given the first 24 hours of their admission to the hospital. To be included,
+  1) patients must have at least one measurement for sodium within the next 24 hours after their
+  admission to the hospital, 2) patients must not have shown low sodium values previously, and
src/MEDS_DEV/tasks/criteria/abnormal_lab/cbc/anemia/first_24h.yaml (2)

1-6: Enhance description clarity and consistency.

Consider these improvements to the description:

  1. Use consistent emphasis for medical terms (e.g., _hospital_, _hemoglobin_)
  2. Consider adding the specific medical threshold (13 g/dL) in the description for clarity

 description: >-
   This task predicts whether the patient will have low _hemoglobin_ values (anemia) within the next
-  24 hours, given the first 24 hours of their admission to the _hospital_. To be included,
+  24 hours, given the first 24 hours of their admission to the _hospital_. Anemia is defined as
+  _hemoglobin_ levels below 13 g/dL. To be included,
   1) patients must have at least one measurement for _hemoglobin_ within the next 24 hours after
   their admission to the hospital, 2) patients must not have shown low _hemoglobin_ values
   previously, and 3) their length of stay in the hospital must be at least 48 hours.

47-54: Review age calculation method.

The age calculation using 6570d (18 years * 365 days) doesn't account for leap years, which could affect edge cases.

Consider using a more precise calculation or documenting the simplification choice:

   no_children:
-    start: end - 6570d # 18 years = 365 * 18 days
+    start: end - 6575d # 18 years accounting for leap years (365.25 * 18 = 6574.5, rounded up)
     end: trigger
src/MEDS_DEV/tasks/criteria/abnormal_lab/cbc/thrombocytopenia/first_24h.yaml (2)

1-6: Enhance description with specific threshold values.

Consider adding the specific threshold value (150 K/uL) in the description for clarity and immediate understanding of what constitutes "low platelets values".

 description: >-
   This task predicts whether the patient will have low _platelets_ values (thrombocytopenia) within
   the next 24 hours, given the first 24 hours of their admission to the hospital. To be included,
   1) patients must have at least one measurement for _platelets_ within the next 24 hours after
-  their admission to the hospital, 2) patients must not have shown low _platelets_ values
+  their admission to the hospital, 2) patients must not have shown low _platelets_ values (< 150 K/uL)
   previously, and 3) their length of stay in the hospital must be at least 48 hours.

48-49: Verify age calculation for pediatric exclusion.

The calculation 6570d for 18 years doesn't account for leap years. A more accurate calculation would be 6574d (18 * 365.25 = 6574.5 days, rounded down).

-    start: end - 6570d # 18 years = 365 * 18 days
+    start: end - 6574d # 18 years ~ 365.25 * 18 days (6574.5, rounded down)
src/MEDS_DEV/tasks/criteria/abnormal_lab/blood_chemistry/elevated_creatinine/first_24h.yaml (1)

1-6: Consider standardizing markdown emphasis formatting.

While the description is clear and comprehensive, consider using consistent markdown formatting. The terms _creatinine_ and _hospital_ are emphasized, but this isn't maintained throughout (e.g., "admission to the _hospital_" vs "admission to the hospital").

src/MEDS_DEV/tasks/criteria/abnormal_lab/blood_chemistry/metabolic_acidosis/first_24h.yaml (2)

1-6: Consider enhancing the description with the threshold value.

The description clearly outlines the purpose and inclusion criteria. Consider adding the specific threshold value (22 mEq/L) in the description to make it immediately clear what constitutes "low bicarbonate values".


28-34: Clarify the input window start time.

The input window's start time is set to null. Consider explicitly setting this to trigger for better clarity, since the description indicates we're looking at the first 24 hours from admission.

src/MEDS_DEV/datasets/MIMIC-IV/predicates.yaml (1)

22-167: Consider adding upper bounds for critical values.

While the current ranges correctly implement the PR objectives, consider adding upper bounds for critical values that could indicate medical emergencies. For example:

  • Creatinine > 3.0 mg/dL might indicate severe kidney injury
  • Sodium < 120 mEq/L (severe hyponatremia) or > 155 mEq/L (severe hypernatremia)
  • WBC > 30 K/uL might indicate severe infection or leukemia

Would you like me to propose additional predicates for these critical ranges?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between ab993e8 and 23bebf1.

📒 Files selected for processing (8)
  • src/MEDS_DEV/datasets/MIMIC-IV/predicates.yaml (1 hunks)
  • src/MEDS_DEV/tasks/criteria/abnormal_lab/blood_chemistry/elevated_creatinine/first_24h.yaml (1 hunks)
  • src/MEDS_DEV/tasks/criteria/abnormal_lab/blood_chemistry/hyponatremia/first_24h.yaml (1 hunks)
  • src/MEDS_DEV/tasks/criteria/abnormal_lab/blood_chemistry/metabolic_acidosis/first_24h.yaml (1 hunks)
  • src/MEDS_DEV/tasks/criteria/abnormal_lab/cbc/anemia/first_24h.yaml (1 hunks)
  • src/MEDS_DEV/tasks/criteria/abnormal_lab/cbc/leukocytosis/first_24h.yaml (1 hunks)
  • src/MEDS_DEV/tasks/criteria/abnormal_lab/cbc/thrombocytopenia/first_24h.yaml (1 hunks)
  • src/MEDS_DEV/tasks/criteria/abnormal_lab/vital/hypotension/first_24h.yaml (1 hunks)
🔇 Additional comments (25)
src/MEDS_DEV/tasks/criteria/abnormal_lab/vital/hypotension/first_24h.yaml (3)

25-25: LGTM!

The trigger is appropriately set to hospital_admission, which aligns with the task's objective of predicting outcomes within 24 hours of hospital admission.


8-24: ⚠️ Potential issue

Define missing predicates.

Several critical predicates are undefined (marked with "???"):

  • hospital_admission
  • hospital_discharge
  • birth
  • death
  • map

These predicates need to be defined in the MIMIC predicates.yaml file to ensure proper task validation.

Let's verify if these predicates are defined in the MIMIC predicates file:


27-61: Verify window configurations.

The window configurations look well-structured with appropriate conditions. However, let's verify:

  1. The 48-hour minimum stay requirement is correctly enforced through the no_discharge window
  2. The target window properly aligns with the task description

Let's check for similar window configurations in other tasks:

✅ Verification successful

Window configurations are consistent and correctly structured

The verification confirms that:

  1. The 48-hour minimum stay requirement is consistently enforced across all similar tasks through the no_discharge window (trigger to trigger + 48h)
  2. The target window configuration properly aligns with other abnormal lab tasks, following the standard pattern:
    • Starts right after input window (input.end)
    • 24-hour duration (start + 24h)
    • Consistent inclusivity settings (start_inclusive: False, end_inclusive: True)
    • Appropriate label and measurement requirements

The window configurations in the hypotension task follow the established pattern used across all abnormal lab tasks in the codebase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check window configurations in other similar tasks
# Look for similar window patterns to validate consistency
rg -A 10 "windows:" "src/MEDS_DEV/tasks/criteria/abnormal_lab/**/*.yaml"

Length of output: 163


Script:

#!/bin/bash
# Let's first find the correct path structure for abnormal lab tasks
fd "yaml$" --type f

Length of output: 1054


Script:

#!/bin/bash
# Now let's check window configurations in similar abnormal lab tasks
rg -A 15 "windows:" "src/MEDS_DEV/tasks/criteria/abnormal_lab/blood_chemistry/elevated_creatinine/first_24h.yaml" "src/MEDS_DEV/tasks/criteria/abnormal_lab/blood_chemistry/hyponatremia/first_24h.yaml" "src/MEDS_DEV/tasks/criteria/abnormal_lab/cbc/anemia/first_24h.yaml" "src/MEDS_DEV/tasks/criteria/abnormal_lab/vital/hypotension/first_24h.yaml"

Length of output: 7162


Script:

#!/bin/bash
# Let's check the target window configurations in these files
rg -A 10 "  target:" "src/MEDS_DEV/tasks/criteria/abnormal_lab/blood_chemistry/elevated_creatinine/first_24h.yaml" "src/MEDS_DEV/tasks/criteria/abnormal_lab/blood_chemistry/hyponatremia/first_24h.yaml" "src/MEDS_DEV/tasks/criteria/abnormal_lab/cbc/anemia/first_24h.yaml" "src/MEDS_DEV/tasks/criteria/abnormal_lab/vital/hypotension/first_24h.yaml"

Length of output: 3580

src/MEDS_DEV/tasks/criteria/abnormal_lab/cbc/leukocytosis/first_24h.yaml (4)

16-24: Verify WBC threshold with clinical guidelines.

The WBC threshold (>11 K/uL) for leukocytosis is defined, but we should verify this matches standard clinical guidelines.

Would you like me to help create a README.md file documenting the clinical rationale for this threshold, as mentioned in the PR objectives?


54-61: LGTM: Target window configuration is well-defined.

The target window configuration correctly implements the requirements:

  • 24-hour prediction window
  • Requires at least one WBC measurement
  • Uses abnormally_high_wbc for labeling

25-25: Ensure hospital_admission trigger is properly defined.

The task trigger depends on the undefined hospital_admission predicate. This needs to be addressed along with the other missing predicate definitions.

#!/bin/bash
# Check if the trigger predicate is used consistently
rg -t yaml "trigger:\s*hospital_admission" -A 5

8-15: ⚠️ Potential issue

Define missing core predicates.

The following predicates are undefined and marked with "???":

  • hospital_admission
  • hospital_discharge
  • birth
  • death

These are critical predicates that need to be defined for the task to function.

Let's check if these predicates are defined elsewhere:

src/MEDS_DEV/tasks/criteria/abnormal_lab/blood_chemistry/hyponatremia/first_24h.yaml (3)

27-61: Verify window sequence and constraints.

The window configuration appears logically sound but please verify:

  1. The 48-hour window in no_discharge correctly covers both input and target windows (24h + 24h)
  2. The sequence of window evaluations ensures proper patient filtering
  3. The requirement for at least one sodium measurement (sodium: (1, None)) is sufficient for reliable prediction

#!/bin/bash
# Search for similar window configurations in other lab prediction tasks
# (-U with (?s) lets the pattern span multiple lines of YAML)
rg -t yaml -U "(?s)windows:.*?input:.*?target:" -A 20

16-23: Verify sodium threshold with medical standards.

The threshold for hyponatremia is set to < 135 mEq/L. Please verify this aligns with standard medical definitions and MIMIC-IV data distribution.

#!/bin/bash
# Search for other sodium-related definitions for consistency
rg -t yaml "sodium.*value_max:.*135" --no-filename

8-15: ⚠️ Potential issue

Define missing core predicates.

The following critical predicates are undefined:

  • hospital_admission
  • hospital_discharge
  • birth
  • death

These predicates are essential for the task's functionality and should be properly defined.

Let's check if these predicates are defined elsewhere:

src/MEDS_DEV/tasks/criteria/abnormal_lab/cbc/anemia/first_24h.yaml (4)

27-46: Verify discharge window configuration.

The no_discharge window configuration:

  1. Uses start + 48h but references its own start - should it reference trigger instead?
  2. Has hospital_admission count of 0 which might conflict with the trigger

Consider revising to:

   no_discharge:
     start: trigger
-    end: start + 48h
+    end: trigger + 48h
     start_inclusive: False
     end_inclusive: True
     has:
-      hospital_admission: (None, 0)
       discharge_or_death: (None, 0)

54-61: Validate target window configuration.

The target window configuration looks correct, but verify:

  1. The requirement for at least one hemoglobin measurement
  2. The use of abnormally_low_hemoglobin as the label

#!/bin/bash
# Check for similar window configurations in other lab prediction tasks
# (-U enables multiline matching so \s+ can cross line breaks)
rg -U -A 5 "target:\s+start: input.end\s+end: start \+ 24h" "src/MEDS_DEV/tasks/criteria/"

16-24: Verify hemoglobin threshold and predicate definitions.

The abnormal hemoglobin threshold (13 g/dL) needs verification:

  1. The PR objectives mention this might need reevaluation due to high positive sample yield
  2. The hemoglobin and abnormally_low_hemoglobin predicates are undefined

#!/bin/bash
# Check if hemoglobin predicates are defined in MIMIC predicates
rg -l "hemoglobin:|abnormally_low_hemoglobin:" "src/MEDS_DEV/datasets/MIMIC-IV/predicates.yaml"

8-15: ⚠️ Potential issue

Define missing core predicates.

The following predicates are undefined:

  • hospital_admission
  • hospital_discharge
  • birth
  • death

These are critical for task functionality.

src/MEDS_DEV/tasks/criteria/abnormal_lab/cbc/thrombocytopenia/first_24h.yaml (3)

1-61: Overall task structure looks good.

The task definition aligns well with the PR objectives and follows the expected structure for abnormal lab prediction tasks. The thrombocytopenia threshold (150 K/uL) matches the specified criterion from the PR description.


9-12: ⚠️ Potential issue

Define missing predicates marked with "???".

The following predicates need to be defined:

  • hospital_admission
  • hospital_discharge
  • birth
  • death
  • platelets.code
  • abnormally_low_platelets.code

These predicates are crucial for the task's functionality. They should match the corresponding definitions in MIMIC-IV's predicates.yaml file.

Let's verify if these predicates are defined in the MIMIC predicates file:

#!/bin/bash
# Search for predicate definitions in MIMIC predicates file
rg -A 2 "hospital_admission:|hospital_discharge:|birth:|death:|platelets:" "src/MEDS_DEV/datasets/MIMIC-IV/predicates.yaml"

Also applies to: 18-18, 20-20


27-61: Consider edge cases in window definitions.

The current window configuration might need additional validation for edge cases:

  1. Input window: Consider validating if there are any platelet measurements in the first 24h
  2. Target window: Consider adding a maximum number of measurements requirement

Let's check if similar tasks have these validations:

src/MEDS_DEV/tasks/criteria/abnormal_lab/blood_chemistry/elevated_creatinine/first_24h.yaml (2)

16-24: Verify creatinine threshold value.

Please confirm that 1.3 mg/dL is the correct threshold for abnormally high creatinine. Additionally, the creatinine predicate's code is undefined (marked with ???).

Could you provide medical references or documentation supporting this threshold value?


27-61: Verify window boundary logic.

Please verify the following window configurations:

  1. Input window (0h to 24h): Ensures no previous elevated creatinine
  2. No discharge window (0h to 48h): Prevents including discharged patients
  3. Target window (24h to 48h): Requires at least one creatinine measurement

The windows appear to overlap correctly, but we should verify that this matches the intended clinical workflow.

Consider adding a diagram or documentation to visualize these time windows and their relationships.
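
As a stand-in for such a diagram, the three windows listed above can be written out as plain time intervals. This is a minimal sketch, not the ACES implementation: `task_windows` and `in_window` are hypothetical helpers, the input window is drawn from the trigger (0h) as in the list above even though a config may leave its start unset, and the left-open, right-closed boundaries are an assumption based on the inclusivity settings reported for the target window.

```python
from datetime import datetime, timedelta


def task_windows(trigger: datetime) -> dict:
    """Return (start, end) pairs for the three windows, relative to the trigger."""
    input_end = trigger + timedelta(hours=24)
    return {
        "input": (trigger, input_end),                             # 0h to 24h
        "no_discharge": (trigger, trigger + timedelta(hours=48)),  # 0h to 48h
        "target": (input_end, input_end + timedelta(hours=24)),    # 24h to 48h
    }


def in_window(t: datetime, window: tuple) -> bool:
    """Assumed boundary rule: start exclusive, end inclusive."""
    start, end = window
    return start < t <= end


windows = task_windows(datetime(2024, 1, 1, 8, 0))
for name, (start, end) in windows.items():
    print(f"{name:<13}{start:%Y-%m-%d %H:%M} -> {end:%Y-%m-%d %H:%M}")
```

Under these assumptions the target window ends exactly where the no_discharge window ends, and an event at exactly 24h falls in the input window rather than the target window.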

src/MEDS_DEV/tasks/criteria/abnormal_lab/blood_chemistry/metabolic_acidosis/first_24h.yaml (4)

20-23: Verify the bicarbonate threshold value.

The threshold for abnormally low bicarbonate is set to < 22 mEq/L. Please verify this aligns with standard medical guidelines for metabolic acidosis diagnosis.


25-25: LGTM!

The trigger is appropriately set to hospital_admission, which aligns with the task's objective of prediction within the first 24 hours of admission.


54-61: LGTM!

The target window configuration correctly implements:

  • 24-hour prediction window
  • Required bicarbonate measurement
  • Appropriate label definition

9-12: ⚠️ Potential issue

Critical predicates are undefined.

Several essential predicates are marked with "???". These need to be implemented:

  • icu_admission
  • icu_discharge
  • hospital_discharge
  • death
  • bicarbonate
  • abnormally_low_bicarbonate

Let's verify if these predicates are defined in the MIMIC predicates file:

Also applies to: 18-19

✅ Verification successful

All referenced predicates are properly defined in MIMIC-IV predicates file

The verification shows that all predicates marked with "???" in the YAML file are actually defined in src/MEDS_DEV/datasets/MIMIC-IV/predicates.yaml:

  • icu_admission: defined with regex ^ICU_ADMISSION//.*
  • icu_discharge: defined with regex ^ICU_DISCHARGE//.*
  • hospital_discharge: defined with regex ^HOSPITAL_DISCHARGE//.*
  • death: defined with code MEDS_DEATH
  • bicarbonate: defined as an OR expression of bicarbonate_1 and bicarbonate_2
  • abnormally_low_bicarbonate: defined as an OR expression of abnormally_low_bicarbonate_1 and abnormally_low_bicarbonate_2

The "???" markers appear to be placeholders in a template file that references these already-defined predicates.
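
To illustrate how the code-based predicates listed above pick out events, here is a rough sketch that applies the same patterns with Python's `re` module. It is not how ACES evaluates predicates internally; `matching_predicates` is a hypothetical helper, the exact-match handling of MEDS_DEATH is an assumption, and the sample event codes (other than the LAB code quoted elsewhere in this review) are made up.

```python
import re

# Patterns copied from the verification summary above.
PREDICATES = {
    "icu_admission": re.compile(r"^ICU_ADMISSION//.*"),
    "icu_discharge": re.compile(r"^ICU_DISCHARGE//.*"),
    "hospital_discharge": re.compile(r"^HOSPITAL_DISCHARGE//.*"),
    "death": re.compile(r"^MEDS_DEATH$"),  # plain code, treated here as an exact match
}


def matching_predicates(code: str) -> list[str]:
    """Names of all predicates whose pattern matches the given event code."""
    return [name for name, pattern in PREDICATES.items() if pattern.match(code)]


# Made-up event codes for illustration only.
events = ["ICU_ADMISSION//MICU", "HOSPITAL_DISCHARGE//HOME", "MEDS_DEATH", "LAB//50912//mg/dL"]
for code in events:
    print(code, "->", matching_predicates(code))
```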

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for predicate definitions in MIMIC predicates file
rg -A 5 "icu_admission:|icu_discharge:|hospital_discharge:|death:|bicarbonate:" "src/MEDS_DEV/datasets/MIMIC-IV/predicates.yaml"

Length of output: 745

src/MEDS_DEV/datasets/MIMIC-IV/predicates.yaml (2)

17-21: LGTM: Birth predicate follows existing pattern.

The addition of the birth predicate is clean and consistent with the file's structure.


143-167: LGTM: MAP predicates are well-structured.

The mean arterial pressure predicates are correctly implemented with:

  • Comprehensive coverage using three code variants
  • Correct threshold of 65 mmHg
  • Consistent naming and structure

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
src/MEDS_DEV/datasets/MIMIC-IV/predicates.yaml (1)

22-167: Consider adding section comments

The file would benefit from section comments to clearly delineate between different measurement groups (Blood Chemistry, CBC, Vital Signs).

Add section comments like this:

+  # Blood Chemistry Measurements
   creatinine_1:
     code: LAB//50912//mg/dL
   ...
+  # Complete Blood Count (CBC) Measurements
   hemoglobin_1:
     code: LAB//220228//g/dl
   ...
+  # Vital Signs Measurements
   map_1:
     code: LAB//220052//mmHg
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 23bebf1 and d327a69.

📒 Files selected for processing (1)
  • src/MEDS_DEV/datasets/MIMIC-IV/predicates.yaml (1 hunks)
🔇 Additional comments (7)
src/MEDS_DEV/datasets/MIMIC-IV/predicates.yaml (7)

17-21: LGTM: Birth predicate addition

The birth predicate follows the established pattern and is correctly positioned.


22-40: LGTM: Creatinine predicates

The implementation correctly defines elevated creatinine (>1.3 mg/dL) using both lab codes (50912, 52546) with consistent units.


41-66: LGTM: Sodium predicates

The implementation correctly defines hyponatremia (<135 mEq/L) using all three lab codes (220645, 50983, 52623) with consistent units.


67-85: LGTM: Bicarbonate predicates

The implementation correctly defines metabolic acidosis (<22 mEq/L) using both lab codes (227443, 50882) with consistent units.


105-142: LGTM: WBC and platelets predicates

The implementation correctly defines:

  • Leukocytosis (>11 K/uL) using both WBC codes (220546, 51300)
  • Thrombocytopenia (<150 K/uL) using both platelets codes (227457, 51265)

All predicates use consistent units.


143-167: LGTM: MAP predicates

The implementation correctly defines hypotension (<65 mmHg) using all three MAP codes (220052, 220181, 225312) with consistent units.
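
The cutoffs confirmed in the comments above can be collected into one lookup for quick sanity checks. This is only a sketch: the `THRESHOLDS` table and `is_abnormal` helper are hypothetical names, and strict inequalities are assumed from the way the cutoffs are written in the PR description; the actual predicates are expressed in predicates.yaml.

```python
import operator

# task -> (measurement, comparison, cutoff), units as stated in the PR description
THRESHOLDS = {
    "elevated_creatinine": ("creatinine", operator.gt, 1.3),  # mg/dL
    "hyponatremia": ("sodium", operator.lt, 135),             # mEq/L
    "metabolic_acidosis": ("bicarbonate", operator.lt, 22),   # mEq/L
    "leukocytosis": ("wbc", operator.gt, 11),                 # K/uL
    "thrombocytopenia": ("platelets", operator.lt, 150),      # K/uL
    "hypotension": ("map", operator.lt, 65),                  # mmHg
}


def is_abnormal(task: str, value: float) -> bool:
    """True if the value falls on the abnormal side of the task's cutoff."""
    _, compare, cutoff = THRESHOLDS[task]
    return compare(value, cutoff)


print(is_abnormal("hyponatremia", 132))   # below 135 mEq/L
print(is_abnormal("hypotension", 70))     # not below 65 mmHg
```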


86-104: Verify hemoglobin threshold impact

The PR objectives note that the current threshold (<13 g/dL) may need reevaluation due to high positive sample yield in MIMIC-IV.

Let's analyze the distribution of hemoglobin values:

@Jwoo5
Contributor Author

Jwoo5 commented Oct 30, 2024

After I updated the task definitions, the cohort sizes (positive / total) for each task on MIMIC-IV became:

  • elevated_creatinine
    • 2088 / 114946
  • hyponatremia
    • 5269 / 101735
  • metabolic_acidosis
    • 5319 / 97086
  • anemia
    • 5085 / 7718 (it seems we need to revisit the criteria range for this task)
  • leukocytosis
    • 4260 / 16805
  • thrombocytopenia
    • 6353 / 114133
  • hypotension
    • 5821 / 15630

It seems that excluding patients who previously showed abnormal values for the target lab has a large effect on the number of positive samples for some tasks. Do these numbers look fine?
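
For reference, the positive rates implied by the counts above can be computed directly. The snippet only restates the numbers reported in this comment; the `COHORTS` and `rates` names are illustrative.

```python
# (positive, total) per task, copied from the counts reported above
COHORTS = {
    "elevated_creatinine": (2088, 114946),
    "hyponatremia": (5269, 101735),
    "metabolic_acidosis": (5319, 97086),
    "anemia": (5085, 7718),
    "leukocytosis": (4260, 16805),
    "thrombocytopenia": (6353, 114133),
    "hypotension": (5821, 15630),
}

rates = {task: pos / total for task, (pos, total) in COHORTS.items()}

# Print tasks from lowest to highest positive rate.
for task, rate in sorted(rates.items(), key=lambda item: item[1]):
    print(f"{task:<22}{rate:7.2%}")
```

Anemia stands out with roughly two thirds of its cohort labeled positive, while elevated creatinine is under 2%.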

@Jwoo5 Jwoo5 requested a review from mmcdermott October 30, 2024 13:17
@mmcdermott mmcdermott merged commit ad919c4 into mmcdermott:main Nov 1, 2024
2 checks passed
Owner

@mmcdermott mmcdermott left a comment

LGTM

@mandiehyewon
Collaborator

> @justin13601 Thanks! I think we can use standardized ranges focusing on adult patients for the first pass, though. What do you think, @mandiehyewon? If we should define different ranges based on gender or age for some tasks, could you provide the reference cutoffs so that I can reflect them in the task configs?

The age range and inclusion of all hospital admissions seem appropriate. I can also provide the normal ranges or abnormal cutoffs for other cohorts if needed.

@mandiehyewon
Collaborator

> 1. Also, should we have something different than cbc and blood_chemistry? blood_chemistry seems very broad relative to cbc, but maybe that is wrong?
>    They were also defined based on @mandiehyewon's suggestions. Maybe this also needs an opinion from @Gallifantjack or @mandiehyewon.
>
> @mandiehyewon, what do you think? Should we group them by typical lab panel name? By organ system? By just the name of the test? I'm not sure what would be best.

Since we aim to account for certain medical abnormalities that may or may not be present across various conditions (e.g., hyponatremia could occur in cases like SIADH or diarrhea), as @mmcdermott suggested, it’s best to use task names rather than the approach I previously recommended (CBC, Blood Chemistry).


5 participants