-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minor fixes in MIMIC_example #224
Conversation
WalkthroughThe changes in this pull request focus on the Changes
Possibly related PRs
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
@@ -229,7 +229,7 @@ def main(cfg: DictConfig): | |||
) | |||
exit(0) | |||
|
|||
all_fps = list(input_dir.glob("**/*.*")) | |||
all_fps = list(input_dir.rglob("*/*.*")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allows to properly find files if subdirs are symbolic links
Codecov ReportAll modified and coverable lines are covered by tests ✅
✅ All tests successful. No failed tests found. Additional details and impacted files@@ Coverage Diff @@
## main #224 +/- ##
=======================================
Coverage 99.17% 99.17%
=======================================
Files 27 27
Lines 2055 2055
=======================================
Hits 2038 2038
Misses 17 17 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
MIMIC-IV_Example/pre_MEDS.py (2)
302-304
: Consider improving the file reading logicWhile the optimization for
.csv.gz
files is good, consider these improvements:
- Apply column selection optimization to other compatible file types
- Extract the common read logic to reduce duplication
- Add error handling for file read operations
Here's a suggested refactor:
- if df_to_load_fp.suffix in [".csv.gz"]: - df = df_to_load_read_fn(df_to_load_fp, columns=cols) - else: - df = df_to_load_read_fn(df_to_load_fp) + try: + read_kwargs = {"columns": cols} if df_to_load_fp.suffix in [".csv", ".csv.gz"] else {} + df = df_to_load_read_fn(df_to_load_fp, **read_kwargs) + except Exception as e: + logger.error(f"Failed to read {df_to_load_fp}: {str(e)}") + raise
Line range hint
232-304
: Consider adding data validation for medical recordsSince this script processes sensitive medical data (ICD codes, diagnoses, procedures), consider adding validation to ensure data integrity:
- Validate file checksums against expected values
- Add basic sanity checks for medical record formats
- Log any data anomalies for review
Would you like me to provide examples of data validation implementations for MIMIC-IV records?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
MIMIC-IV_Example/pre_MEDS.py
(2 hunks)
🔇 Additional comments (1)
MIMIC-IV_Example/pre_MEDS.py (1)
232-232
: Verify the file pattern change doesn't break MIMIC-IV file discovery
The change from glob("**/*.*")
to rglob("*/*.*")
is more restrictive as it requires files to be in at least one subdirectory. This could potentially miss files placed directly in the input directory.
Let's verify the MIMIC-IV file structure:
✅ Verification successful
Let me generate additional verification scripts to check the actual MIMIC-IV file structure pattern:
This will help us understand the expected MIMIC-IV file structure and verify if the new pattern is compatible.
Based on the search results, I can now generate the final response:
The file pattern change is safe and intentional
The change from glob("**/*.*")
to rglob("*/*.*")
is actually correct for MIMIC-IV data structure. From the documentation and configuration files, we can see that:
- MIMIC-IV raw files are always organized in subdirectories (e.g., 'hosp' and 'icu' subdirectories are required)
- The README explicitly states: "there should be a
hosp
andicu
subdirectory of$MIMICIV_RAW_DIR
" - The metadata files are also placed in subdirectories as shown by the file structure
The new pattern rglob("*/*.*")
correctly enforces this expected directory structure, making the code more robust by failing early if files are not organized as required by MIMIC-IV.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if any MIMIC-IV files exist directly in the input directory
# that would be missed by the new pattern
# Find files directly in input_dir (depth=1)
fd --max-depth 1 --type f . --exec echo "Files that would be missed: {}"
# Compare with files in subdirectories (depth>1)
fd --min-depth 2 --type f . --exec echo "Files that would be found: {}"
Length of output: 9014
Script:
#!/bin/bash
# Search for any references to MIMIC-IV file paths or directory structure
rg -i "mimic.*(csv|file|dir|path)" -A 2
# Check the configuration files for MIMIC-IV
cat MIMIC-IV_Example/configs/extract_MIMIC.yaml
Length of output: 8040
Summary by CodeRabbit
New Features
Bug Fixes
Documentation