
Add a basic example workflow. #16

Merged
merged 21 commits into from
Oct 10, 2024

Conversation

@kamilest (Collaborator) commented on Sep 10, 2024

This is a very rough version of the workflow for installing MEDS-DEV, setting up the task and dataset configurations, running a model (with a helper script for generating random binary classification predictions), and evaluating the results.

Of course this will need to be improved in terms of usability, functionality, and robustness.

Summary by CodeRabbit

  • New Features

    • Expanded the README.md to include comprehensive setup and usage instructions for the MEDS-DEV project.
    • Introduced a new configuration file for predictions, enhancing flexibility in managing prediction directories.
    • Added a shell script to automate the generation of random predictions within the MEDS_DEV framework.
    • Implemented functionality for generating random predictions based on input datasets, streamlining the evaluation process.
  • Documentation

    • Enhanced README.md with detailed workflows, setup instructions, and contribution guidelines.
    • Clarified the repository's purpose and improved formatting for better readability.

@coderabbitai bot (Contributor) commented on Sep 10, 2024

Caution

Review failed

The head commit changed during the review from 94cf216 to 96c9bac.

Walkthrough

The recent changes encompass updates to the .pre-commit-config.yaml, README.md, and several files within the src/MEDS_DEV directory. Key modifications include the update of the pre-commit-hooks version, enhancements to the README for user guidance, and the introduction of new scripts and configuration files aimed at generating random predictions and managing prediction settings. These updates collectively enhance the structure and usability of the MEDS-DEV project.

Changes

| File | Change Summary |
| --- | --- |
| `.pre-commit-config.yaml` | Updated pre-commit-hooks version from v4.4.0 to v5.0.0 and set default Python version to python3.10. Consolidated args key format. |
| `README.md` | Expanded content with detailed setup instructions, workflow steps, and configuration guidance. |
| `src/MEDS_DEV/configs/predictions.yaml` | Introduced configuration for prediction settings, including directory management and logging. |
| `src/MEDS_DEV/helpers/generate_predictions.sh` | Added a script for generating random predictions, setting up environment variables, and invoking Python functionality. |
| `src/MEDS_DEV/helpers/generate_random_predictions.py` | Introduced a module for generating random predictions from dataframes, utilizing Hydra for configuration. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant ShellScript
    participant PythonModule
    participant Config

    User->>ShellScript: Execute generate_predictions.sh
    ShellScript->>Config: Load environment variables
    ShellScript->>PythonModule: Call generate_random_predictions
    PythonModule->>Config: Read configuration settings
    PythonModule->>Dataframe: Load datasets
    Dataframe->>PythonModule: Provide data
    PythonModule->>Dataframe: Generate random predictions
    PythonModule->>Output: Save predictions
```

🐇 In a world of code, so bright and new,
I hop with joy, for changes are due!
From README to scripts, all polished and neat,
The MEDS-DEV project is now quite a treat!
With predictions to make and guidance so clear,
Let’s celebrate progress, give a cheer! 🎉



@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

Outside diff range, codebase verification and nitpick comments (2)
src/MEDS_DEV/helpers/generate_random_predictions.py (1)

1-68: Suggestion: Enhance error handling and logging.

Consider adding the following enhancements to improve the robustness and debuggability of the script:

  • Add error handling using try/except blocks to gracefully handle potential exceptions, such as file I/O errors or invalid configurations.
  • Use the logging module to add informative log messages at different levels (e.g., debug, info, warning, error) to help with debugging and monitoring.
README.md (1)

24-24: Suggestion: Minor formatting and punctuation adjustments.

Consider making the following minor adjustments based on the static analysis hints:

  • Line 24: Add a comma after "Additionally" to improve readability.
  • Line 72: Remove the comma before "because" since the clause is essential to the meaning.

Also applies to: 72-72

Tools
LanguageTool

[uncategorized] ~24-~24: Possible missing comma found.
Context: ... conda activate $MY_MEDS_CONDA_ENV ``` Additionally install any model-related dependencies....

(AI_HYDRA_LEO_MISSING_COMMA)

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between e7119d1 and 5756038.

Files selected for processing (5)
  • .pre-commit-config.yaml (1 hunks)
  • README.md (1 hunks)
  • src/MEDS_DEV/configs/predictions.yaml (1 hunks)
  • src/MEDS_DEV/helpers/generate_predictions.sh (1 hunks)
  • src/MEDS_DEV/helpers/generate_random_predictions.py (1 hunks)
Files skipped from review due to trivial changes (2)
  • .pre-commit-config.yaml
  • src/MEDS_DEV/configs/predictions.yaml
Additional context used
LanguageTool
README.md

[uncategorized] ~24-~24: Possible missing comma found.
Context: ... conda activate $MY_MEDS_CONDA_ENV ``` Additionally install any model-related dependencies....

(AI_HYDRA_LEO_MISSING_COMMA)


[formatting] ~72-~72: If the ‘because’ clause is essential to the meaning, do not use a comma before the clause.
Context: ... Task configuration files are incomplete, because some concepts (predicates) have to be d...

(COMMA_BEFORE_BECAUSE)

Markdownlint
README.md

64-64: null
Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


83-83: null
Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

Additional comments not posted (3)
src/MEDS_DEV/helpers/generate_predictions.sh (1)

1-16: LGTM!

The bash script follows best practices and is well-structured. The use of environment variables for configuration is a good practice.

src/MEDS_DEV/helpers/generate_random_predictions.py (1)

1-68: LGTM overall!

The Python script is well-structured and modular. The use of hydra for configuration management is a good practice. The script handles different split types and file structures correctly.

README.md (1)

Line range hint 1-129: Significant improvements in usability and clarity!

The enhancements to the README.md file greatly improve the usability and clarity of the project. The detailed instructions, examples, and well-organized content will help users get started quickly and follow the workflow smoothly. Great job!


@mmcdermott (Owner) commented:

@kamilest can you fix the code quality issues that are causing that workflow to fail? See here for details: https://github.com/mmcdermott/MEDS-DEV/actions/runs/10800915362/job/29959946112?pr=16
I'm happy to have a quick call about how to get it set up so that your terminal automatically checks for these issues on any commit, if that would be of interest.

```bash
MEDS_DEV_REPO_DIR=$(python -c "from importlib.resources import files; print(files(\"MEDS_DEV\"))")
export MEDS_DEV_REPO_DIR

# TODO improve efficiency of prediction generator by using this
```
@mmcdermott (Owner) commented:

What is this TODO for? I'm not sure I understand.

@kamilest (Collaborator, Author) replied:

At the moment I'm reimplementing a recursive search through the directory to find the shards for which to generate predictions, but I think this could be simplified with the expand_shards helper you have implemented.
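For reference, the recursive search being described can be collapsed into a few lines with `Path.rglob`. This is only a sketch of the current approach, not the `expand_shards` helper mentioned above:

```python
from pathlib import Path


def find_shards(cohort_dir: Path):
    """Recursively collect parquet shard files under a cohort directory."""
    return sorted(p for p in cohort_dir.rglob("*.parquet") if p.is_file())
```

This returns shards in a deterministic order regardless of whether they sit in split subdirectories or at the cohort root.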

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (7)
README.md (7)

12-26: Great addition: Example workflow section

The new "Example workflow" section is a valuable addition to the README. It provides clear instructions for setting up the MEDS project environment, including helpful command-line examples in code blocks.

Consider adding a brief explanation of what $MY_MEDS_PROJECT_ROOT and $MY_MEDS_CONDA_ENV represent to ensure users understand how to customize these variables for their setup.


27-43: Excellent: Clear installation instructions

The installation instructions for MEDS-DEV and the MEDS evaluation package are clear, concise, and well-formatted. The use of code blocks for command-line examples enhances usability.

Consider adding a note about potential system requirements or dependencies that users should have installed before proceeding with these steps.


47-69: Good addition: Task extraction and configuration information

This section provides valuable information about extracting tasks from the MEDS dataset, including details on configuration files and naming conventions. The guidelines for adding a new task are a helpful inclusion.

Consider the following improvements:

  1. Expand the section on adding a new task with more specific steps or a brief example.
  2. Add a note about any potential limitations or considerations when creating custom tasks.


70-88: Good addition: Dataset configuration information

This section provides useful information about dataset configuration files and how to add new ones. The explanation of dataset-specific predicate definitions is clear and helpful.

Consider the following improvements:

  1. Expand the instructions for adding a new dataset configuration file with more specific steps or a brief example.
  2. In line 72, remove the comma before "because" as it's not necessary in this context: "Task configuration files are incomplete because some concepts (predicates) have to be defined in a dataset-specific way".
  3. Add a brief explanation of what predicates are in this context, as it might not be clear to all users.


89-101: Excellent: Clear instructions for task extraction

This section provides clear and detailed instructions for running the MEDS task extraction helper. The command-line example and explanation of the extraction process and output location are particularly helpful.

Consider adding a note about any prerequisites or dependencies required for running the task extraction helper, such as having the MEDS dataset already downloaded and properly structured in the $MEDS_ROOT_DIR.


114-128: Excellent: Clear model evaluation instructions

This section provides clear and detailed instructions for evaluating the model using the meds-evaluation package. The command-line example and explanation of the output are particularly helpful.

Consider the following improvements:

  1. Add a brief explanation of what the evaluation metrics are and how to interpret the results in the JSON file.
  2. In line 127, add a comma after "argument" for better readability: "This will create a JSON file with the results in the directory provided by the output_dir argument, Note this package currently supports binary classification only."
  3. Consider mentioning any plans for supporting other types of classification or regression tasks in future versions of the package.
🧰 Tools
🪛 LanguageTool

[uncategorized] ~127-~127: Possible missing comma found.
Context: ...provided by the output_dir argument. Note this package currently supports binary ...

(AI_HYDRA_LEO_MISSING_COMMA)


Line range hint 1-128: Overall: Excellent improvements to documentation

The changes to the README.md file significantly enhance the documentation for the MEDS-DEV project. The new sections provide a clear, structured workflow that covers installation, task extraction, model training, and evaluation. These additions will greatly improve the user experience and make it easier for new contributors to get started with the project.

To further improve the documentation, consider adding a table of contents at the beginning of the README. This would help users quickly navigate to specific sections of interest, especially as the document grows with more detailed instructions and examples.


📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 5756038 and c2df073.

📒 Files selected for processing (1)
  • README.md (1 hunks)

🔇 Additional comments (2)
README.md (2)

7-8: LGTM: Improved readability

The text split improves readability without changing the content. This is a good formatting adjustment.


102-112: Good addition: Model training section

This section provides a helpful starting point for model training, acknowledging that the process may vary depending on the specific model's API. The example command for generating random outputs is useful for testing purposes.

Could you please clarify the purpose of the random output generation step? Is this intended as a placeholder example, or is it a legitimate part of the workflow for testing or baseline comparison? Consider adding a brief explanation of when and why users might want to use this random output generation.

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (1)
src/MEDS_DEV/helpers/generate_random_predictions.py (1)

30-30: Implement the expand_shards helper to access sharded data directly.

The TODO comment at line 30 indicates the need to use the expand_shards helper for accessing sharded dataframes directly. Implementing this will enhance code efficiency and maintainability.

Would you like assistance in implementing the expand_shards helper or opening a GitHub issue to track this task?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between c2df073 and 26b439f.

📒 Files selected for processing (1)
  • src/MEDS_DEV/helpers/generate_random_predictions.py (1 hunks)
🧰 Additional context used

Comment on lines +35 to +36

```python
dataframe = pl.read_parquet(file)
predictions = _generate_random_predictions(dataframe)  # sharded dataframes
```

@coderabbitai bot (Contributor):

⚠️ Potential issue

Add error handling for file I/O operations.

Reading parquet files without error handling may cause the script to crash if a file is missing or corrupted. Incorporate try-except blocks around file I/O operations to handle exceptions gracefully and provide informative error messages.

Apply this diff to add error handling:

```diff
 for split in cohort_dir.iterdir():
     if split.is_dir() and split.name in {"train", "tuning", "held_out"}:
         for file in split.iterdir():
             if file.is_file():
-                dataframe = pl.read_parquet(file)
-                predictions = _generate_random_predictions(dataframe)
+                try:
+                    dataframe = pl.read_parquet(file)
+                    predictions = _generate_random_predictions(dataframe)
+                except Exception as e:
+                    print(f"Error processing {file}: {e}")
+                    continue
             # Rest of the code...

 elif split.is_file():
-    dataframe = pl.read_parquet(split)
-    predictions = _generate_random_predictions(dataframe)
+    try:
+        dataframe = pl.read_parquet(split)
+        predictions = _generate_random_predictions(dataframe)
+    except Exception as e:
+        print(f"Error processing {split}: {e}")
+        return
```

Also applies to: 44-45

```python
    probabilities = rng.uniform(0, 1, len(dataframe))
    # TODO: meds-evaluation currently cares about the order of columns and types, so the new columns have to
    # be inserted at the correct position and cast to the correct type
    output.insert_column(3, pl.Series(PREDICTED_BOOLEAN_VALUE_COLUMN, probabilities.round()).cast(pl.Boolean))
```

@coderabbitai bot (Contributor):

🛠️ Refactor suggestion

Generate boolean predictions using a threshold comparison.

Rounding uniform probabilities may not produce an unbiased random boolean outcome. Using a threshold comparison ensures a fair distribution of True and False values.

Apply this diff to update the boolean value generation:

```diff
-    output.insert_column(3, pl.Series(PREDICTED_BOOLEAN_VALUE_COLUMN, probabilities.round()).cast(pl.Boolean))
+    predicted_values = probabilities > 0.5
+    output.insert_column(3, pl.Series(PREDICTED_BOOLEAN_VALUE_COLUMN, predicted_values))
```

Alternatively, use rng.choice for direct boolean generation:

```diff
-    probabilities = rng.uniform(0, 1, len(dataframe))
+    predicted_values = rng.choice([True, False], size=len(dataframe))
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
    predicted_values = probabilities > 0.5
    output.insert_column(3, pl.Series(PREDICTED_BOOLEAN_VALUE_COLUMN, predicted_values))
```
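For what it's worth, rounding a continuous uniform draw and thresholding it at 0.5 produce the same booleans (`np.round` only differs at exactly 0.5, which a continuous draw essentially never hits), so the suggestion is mainly a readability improvement rather than a bias fix. A quick NumPy check:

```python
import numpy as np

# Compare the two ways of turning uniform probabilities into booleans.
rng = np.random.default_rng(0)
probabilities = rng.uniform(0, 1, 10_000)

rounded = probabilities.round().astype(bool)  # original approach
thresholded = probabilities > 0.5             # suggested approach

agreement = (rounded == thresholded).mean()
print(agreement)  # 1.0 for these draws
```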

Comment on lines +31 to +51

```python
    for split in cohort_dir.iterdir():
        if split.is_dir() and split.name in {"train", "tuning", "held_out"}:  # train | tuning | held_out
            for file in split.iterdir():
                if file.is_file():
                    dataframe = pl.read_parquet(file)
                    predictions = _generate_random_predictions(dataframe)  # sharded dataframes

                    # $MEDS_ROOT_DIR/task_predictions/$TASK_NAME/<split>/<file>.parquet
                    predictions_path = Path(cohort_predictions_dir) / cohort_name / split.name
                    os.makedirs(predictions_path, exist_ok=True)

                    predictions.write_parquet(predictions_path / file.name)
        elif split.is_file():
            dataframe = pl.read_parquet(split)
            predictions = _generate_random_predictions(dataframe)

            predictions_path = Path(cohort_predictions_dir) / cohort_name
            os.makedirs(predictions_path, exist_ok=True)

            predictions.write_parquet(predictions_path / split.name)
```

@coderabbitai bot (Contributor):

🛠️ Refactor suggestion

Refactor duplicated code to improve maintainability.

The code segments handling directory and file inputs contain duplicated logic, particularly in reading dataframes and writing predictions. Refactoring this section by extracting common functionality into helper functions will reduce code duplication and enhance readability.

Apply this refactor to consolidate duplicated code:

```diff
 def generate_random_predictions(cfg: DictConfig) -> None:
     # Existing code above...

+    def process_file(file, predictions_path):
+        dataframe = pl.read_parquet(file)
+        predictions = _generate_random_predictions(dataframe)
+        os.makedirs(predictions_path, exist_ok=True)
+        predictions.write_parquet(predictions_path / file.name)
+
     for split in cohort_dir.iterdir():
         if split.is_dir() and split.name in {"train", "tuning", "held_out"}:
             for file in split.iterdir():
                 if file.is_file():
-                    dataframe = pl.read_parquet(file)
-                    predictions = _generate_random_predictions(dataframe)
-
-                    predictions_path = Path(cohort_predictions_dir) / cohort_name / split.name
-                    os.makedirs(predictions_path, exist_ok=True)
-
-                    predictions.write_parquet(predictions_path / file.name)
+                    predictions_path = Path(cohort_predictions_dir) / cohort_name / split.name
+                    process_file(file, predictions_path)
         elif split.is_file():
-            dataframe = pl.read_parquet(split)
-            predictions = _generate_random_predictions(dataframe)
-
             predictions_path = Path(cohort_predictions_dir) / cohort_name
-            os.makedirs(predictions_path, exist_ok=True)
-
-            predictions.write_parquet(predictions_path / split.name)
+            process_file(split, predictions_path)
```
📝 Committable suggestion

```python
    def process_file(file, predictions_path):
        dataframe = pl.read_parquet(file)
        predictions = _generate_random_predictions(dataframe)
        os.makedirs(predictions_path, exist_ok=True)
        predictions.write_parquet(predictions_path / file.name)

    for split in cohort_dir.iterdir():
        if split.is_dir() and split.name in {"train", "tuning", "held_out"}:  # train | tuning | held_out
            for file in split.iterdir():
                if file.is_file():
                    predictions_path = Path(cohort_predictions_dir) / cohort_name / split.name
                    process_file(file, predictions_path)
        elif split.is_file():
            predictions_path = Path(cohort_predictions_dir) / cohort_name
            process_file(split, predictions_path)
```

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
.pre-commit-config.yaml (1)

Python 3.10 is not available in the environment.

The pre-commit configuration specifies Python 3.10, but it is not installed. This may cause pre-commit hooks to fail.

  • Ensure Python 3.10 is installed in all development and CI environments.
  • Alternatively, update .pre-commit-config.yaml to use an available Python version.
🔗 Analysis chain

Line range hint 1-2: Approve Python version specification and suggest verification.

Explicitly setting the default Python version to 3.10 is a good practice for ensuring consistency across different environments.

To ensure project-wide compatibility and clarity:

  1. Verify that all project dependencies are compatible with Python 3.10.
  2. Update the project's README or documentation to clearly state the Python version requirement.
  3. Ensure that all development and production environments can support Python 3.10.

You can use the following command to check your current Python version and compare it with the specified version:

Additionally, review your requirements.txt or setup.py file to ensure all dependencies support Python 3.10:

If you're using a virtual environment, make sure it's created with Python 3.10:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

```shell
python --version
```

Length of output: 92

Script:

```shell
cat requirements.txt
# or
cat setup.py
```

Length of output: 127

Script:

```shell
python3.10 -m venv venv
```

Length of output: 103

Script:

```shell
#!/bin/bash
# Check for alternative Python executables
which python3
which python3.10

# Search for alternative dependency files
fd --extension toml --type f pyproject.toml
fd --extension toml --type f Pipfile

# Check for Python scripts in the repository
fd --extension py --type f
```

Length of output: 474

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 26b439f and 07ab03c.

📒 Files selected for processing (1)
  • .pre-commit-config.yaml (1 hunks)
🧰 Additional context used
🔇 Additional comments (2)
.pre-commit-config.yaml (2)

Line range hint 1-124: Overall configuration looks comprehensive and well-maintained.

The .pre-commit-config.yaml file contains a robust set of hooks for ensuring code quality, including formatting, linting, and various checks. The recent updates to the pre-commit-hooks version and the explicit Python version specification contribute to keeping the project up-to-date and consistent.

As a final note, consider periodically reviewing and updating the versions of other hooks in this file to ensure you're benefiting from the latest improvements and security updates across all tools.


8-8: Approve version update and suggest verification.

The update of pre-commit-hooks from v4.4.0 to v5.0.0 is a good practice to keep the project up-to-date with the latest features and improvements.

To ensure compatibility and take advantage of any new features:

  1. Please review the changelog for pre-commit-hooks v5.0.0 to understand any breaking changes or new features.
  2. Verify that all existing hooks still function as expected with this new version.

You can use `pre-commit run --all-files` to check the hooks. This will run all the hooks against all files in the repository, allowing you to catch any potential issues introduced by the version update.

@kamilest kamilest merged commit 30d5631 into main Oct 10, 2024
5 checks passed