-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update dataset.md #31
Conversation
Thoughts on the dataset template? What should the folder path be for the predicates file the user is expected to add?
WalkthroughThe changes introduce a new markdown template for datasets within the Changes
Possibly related PRs
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
I like this, but chat with @prockenschaub and @rvandewater -- they've added similar dataset templating information in #27, @Oufattole |
@Oufattole do you prefer to take over our changes as you see fit? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (2)
src/MEDS_DEV/templates/dataset.md (2)
1-1
: Simplify the title format.The escaped characters make the template harder to read and copy.
-# \[New dataset\] \*\*Add your dataset name here\*\* +# [New dataset] **Add your dataset name here**
43-48
: Consider adding version history and contribution sections.To better track changes and guide contributors, consider adding these sections:
## Version History Document any significant updates or changes to the dataset. ## Contributing Guidelines Specify how others can contribute to improving the dataset or documentation.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
src/MEDS_DEV/templates/dataset.md
(2 hunks)
🧰 Additional context used
🪛 Markdownlint
src/MEDS_DEV/templates/dataset.md
37-37: null
Bare URL used
(MD034, no-bare-urls)
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
…-DEV into dataset-template
Added some other dataset descriptions that I think might be useful. Included coding standards (e.g OMOP) and also included the date range for the entire dataset (Hospital A contains data from 2013-2019)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (2)
src/MEDS_DEV/templates/dataset.md (2)
1-1
: Remove unnecessary markdown escapes in title.The square brackets don't need to be escaped in markdown titles.
-# \[New dataset\] \*\*Add your dataset name here\*\* +# [New dataset] **Add your dataset name here**
21-21
: Maintain consistency in path variables.The path example uses different formats:
$YOUR_DATASET_NAME
vsMIMIC-IV
. Consider using consistent notation.-For supported tasks, provide the predicate definitions in a `predicates.yaml` file in `src/MEDS_DEV/datasets/$YOUR_DATASET_NAME` (e.g. `src/MEDS_DEV/datasets/MIMIC-IV/predicates.yaml`). +For supported tasks, provide the predicate definitions in a `predicates.yaml` file in `src/MEDS_DEV/datasets/${dataset_name}` (e.g. `src/MEDS_DEV/datasets/MIMIC-IV/predicates.yaml`).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
src/MEDS_DEV/templates/dataset.md
(2 hunks)
🧰 Additional context used
🪛 Markdownlint
src/MEDS_DEV/templates/dataset.md
11-11: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
12-12: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
13-13: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
14-14: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
15-15: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
33-33: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
34-34: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
35-35: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
37-37: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
38-38: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
39-39: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
🔇 Additional comments (2)
src/MEDS_DEV/templates/dataset.md (2)
Line range hint 41-49
: LGTM!
The access requirements section is well-structured and covers all necessary aspects of dataset access.
🧰 Tools
🪛 Markdownlint
11-11: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
12-12: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
13-13: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
14-14: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
15-15: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
33-33: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
34-34: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
35-35: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
37-37: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
38-38: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
39-39: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
56-65
: LGTM!
The checklist is comprehensive and covers all necessary aspects of dataset submission.
Added a point of contact for proprietary data, enabling users to proceed with submitting a job for evaluation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (2)
src/MEDS_DEV/templates/dataset.md (2)
9-15
: Standardize example formatting.The examples in the description section use inconsistent formatting. Some use "e.g." within parentheses while others use a comma followed by "e.g."
-Aggregate statistics (e.g., number of patients, time range, age distributions, demographic distributions) +Aggregate statistics, e.g.: + - Number of patients + - Time range + - Age distributions + - Demographic distributions🧰 Tools
🪛 Markdownlint
11-11: Expected: 2; Actual: 4
Unordered list indentation(MD007, ul-indent)
12-12: Expected: 2; Actual: 4
Unordered list indentation(MD007, ul-indent)
13-13: Expected: 2; Actual: 4
Unordered list indentation(MD007, ul-indent)
14-14: Expected: 2; Actual: 4
Unordered list indentation(MD007, ul-indent)
15-15: Expected: 2; Actual: 4
Unordered list indentation(MD007, ul-indent)
21-21
: Clarify predicate file path instruction.The instruction for the predicate file path uses a variable notation that might be unclear to users.
-For supported tasks, provide the predicate definitions in a `predicates.yaml` file in `src/MEDS_DEV/datasets/$YOUR_DATASET_NAME` (e.g. `src/MEDS_DEV/datasets/MIMIC-IV/predicates.yaml`). +For supported tasks, provide the predicate definitions in a `predicates.yaml` file in the dataset directory. For example, if your dataset is named "MIMIC-IV", the file should be at `src/MEDS_DEV/datasets/MIMIC-IV/predicates.yaml`.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
src/MEDS_DEV/templates/dataset.md
(1 hunks)
🧰 Additional context used
🪛 Markdownlint
src/MEDS_DEV/templates/dataset.md
11-11: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
12-12: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
13-13: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
14-14: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
15-15: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
33-33: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
34-34: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
35-35: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
37-37: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
38-38: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
39-39: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
@prockenschaub @rvandewater I tried to merge the two versions for the dataset template (sorry this turned out to be concurrent!) and included suggestions from @Simonlee711 and coderabbit. Let me know if you have any thoughts! @Jeanselme maybe we should also think about and request more information on any fairness-related considerations for the dataset (e.g. which groups are possible, request their definitions, etc.) Also cc: @Oufattole @mmcdermott |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (1)
src/MEDS_DEV/templates/dataset.md (1)
21-21
: Consider adding predicates.yaml format example.While the location for predicates.yaml is clear, providing an example of its expected format would help users create valid definitions.
Would you like me to generate an example predicates.yaml format?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
src/MEDS_DEV/templates/dataset.md
(1 hunks)
🧰 Additional context used
🪛 Markdownlint
src/MEDS_DEV/templates/dataset.md
11-11: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
12-12: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
13-13: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
33-33: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
34-34: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
35-35: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
37-37: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
38-38: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
39-39: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
🔇 Additional comments (3)
src/MEDS_DEV/templates/dataset.md (3)
3-3
: LGTM! Clear instructions for template usage.
The introduction clearly specifies where to place the README.md file and how to name the directory.
31-39
: LGTM! Comprehensive resource requirements.
The section provides a clear and thorough list of required documentation and resources.
🧰 Tools
🪛 Markdownlint
33-33: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
34-34: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
35-35: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
37-37: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
38-38: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
39-39: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
47-51
: LGTM! Well-structured access requirements.
The section thoroughly covers all aspects of dataset access, from policies to points of contact.
Looks good @kamilest |
Thoughts on the dataset template?
We should add that the user is expected to commit in the pull request:
datasets/${dataset_name}/predicates.yaml
datasets/${dataset_name}/README.md
Summary by CodeRabbit