Update dataset.md #31

Merged
merged 9 commits into from
Oct 31, 2024
60 changes: 48 additions & 12 deletions src/MEDS_DEV/templates/dataset.md
@@ -1,30 +1,66 @@
# New Dataset Template
# \[New dataset\] \*\*Add your dataset name here\*\*

This is a template for creating a new dataset in MEDS-DEV. The dataset should be stored in a directory named after the dataset in the `src/MEDS-DEV/datasets` directory.
This is a template for supporting a new dataset in MEDS-DEV. Please edit the information on the template as appropriate, and commit this information in a `README.md` file in a directory named after your dataset in `src/MEDS_DEV/datasets` (e.g. `src/MEDS_DEV/datasets/MIMIC-IV/README.md`).

## Description

Describe the dataset in a few sentences. A link to the dataset's homepage and/or repository or a research paper is recommended.
Provide a brief description of your dataset, including:

## Access Requirements
- Aggregate statistics (e.g., number of patients, time range, age distributions, demographic distributions)
- Cohort description:
  - Inclusion/exclusion criteria
  - Censoring rules
  - Key demographic characteristics
- Date range of the dataset (e.g., patients at Hospital A from 2013-2023)
- Coding standards, if known (e.g., OMOP)

## Supported tasks

Provide the list of the existing tasks already present in MEDS-DEV that are covered, and note any exceptions for tasks that **cannot** be run on your dataset (e.g. due to censoring limitations, incompatible inclusion criteria, etc.)

For supported tasks, provide the predicate definitions in a `predicates.yaml` file in `src/MEDS_DEV/datasets/$YOUR_DATASET_NAME` (e.g. `src/MEDS_DEV/datasets/MIMIC-IV/predicates.yaml`).
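As a minimal sketch, assuming the layout described above (a `README.md` and a `predicates.yaml` inside `src/MEDS_DEV/datasets/$YOUR_DATASET_NAME`), a contributor could check that both files are present before opening a pull request. The helper name `missing_dataset_files` is hypothetical and not part of MEDS-DEV.

```python
from pathlib import Path

# Files this template asks contributors to commit for each dataset.
REQUIRED_FILES = ("README.md", "predicates.yaml")


def missing_dataset_files(repo_root, dataset_name):
    """Return the required files that are absent from the dataset directory.

    Hypothetical helper: the directory layout is taken from the template text,
    not from any MEDS-DEV API.
    """
    dataset_dir = Path(repo_root) / "src" / "MEDS_DEV" / "datasets" / dataset_name
    return [name for name in REQUIRED_FILES if not (dataset_dir / name).is_file()]
```

Calling `missing_dataset_files(".", "MIMIC-IV")` from the repository root returns an empty list when both files exist, and otherwise lists the files that still need to be added.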

### Future tasks

If there are any currently undefined tasks that your dataset would be particularly suitable for, describe them here.

## Resources and links

Please provide the following:

- Link to the dataset's webpage and/or documentation (e.g., institutional repository, GitHub)
- Relevant research papers or articles, e.g.:
  - Original dataset publication
  - Key studies using this dataset
  - Methodology papers
- Additional resources, e.g.:
  - Data dictionaries
  - Code repositories
  - Usage examples

## Access requirements

Describe any access requirements for the dataset (e.g., human subjects research training). If the dataset is publicly available, state that here. If the dataset is not publicly available, describe the process for obtaining access. We recommend covering the following topics:

- **Access Policy**: Describe the access policy for the dataset, including any restrictions or permissions required.
- **License (for files)**: Specify the license under which the dataset files are distributed.
- **Data Use Agreement**: Specify any data use agreement that must be signed to access the dataset.
- **Required training**: Specify any training or certification required to access the dataset.
- **Point of Contact**: If the data is proprietary, include a point of contact to whom model code and weights can be sent for running the evaluation.

## Supported Tasks

Describe the existing tasks already present in MEDS-DEV that are covered. If there are new tasks that can be added, describe them here. Also note the `predicates.yaml` file that specifies the dataset's predicates.
## MEDS compatibility

## MEDS-transformation
Briefly describe the process of transforming this dataset into the MEDS format. If the dataset is already in the MEDS format when downloaded, state that here.

Briefly describe the process of transforming this dataset to the MEDS format. If the dataset is already in the MEDS format when downloaded, state that here.
Provide any other instructions for how to prepare the dataset for use with MEDS models and tasks.

## Sources
## Checklist

Summarize the sources of the dataset. If the dataset is a combination of multiple sources, list them here.
Please ensure your dataset submission conforms to the MEDS-DEV API by checking the following:

1. https://link-to-dataset.org
- [ ] I filled out the above template and committed it as a `README.md` file in a directory named after the dataset in `src/MEDS_DEV/datasets`.
- [ ] I included the `predicates.yaml` file, defining all predicates required for the supported tasks.
- [ ] I verified all resource links are accessible.
- [ ] I included example usage code (if applicable).
- [ ] I documented any known limitations or biases in the dataset.
- [ ] I specified the dataset version or date of last update.
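
For the link-verification item in the checklist above, one possible first step is to collect every URL from the dataset `README.md` so that each can be opened and checked. The sketch below is illustrative only: the function name `extract_links` and the regular expression are assumptions, not part of MEDS-DEV, and the code only lists links without contacting them.

```python
import re

# Matches http(s) URLs up to whitespace or Markdown/HTML delimiters.
URL_PATTERN = re.compile(r"https?://[^\s<>()\[\]]+")


def extract_links(markdown_text):
    """Return the unique URLs found in the text, in order of first appearance."""
    links = []
    for match in URL_PATTERN.findall(markdown_text):
        url = match.rstrip(".,;:")  # drop trailing sentence punctuation
        if url not in links:
            links.append(url)
    return links
```

The resulting list can then be checked by hand or passed to an HTTP client of your choice.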