A python tool to explore, enhance, and expand SPARC datasets and their descriptions in accordance with FAIR principles.
- About
- Introduction
- The problem
- Our solution - sparc-me
- Impact
- Setting up sparc-me
- Using sparc-me
- Reporting issues
- Contributing
- Cite us
- FAIR practices
- License
- Team
- Acknowledgements
This is the repository of Team sparc-me (Team #7) of the 2022 SPARC Codeathon. Click here to find out more about the SPARC Codeathon 2022. Check out the Team Section of this page to find out more about our team members.
With the exception of high-level planning by the team lead as advised by the Codeathon organisers, no work was done on this project prior to the Codeathon. Contributions from existing projects are described in the Acknowledgements Section.
The NIH Common Fund program on Stimulating Peripheral Activity to Relieve Conditions (SPARC) focuses on understanding peripheral nerves (nerves that connect the brain and spinal cord to the rest of the body), how their electrical signals control internal organ function, and how therapeutic devices could be developed to modulate electrical activity in nerves to improve organ function. This may provide a powerful way to treat a diverse set of common conditions and diseases such as hypertension, heart failure, gastrointestinal disorders, and more. 60 research groups spanning 90 institutions and companies contribute to SPARC and work across over 15 organs and systems in 8 species.
The SPARC Portal provides a single user-facing online interface to all resources generated by the SPARC community that can be shared, cited, visualized, computed, and used for virtual experimentation. A key offering of the portal is the collection of well-curated, high-impact data that is being generated by SPARC-funded researchers. These datasets, along with other SPARC projects and computational simulations, can be found under the "Find Data" section of the SPARC Portal.
A SPARC dataset comprises the following data and structure:
- An experimental protocol that has been submitted to Protocols.io, shared with the SPARC working group, curated, and published with a valid DOI.
- Data files that are organized into folders by the investigators, curated according to the SPARC Dataset Structure (SDS), and stored on the Pennsieve data management system. The SDS was adapted from the Brain Imaging Data Structure (BIDS) specification. Organizing and submitting data in compliance with the SDS is greatly simplified by the cross-platform, open-source Software to Organize Data Automatically (SODA), which provides a step-by-step interactive graphical user interface (GUI).
Information regarding how to navigate a SPARC dataset and how a dataset is formatted can be found on the SPARC Portal.
There is currently no publicly available programmatic approach for:
- Accessing and interrogating all metadata fields in SDS datasets.
- Creating new SDS datasets (schemas for SDS dataset validation are not yet publicly available).
This limits the ability of members of the SPARC and the wider scientific community to apply FAIR principles for:
- Interacting with SDS datasets for conducting their research (limits accessibility).
- Applying the SDS specification for storing and curating results from their instrumentation and computational physiology workflows, especially from automated workflows that can generate large quantities of data that may be impractical to store in SDS format using existing interactive tools like SODA (limits interoperability).
- Proposing and supporting extensions to the SDS (similar to BIDS extensions) to further expand the SPARC community e.g. to enable storing clinical data (limits reusability).
- Quickly prototyping novel infrastructure/tools to elevate the impact of the SPARC program (limits application).
To address this problem, we have developed a Python module called the SPARC Metadata Editor (sparc-me) that can be used to enhance the FAIRness of SPARC data by enabling:
- Findability
- Exploring data and metadata within SDS datasets
- Accessibility
- Accessing curated SDS datasets and their metadata (using the Pennsieve API)
- Accessing protocols used by existing SDS datasets (using the protocols.io API)
- Interoperability
- Conversion between BIDS datasets and SDS datasets
- Reusability
- Extending SDS descriptions/creating schemas[^1]
Examples and guided tutorials have been created to demonstrate each of the features above.
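For orientation, here is a minimal sketch of what a programmatic sparc-me workflow can look like: creating an SDS dataset from a template, editing a metadata field, and saving the result. The class and method names used below (`Dataset`, `load_from_template`, `set_field`, `save`) are illustrative assumptions; consult the tutorials and examples later in this README for the exact API.

```python
# Minimal sketch of a sparc-me workflow. The class and method names are
# illustrative assumptions; see the tutorials/examples for the exact API.
from sparc_me import Dataset

# Create an empty SDS dataset from the template for a given SDS version.
dataset = Dataset()
dataset.load_from_template(version="2.0.0")

# Edit a field in one of the SDS metadata files (dataset_description).
dataset.set_field(
    category="dataset_description",   # which SDS metadata file to edit
    row_name="Title",                 # which metadata element to set
    header="Value",                   # which column to write into
    value="My example SDS dataset",
)

# Save the dataset (metadata files plus SDS folder structure) to disk.
dataset.save(save_dir="./my_sds_dataset")
```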
sparc-me will elevate the impact of the SPARC program by providing the fundamental tools needed by users to programmatically interact with SDS datasets and efficiently build novel resources and tools from SPARC data. This includes:
- Supporting SPARC Data and Resource Centre (DRC) and community developments including:
- Assisting with efforts for automating SPARC data curation, e.g. via real-time/on-the-fly dataset validation by users prior to submission for curation.
- Improving efficiency of software developments (e.g. future codeathons and SPARC portal roadmap developments) by reducing the need to reimplement common functions.
- Supporting and promoting reuse/harmonisation/compatibility with other research initiatives. For example, sparc-me could be used to programmatically map SDS descriptions to Gen3 data dictionaries used in other NIH-funded initiatives such as the Common Fund’s NIH Data Commons program.
- Enabling extensions of the SDS specification to be proposed/explored (similar to BIDS extensions). This will enable other initiatives to build upon the extensive and ground-breaking developments of the SPARC community e.g. for storing results from computational physiology workflows and digital twins that are being developed for precision medicine using the SDS specification.
- Git
- Python. Tested on:
- 3.8.6
- 3.9
Here is the link to our project on PyPI.
pip install sparc-me
Clone the sparc-me repository from GitHub, e.g.:
git clone [email protected]:SPARC-FAIR-Codeathon/sparc-me.git
- Setting up a virtual environment (optional but recommended). In this step, we will create a virtual environment in a new folder named venv and activate it.
- Linux
  python3 -m venv venv
  source venv/bin/activate
- Windows
  python3 -m venv venv
  venv\Scripts\activate
- Installing dependencies via pip
pip install -r requirements.txt
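Whichever installation route you use, you can check that the package is importable from Python:

```python
# Quick sanity check that sparc-me installed correctly and is importable.
import sparc_me
print("sparc-me imported from:", sparc_me.__file__)
```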
Guided tutorials have been developed describing how to use sparc-me in different scenarios:
| Tutorial | Description |
|---|---|
| 1 | Downloading an existing curated SDS dataset (human whole-body computational scaffold with embedded organs), and using existing tools to query ontology terms that have been used to annotate SDS datasets via the SciCrunch knowledgebase. |
| 2 | Creating an SDS dataset programmatically from input data, editing metadata values, and filtering metadata. |
| 3 | Interacting with SDS datasets on O2SPARC with sparc-me. |
| 4 | Creating an extension of the SDS to include an additional metadata field that defines data use descriptions from the GA4GH-approved Data Use Ontology (DUO). This tutorial is a first step toward demonstrating how the SDS could be extended to describe clinical data. |
| 5 | Converting a BIDS dataset to an SDS dataset. |
In addition to the tutorials, the following examples are also provided in the examples folder to help highlight the functionality of sparc-me:
- `example_for_base_functionality.py` - Example outlining basic functionality for loading/saving/editing datasets and metadata.
- `example_for_validating_schema.py` - Example showing how to validate SDS entries against the SDS schema stored in the `/sparc_me/resources/templates/` folder for a given SDS version.
- `example_for_listing_all_curated_datasets.py` - Example for listing all curated SPARC datasets from Pennsieve.
- `example_for_accessing_dataset_protocol.py` - Example for retrieving the protocol for a curated SPARC dataset from protocols.io.
- `example_for_downloading_dataset_files.py` - Example for downloading files in curated SPARC datasets through the sparc-me API.
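To give a flavour of what the validation example demonstrates, the sketch below shows the general pattern of validating a metadata entry against a JSON schema using the third-party jsonschema package. The schema here is a toy stand-in for illustration only; it is not the actual SDS schema shipped in `/sparc_me/resources/templates/`.

```python
# Conceptual sketch of schema validation, in the spirit of
# example_for_validating_schema.py. The schema below is a toy stand-in
# for illustration only, not the actual SDS schema shipped with sparc-me.
from jsonschema import validate, ValidationError

# Toy schema: two required fields with basic type rules.
toy_sds_schema = {
    "type": "object",
    "properties": {
        "Title": {"type": "string"},
        "Number of subjects": {"type": "integer", "minimum": 0},
    },
    "required": ["Title", "Number of subjects"],
}

entry = {"Title": "My example SDS dataset", "Number of subjects": 3}

try:
    validate(instance=entry, schema=toy_sds_schema)
    print("Entry is valid against the schema.")
except ValidationError as err:
    print("Validation failed:", err.message)
```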
To report an issue or suggest a new feature, please use the issues page. Please check existing issues before submitting a new one.
Fork this repository and submit a pull request to contribute. Before doing so, please read our Code of Conduct and Contributing Guidelines. Please add a GitHub Star to support developments!
- `/sparc_me/` - Parent directory of the sparc-me Python module.
- `/sparc_me/core/` - Core classes of sparc-me.
- `/sparc_me/resources/templates/` - Location of SPARC Dataset Structure templates.
- `/examples/` - Parent directory of sparc-me examples and tutorials.
- `/examples/test_data/` - Test data used for sparc-me examples and tutorials.
- `/docs/images/` - Images used in sparc-me tutorials.
If you use sparc-me to make new discoveries or use the source code, please cite us as follows:
Savindi Wijenayaka, Linkun Gao, Michael Hoffman, David Nickerson, Haribalan Kumar, Chinchien Lin, Thiranja Prasad Babarenda Gamage (2022). sparc-me: v1.0.0 - A python tool to explore, enhance, and expand SPARC datasets and their descriptions in accordance with FAIR principles.
Zenodo. https://doi.org/10.5281/zenodo.6975692.
We have assessed the FAIRness of our sparc-me tool against the FAIR Principles established for research software. The details are available in the following document.
sparc-me is fully open source and distributed under the very permissive Apache License 2.0. See LICENSE for more information.
- Savindi Wijenayaka (Developer, Writer - Documentation)
- Linkun Gao (Developer, Writer - Documentation)
- Michael Hoffman (Writer - Documentation)
- Haribalan Kumar (Developer, Writer - Documentation)
- Chinchien Lin (SysAdmin, Writer - Documentation)
- Thiranja Prasad Babarenda Gamage (Lead, Writer - Documentation)
- We would like to thank the organizers of the 2022 SPARC Codeathon for their guidance and support during this Codeathon.
- Initial code defining the loading of SDS datasets using Python was adapted from https://github.com/ABI-CTT-Group/metadata-manager/releases/tag/v1.0.0 at the start of the project.
[^1]: Please note that the schemas derived in the current version of sparc-me have been generated based on basic rules (e.g. required fields, data type, etc.). These will be replaced when an official schema is released by the SPARC curation team (elements of the internal schema used by the SPARC curators for curating SPARC datasets can be found here).