SPARC Metadata Editor (sparc-me)

A python tool to explore, enhance, and expand SPARC datasets and their descriptions in accordance with FAIR principles.

About

This is the repository of Team sparc-me (Team #7) of the 2022 SPARC Codeathon. Click here to find out more about the SPARC Codeathon 2022. Check out the Team Section of this page to find out more about our team members.

With the exception of high-level planning by the team lead as advised by the Codeathon organisers, no work was done on this project prior to the Codeathon. Contributions from existing projects are described in the Acknowledgements Section.

Introduction

The NIH Common Fund program on Stimulating Peripheral Activity to Relieve Conditions (SPARC) focuses on understanding peripheral nerves (nerves that connect the brain and spinal cord to the rest of the body), how their electrical signals control internal organ function, and how therapeutic devices could be developed to modulate electrical activity in nerves to improve organ function. This may provide a potentially powerful way to treat a diverse set of common conditions and diseases such as hypertension, heart failure, gastrointestinal disorders, and more. 60 research groups spanning 90 institutions and companies contribute to SPARC and work across over 15 organs and systems in 8 species.

The SPARC Portal provides a single user-facing online interface to all resources generated by the SPARC community that can be shared, cited, visualized, computed, and used for virtual experimentation. A key offering of the portal is the collection of well-curated, high-impact data that is being generated by SPARC-funded researchers. These datasets, along with other SPARC projects and computational simulations, can be found under the "Find Data" section of the SPARC Portal.

A SPARC dataset comprises data and metadata organised according to the SPARC Dataset Structure (SDS).

Information regarding how to navigate a SPARC dataset and how a dataset is formatted can be found on the SPARC Portal.
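To make the structure concrete, the sketch below creates an empty SDS-style directory tree using only the Python standard library. The folder and metadata file names follow the SPARC Dataset Structure, but treat them as illustrative; the exact set of files depends on the SDS version, and this is not sparc-me's own code.

```python
from pathlib import Path
import tempfile

# Illustrative top-level layout of an SDS dataset. Folder and metadata file
# names follow the SPARC Dataset Structure; exact contents vary by version.
SDS_FOLDERS = ["primary", "derivative", "docs", "code", "protocol"]
SDS_METADATA_FILES = [
    "dataset_description.xlsx",
    "subjects.xlsx",
    "samples.xlsx",
    "submission.xlsx",
]

def create_sds_skeleton(root):
    """Create an empty SDS-style directory tree under `root`."""
    root = Path(root)
    for folder in SDS_FOLDERS:
        (root / folder).mkdir(parents=True, exist_ok=True)
    for filename in SDS_METADATA_FILES:
        (root / filename).touch()
    return root

dataset = create_sds_skeleton(Path(tempfile.mkdtemp()) / "my_dataset")
print(sorted(p.name for p in dataset.iterdir()))
```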

The problem

There is currently no publicly available programmatic approach for:

  • Accessing and interrogating all metadata fields in SDS datasets.
  • Creating new SDS datasets (schemas for SDS dataset validation are not yet publicly available).

This limits the ability of members of the SPARC and wider scientific communities to apply FAIR principles for:

  • Interacting with SDS datasets for conducting their research (limits accessibility).
  • Applying the SDS specification for storing and curating results from their instrumentation and computational physiology workflows (especially from automated workflows that can generate large quantities of data that may be impractical to store in SDS format using existing interactive tools like SODA) (limits interoperability).
  • Proposing and supporting extensions to the SDS (similar to BIDS extensions) to further expand the SPARC community e.g. to enable storing clinical data (limits reusability).
  • Quickly prototyping novel infrastructure/tools to elevate the impact of the SPARC program (limits application).

Our solution - sparc-me

To address this problem, we have developed a python module called the SPARC Metadata Editor (sparc-me) that can be used to enhance the FAIRness of SPARC data by enabling:

  • Findability
    • Exploring data and metadata within SDS datasets
  • Accessibility
    • Accessing curated SDS datasets and their metadata (using the Pennsieve API)
    • Accessing protocols used by existing SDS datasets (using the protocols.io API)
  • Interoperability
    • Conversion between BIDS datasets and SDS datasets
  • Reusability
    • Extending SDS descriptions/creating schemas (see Footnote 1)

Examples and guided tutorials have been created to demonstrate each of the features above.
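The kind of in-place metadata editing listed above can be sketched in a library-agnostic way. In the toy example below, a plain dict stands in for whatever structure sparc-me actually uses internally, and the field names are illustrative rather than the exact SDS column headers (both are assumptions):

```python
# Toy stand-in for an SDS dataset_description record; the real structure
# and field names used by sparc-me may differ (assumption for illustration).
dataset_description = {
    "Metadata Version": "2.0.0",
    "Title": "",
    "Keywords": [],
}

def set_field(description, field, value):
    """Update one metadata field, rejecting fields the record does not define."""
    if field not in description:
        raise KeyError(f"unknown metadata field: {field!r}")
    description[field] = value

set_field(dataset_description, "Title", "Example peripheral nerve recordings")
set_field(dataset_description, "Keywords", ["vagus nerve", "electrophysiology"])
print(dataset_description["Title"])
```

Rejecting unknown field names is what makes the editing "schema-aware": typos fail loudly instead of silently adding stray columns.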

Impact

sparc-me will elevate the impact of the SPARC program by providing the fundamental tools needed by users to programmatically interact with SDS datasets and efficiently build novel resources and tools from SPARC data.

Setting up sparc-me

Pre-requisites

  • Git
  • Python. Tested on:
    • 3.8.6
    • 3.9

PyPI

sparc-me is available on PyPI and can be installed via pip:

pip install sparc-me

From source code

Downloading source code

Clone the sparc-me repository from GitHub, e.g.:

git clone git@github.com:SPARC-FAIR-Codeathon/sparc-me.git

Installing dependencies

  1. Setting up virtual environment (optional but recommended). In this step, we will create a virtual environment in a new folder named venv, and activate the virtual environment.

    • Linux
    python3 -m venv venv
    source venv/bin/activate
    
    • Windows
    python3 -m venv venv
    venv\Scripts\activate
    
  2. Installing dependencies via pip

    pip install -r requirements.txt
    

Using sparc-me

Running tutorials

Guided tutorials have been developed describing how to use sparc-me in different scenarios:

  1. Downloading an existing curated SDS dataset (a human whole-body computational scaffold with embedded organs), and using existing tools to query ontology terms that have been used to annotate SDS datasets via the SciCrunch knowledgebase.
  2. Creating an SDS dataset programmatically from input data, editing metadata values, and filtering metadata.
  3. Interacting with SDS datasets on O2SPARC with sparc-me.
  4. Creating an extension of the SDS to include an additional metadata field that defines data use descriptions from the GA4GH-approved Data Use Ontology (DUO). This tutorial is a first step toward demonstrating how the SDS could be extended to describe clinical data.
  5. Converting a BIDS dataset to an SDS dataset.
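The idea behind Tutorial 4, adding one field to a schema and re-validating entries against it, can be sketched with a toy rule-based schema. The `required`/`type` rule shape below is an assumption for illustration, not sparc-me's internal schema representation, and the DUO field name is hypothetical:

```python
# Toy rule-based schema: each field maps to simple validation rules.
# This shape is an assumption, not sparc-me's actual internal schema.
base_schema = {
    "Title": {"required": True, "type": str},
    "Keywords": {"required": False, "type": list},
}

def extend_schema(schema, field, required, type_):
    """Return a copy of `schema` with one additional field rule."""
    extended = dict(schema)
    extended[field] = {"required": required, "type": type_}
    return extended

def validate(entry, schema):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for field, rule in schema.items():
        if field not in entry:
            if rule["required"]:
                problems.append(f"missing required field {field!r}")
        elif not isinstance(entry[field], rule["type"]):
            problems.append(f"field {field!r} has wrong type")
    return problems

# Hypothetical extension: a field holding a Data Use Ontology (DUO) term ID.
extended = extend_schema(base_schema, "Data Use Ontology Term", True, str)

entry = {"Title": "My dataset", "Data Use Ontology Term": "DUO:0000007"}
print(validate(entry, extended))  # prints []
```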


Running examples

In addition to the tutorials, the following examples are provided in the examples folder to help highlight the functionality of sparc-me:

  • example_for_base_functionality.py - Example outlining basic functionality for the loading/saving/editing of dataset/metadata.
  • example_for_validating_schema.py - Example showing how to validate SDS entries against the SDS schema stored in the /sparc_me/resources/templates/ folder for a given SDS version.
  • example_for_listing_all_curated_datasets.py - Example for listing all curated SPARC datasets from Pennsieve.
  • example_for_accessing_dataset_protocol.py - Example for retrieving the protocol for a curated SPARC dataset from protocols.io.
  • example_for_downloading_dataset_files.py - Example for downloading files in curated SPARC datasets through the sparc-me API.
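Part of the loading/saving functionality the basic example covers is describing the files in a dataset. The stdlib-only sketch below builds manifest-style records by walking a folder; the column names loosely follow the SDS manifest sheet but are illustrative, and this is not sparc-me's own implementation:

```python
from pathlib import Path
import tempfile

def build_manifest(folder):
    """Return one manifest-style record per file under `folder`.

    Column names loosely follow the SDS manifest sheet (illustrative only).
    """
    folder = Path(folder)
    records = []
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            records.append({
                "filename": path.relative_to(folder).as_posix(),
                "file type": path.suffix.lstrip(".") or "unknown",
                "description": "",
            })
    return records

# Build a tiny throwaway dataset folder and list its files.
folder = Path(tempfile.mkdtemp())
(folder / "sub-01").mkdir()
(folder / "sub-01" / "recording.csv").write_text("t,v\n0,1\n")
print(build_manifest(folder))  # one record, for sub-01/recording.csv
```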

Reporting issues

To report an issue or suggest a new feature, please use the issues page. Please check existing issues before submitting a new one.

Contributing

Fork this repository and submit a pull request to contribute. Before doing so, please read our Code of Conduct and Contributing Guidelines. Please add a GitHub Star to support development!

Project structure

  • /sparc_me/ - Parent directory of sparc-me python module.
  • /sparc_me/core/ - Core classes of sparc-me.
  • /sparc_me/resources/templates/ - Location of SPARC Dataset Structure templates.
  • /examples/ - Parent directory of sparc-me examples and tutorials.
  • /examples/test_data/ - Test data used for sparc-me examples and tutorials.
  • /docs/images/ - Images used in sparc-me tutorials.

Cite us

If you use sparc-me to make new discoveries or use the source code, please cite us as follows:

Savindi Wijenayaka, Linkun Gao, Michael Hoffman, David Nickerson, Haribalan Kumar, Chinchien Lin, Thiranja Prasad Babarenda Gamage (2022). sparc-me: v1.0.0 - A python tool to explore, enhance, and expand SPARC datasets and their descriptions in accordance with FAIR principles. 
Zenodo. https://doi.org/10.5281/zenodo.6975692.

FAIR practices

We have assessed the FAIRness of our sparc-me tool against the FAIR Principles established for research software. The details are available in the following document.

License

sparc-me is fully open source and distributed under the very permissive Apache License 2.0. See LICENSE for more information.

Team

Acknowledgements

Footnotes

  1. Please note that the schemas derived in the current version of sparc-me have been generated based on basic rules (e.g., required fields and data types). These will be replaced when an official schema is released by the SPARC curation team (elements of the internal schema used by the SPARC curators for curating SPARC datasets can be found here).
