SPARC Metadata Editor (sparc-me)

A python tool to explore, enhance, and expand SPARC datasets and their descriptions in accordance with FAIR principles.

About

This is the repository of Team sparc-me (Team #7) of the 2022 SPARC Codeathon. Click here to find out more about the SPARC Codeathon 2022. Check out the Team Section of this page to find out more about our team members.

With the exception of high-level planning by the team lead as advised by the Codeathon organisers, no work was done on this project prior to the Codeathon. Contributions from existing projects are described in the Acknowledgements Section.

Introduction

The NIH Common Fund program on Stimulating Peripheral Activity to Relieve Conditions (SPARC) focuses on understanding peripheral nerves (nerves that connect the brain and spinal cord to the rest of the body), how their electrical signals control internal organ function, and how therapeutic devices could be developed to modulate electrical activity in nerves to improve organ function. This may provide a potentially powerful way to treat a diverse set of common conditions and diseases such as hypertension, heart failure, gastrointestinal disorders, and more. 60 research groups spanning 90 institutions and companies contribute to SPARC and work across over 15 organs and systems in 8 species.

The SPARC Portal provides a single user-facing online interface to all resources generated by the SPARC community that can be shared, cited, visualized, computed, and used for virtual experimentation. A key offering of the portal is the collection of well-curated, high-impact data that is being generated by SPARC-funded researchers. These datasets, along with other SPARC projects and computational simulations, can be found under the "Find Data" section of the SPARC Portal.

A SPARC dataset comprises data and metadata organised according to the SPARC Dataset Structure (SDS).

Information regarding how to navigate a SPARC dataset and how a dataset is formatted can be found on the SPARC Portal.
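To make the structure concrete, the sketch below creates an empty SDS-style directory tree using only the Python standard library. The folder and metadata file names follow the SPARC Dataset Structure, but treat them as illustrative; the exact set of files depends on the SDS version, and this is not sparc-me's own code.

```python
from pathlib import Path
import tempfile

# Illustrative top-level layout of an SDS dataset. Folder and metadata file
# names follow the SPARC Dataset Structure; exact contents vary by version.
SDS_FOLDERS = ["primary", "derivative", "docs", "code", "protocol"]
SDS_METADATA_FILES = [
    "dataset_description.xlsx",
    "subjects.xlsx",
    "samples.xlsx",
    "submission.xlsx",
]

def create_sds_skeleton(root):
    """Create an empty SDS-style directory tree under `root`."""
    root = Path(root)
    for folder in SDS_FOLDERS:
        (root / folder).mkdir(parents=True, exist_ok=True)
    for filename in SDS_METADATA_FILES:
        (root / filename).touch()
    return root

dataset = create_sds_skeleton(Path(tempfile.mkdtemp()) / "my_dataset")
print(sorted(p.name for p in dataset.iterdir()))
```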

The problem

There is currently no publicly available programmatic approach for:

  • Accessing and interrogating all metadata fields in SDS datasets.
  • Creating new SDS datasets (schemas for SDS dataset validation are not yet publicly available).

This limits the ability of members of the SPARC and wider scientific communities to apply FAIR principles for:

  • Interacting with SDS datasets for conducting their research (limits accessibility).
  • Applying the SDS specification for storing and curating results from their instrumentation and computational physiology workflows (especially from automated workflows that can generate large quantities of data that may be impractical to store in SDS format using existing interactive tools like SODA) (limits interoperability).
  • Proposing and supporting extensions to the SDS (similar to BIDS extensions) to further expand the SPARC community e.g. to enable storing clinical data (limits reusability).
  • Quickly prototyping novel infrastructure/tools to elevate the impact of the SPARC program (limits application).

Our solution - sparc-me

To address this problem, we have developed a python module called the SPARC Metadata Editor (sparc-me) that can be used to enhance the FAIRness of SPARC data by enabling:

  • Findability
    • Exploring data and metadata within SDS datasets
  • Accessibility
    • Accessing curated SDS datasets and their metadata (using the Pennsieve API)
    • Accessing protocols used by existing SDS datasets (using the protocols.io API)
  • Interoperability
    • Conversion between BIDS datasets and SDS datasets
  • Reusability
    • Extending SDS descriptions/creating schemas (see Footnote 1)

Examples and guided tutorials have been created to demonstrate each of the features above.
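The kind of in-place metadata editing listed above can be sketched in a library-agnostic way. In the toy example below, a plain dict stands in for whatever structure sparc-me actually uses internally, and the field names are illustrative rather than the exact SDS column headers (both are assumptions):

```python
# Toy stand-in for an SDS dataset_description record; the real structure
# and field names used by sparc-me may differ (assumption for illustration).
dataset_description = {
    "Metadata Version": "2.0.0",
    "Title": "",
    "Keywords": [],
}

def set_field(description, field, value):
    """Update one metadata field, rejecting fields the record does not define."""
    if field not in description:
        raise KeyError(f"unknown metadata field: {field!r}")
    description[field] = value

set_field(dataset_description, "Title", "Example peripheral nerve recordings")
set_field(dataset_description, "Keywords", ["vagus nerve", "electrophysiology"])
print(dataset_description["Title"])
```

Rejecting unknown field names is what makes the editing "schema-aware": typos fail loudly instead of silently adding stray columns.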

Impact

sparc-me will elevate the impact of the SPARC program by providing the fundamental tools needed by users to programmatically interact with SDS datasets and efficiently build novel resources and tools from SPARC data.

Setting up sparc-me

Pre-requisites

  • Git
  • Python. Tested on:
    • 3.8.6
    • 3.9

PyPI

sparc-me is available on PyPI and can be installed via pip:

pip install sparc-me

From source code

Downloading source code

Clone the sparc-me repository from GitHub, e.g.:

git clone git@github.com:SPARC-FAIR-Codeathon/sparc-me.git

Installing dependencies

  1. Setting up virtual environment (optional but recommended). In this step, we will create a virtual environment in a new folder named venv, and activate the virtual environment.

    • Linux
    python3 -m venv venv
    source venv/bin/activate
    
    • Windows
    python3 -m venv venv
    venv\Scripts\activate
    
  2. Installing dependencies via pip

    pip install -r requirements.txt
    

Using sparc-me

Running tutorials

Guided tutorials have been developed describing how to use sparc-me in different scenarios:

  1. Downloading an existing curated SDS dataset (a human whole-body computational scaffold with embedded organs), and using existing tools to query ontology terms that have been used to annotate SDS datasets via the SciCrunch knowledgebase.
  2. Creating an SDS dataset programmatically from input data, editing metadata values, and filtering metadata.
  3. Interacting with SDS datasets on O2SPARC with sparc-me.
  4. Creating an extension of the SDS to include an additional metadata field that defines data use descriptions from the GA4GH-approved Data Use Ontology (DUO). This tutorial is a first step toward demonstrating how the SDS could be extended to describe clinical data.
  5. Converting a BIDS dataset to an SDS dataset.
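The idea behind Tutorial 4, adding one field to a schema and re-validating entries against it, can be sketched with a toy rule-based schema. The `required`/`type` rule shape below is an assumption for illustration, not sparc-me's internal schema representation, and the DUO field name is hypothetical:

```python
# Toy rule-based schema: each field maps to simple validation rules.
# This shape is an assumption, not sparc-me's actual internal schema.
base_schema = {
    "Title": {"required": True, "type": str},
    "Keywords": {"required": False, "type": list},
}

def extend_schema(schema, field, required, type_):
    """Return a copy of `schema` with one additional field rule."""
    extended = dict(schema)
    extended[field] = {"required": required, "type": type_}
    return extended

def validate(entry, schema):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for field, rule in schema.items():
        if field not in entry:
            if rule["required"]:
                problems.append(f"missing required field {field!r}")
        elif not isinstance(entry[field], rule["type"]):
            problems.append(f"field {field!r} has wrong type")
    return problems

# Hypothetical extension: a field holding a Data Use Ontology (DUO) term ID.
extended = extend_schema(base_schema, "Data Use Ontology Term", True, str)

entry = {"Title": "My dataset", "Data Use Ontology Term": "DUO:0000007"}
print(validate(entry, extended))  # prints []
```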


Running examples

In addition to the tutorials, the following examples are provided in the examples folder to help highlight the functionality of sparc-me:

  • example_for_base_functionality.py - Example outlining basic functionality for the loading/saving/editing of dataset/metadata.
  • example_for_validating_schema.py - Example showing how to validate SDS entries against the SDS schema stored in the /sparc_me/resources/templates/ folder for a given SDS version.
  • example_for_listing_all_curated_datasets.py - Example for listing all curated SPARC datasets from Pennsieve.
  • example_for_accessing_dataset_protocol.py - Example for retrieving the protocol for a curated SPARC dataset from protocols.io.
  • example_for_downloading_dataset_files.py - Example for downloading files in curated SPARC datasets through the sparc-me API.
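Part of the loading/saving functionality the basic example covers is describing the files in a dataset. The stdlib-only sketch below builds manifest-style records by walking a folder; the column names loosely follow the SDS manifest sheet but are illustrative, and this is not sparc-me's own implementation:

```python
from pathlib import Path
import tempfile

def build_manifest(folder):
    """Return one manifest-style record per file under `folder`.

    Column names loosely follow the SDS manifest sheet (illustrative only).
    """
    folder = Path(folder)
    records = []
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            records.append({
                "filename": path.relative_to(folder).as_posix(),
                "file type": path.suffix.lstrip(".") or "unknown",
                "description": "",
            })
    return records

# Build a tiny throwaway dataset folder and list its files.
folder = Path(tempfile.mkdtemp())
(folder / "sub-01").mkdir()
(folder / "sub-01" / "recording.csv").write_text("t,v\n0,1\n")
print(build_manifest(folder))  # one record, for sub-01/recording.csv
```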

Reporting issues

To report an issue or suggest a new feature, please use the issues page. Please check existing issues before submitting a new one.

Contributing

Fork this repository and submit a pull request to contribute. Before doing so, please read our Code of Conduct and Contributing Guidelines. Please add a GitHub Star to support development!

Project structure

  • /sparc_me/ - Parent directory of sparc-me python module.
  • /sparc_me/core/ - Core classes of sparc-me.
  • /sparc_me/resources/templates/ - Location of SPARC Dataset Structure templates.
  • /examples/ - Parent directory of sparc-me examples and tutorials.
  • /examples/test_data/ - Test data used for sparc-me examples and tutorials.
  • /docs/images/ - Images used in sparc-me tutorials.

Cite us

If you use sparc-me to make new discoveries or use the source code, please cite us as follows:

Savindi Wijenayaka, Linkun Gao, Michael Hoffman, David Nickerson, Haribalan Kumar, Chinchien Lin, Thiranja Prasad Babarenda Gamage (2022). sparc-me: v1.0.0 - A python tool to explore, enhance, and expand SPARC datasets and their descriptions in accordance with FAIR principles. 
Zenodo. https://doi.org/10.5281/zenodo.6975692.

FAIR practices

We have assessed the FAIRness of our sparc-me tool against the FAIR Principles established for research software. The details are available in the following document.

License

sparc-me is fully open source and distributed under the very permissive Apache License 2.0. See LICENSE for more information.

Team

Acknowledgements

Footnotes

  1. Please note that the schemas derived in the current version of sparc-me have been generated based on basic rules (e.g., required fields and data types). These will be replaced when an official schema is released by the SPARC curation team (elements of the internal schema used by the SPARC curators for curating SPARC datasets can be found here).
