
Commit

Merge branch 'master' into master
chinchien-lin authored Aug 8, 2023
2 parents 9efd477 + 6ec0eb3 commit 30fd4c7
Showing 13 changed files with 411 additions and 363 deletions.
24 changes: 12 additions & 12 deletions README.md
@@ -54,16 +54,16 @@ The NIH Common Fund program on **[Stimulating Peripheral Activity to Relieve Con
There is **currently no option for users to**:
- **easily describe workflows and tools, which process SPARC data, in a FAIR manner**
- **easily run such workflows locally or from cloud computing platforms such as oSPARC**
-- **ensure reproducibility of workflow results**
-- **easily reuse tools developed for processing SPARC data in new workflows** (tools are currently bundled within and tailored to specific SPARC datasets).
+- **easily reproduce workflow results**
+- **reuse tools developed for processing SPARC data in new workflows** (tools are currently bundled within and tailored to specific SPARC datasets).

## Our solution - sparc-flow
To address this problem, we have **developed a Python module called the SPARC Flow (sparc-flow)** that can be used to describe tools and workflows for processing SPARC datasets in accordance with FAIR principles:
- Provides an easy-to-use python-based application programming interface (API) to enable **tools** and **workflows** to be **described in a language agnostic manner**.
-- Enables the parameters used for running workflows to be stored with the workflow description along with a copy of its associated tools to **facilitate interoperability and the reproducibility of workflow results**.
-- Enables **workflows and tools** to be **independently stored in SDS datasets**, **ready to be contributed to the SPARC portal** to enable reuse by others.
-- Provides the to **save and load workflows and tools directly from/to SDS datasets** using [sparc-me](https://github.com/SPARC-FAIR-Codeathon/sparc-me).
-- Provides the abilty to **run workflows**:
+- Enables the parameters used for running workflows to be stored with the standardised workflow description along with a copy of its associated tools to **enable workflow results to be easily reproduced**.
+- Enables **workflows and tool descriptions** to be **independently stored in SDS datasets**, **ready to be contributed to the SPARC portal** to enable reuse by others.
+- Provides the ability to **save and load workflows and tools directly from/to SDS datasets** via [sparc-me](https://github.com/SPARC-FAIR-Codeathon/sparc-me).
+- Provides the ability to **run workflows**:
- locally;
- on existing cloud computing platforms such as [oSPARC](https://osparc.io/); or
- help prepare the workflow to be submitted to Dockstore to enable using its [standardised workflow interfaces](https://docs.dockstore.org/en/stable/advanced-topics/wes/cli-wes-tutorial.html) to run them directly from the commandline or through existing cloud computing platforms from [Dockstore.org](dockstore.org) (currently supports running on [AnVIL](https://anvilproject.org), [Cavatica](https://www.cavatica.org), [CGC](https://www.cancergenomicscloud.org), [DNAnexus](https://www.dnanexus.com), [Galaxy](https://usegalaxy.org), [Nextflow Tower](https://seqera.io/tower), and [Terra](https://terra.bio)).
@@ -162,38 +162,38 @@ Guided Jupyter Notebook tutorials have been developed describing how to use spar
</thead>
<tbody>
<tr>
-<td><a href="https://github.com/SPARC-FAIR-Codeathon/sparc-flow/blob/main/tutorials/tutorial_1_download_data_and_postprocess.ipynb">
+<td><a href="https://github.com/SPARC-FAIR-Codeathon/2023-team-3/blob/master/tutorials/tutorial_1/tutorial_1_download_data_and_postprocess.ipynb">
1
</a></td>
<td> Provides a typical data processing example that downloads an existing curated SDS dataset from the SPARC portal (<a href="https://doi.org/10.26275/vm1h-k4kq">Electrode design characterization for electrophysiology from swine peripheral nervous system</a>) using <a href="https://github.com/SPARC-FAIR-Codeathon/sparc-me">sparc-me</a> and perform post-processing to generate a new derived SDS dataset. This example will be used in subsequent tutorials</td>
</tr>
<tr>
-<td><a href="https://github.com/SPARC-FAIR-Codeathon/2023-team-3/blob/main/examples/tutorial_2_creating_standarised_workflow_description.ipynb">
+<td><a href="https://github.com/SPARC-FAIR-Codeathon/2023-team-3/blob/master/tutorials/tutorial_2/tutorial_2_creating_standarised_workflow_description.ipynb">
2
</a></td>
<td> Use sparc-flow to programmatically describe the example in Tutorial 1 in a standard workflow language (Common Workflow Language). This tutorial incorporates <a href="https://docs.google.com/document/d/1PKpl4WZ171C7YlQtG4AQ0WuK1bIFDGD6ys9PCnap_xI/edit">best practice guidelines</a> to ensure tools used in the workflow and the workflow itself are FAIR.
</td>
</tr>
<tr>
-<td><a href="https://github.com/SPARC-FAIR-Codeathon/2023-team-3/blob/main/examples/tutorial_3_running_locally_with_cwltool.ipynb">
+<td><a href="https://github.com/SPARC-FAIR-Codeathon/2023-team-3/blob/master/tutorials/tutorial_3/tutorial_3_running_locally_with_cwltool.ipynb">
3
</a></td>
<td> Use sparc-flow to run the standardised workflow described in Tutorial 2 locally using cwltool (reference implementation provided by the CWL Organisation).</td>
</tr>
<tr>
-<td><a href="https://github.com/SPARC-FAIR-Codeathon/2023-team-3/blob/main/examples/tutorial_4_running_locally_with_docstore.ipynb">
+<td><a href="https://github.com/SPARC-FAIR-Codeathon/2023-team-3/blob/master/tutorials/tutorial_4/tutorial_4_running_locally_with_docstore.ipynb">
4
</a></td>
<td> Use sparc-flow to run the standardised workflow described in Tutorial 2 locally using Dockstore.</td>
</tr>
<tr>
-<td><a href="https://github.com/SPARC-FAIR-Codeathon/2023-team-3/blob/main/examples/tutorial_5_running_on_dockstore_compatiable_cloud.ipynb">
+<td><a href="https://github.com/SPARC-FAIR-Codeathon/2023-team-3/blob/master/tutorials/tutorial_5/tutorial_5_running_on_dockstore_compatiable_cloud.ipynb">
5
</a></td>
<td> Use sparc-flow to run the standardised workflow described in Tutorial 2 via the cloud using a Dockstore-compatible cloud computing platform (e.g. AnVIL, Cavatica, CGC, DNAnexus, Galaxy, Nextflow Tower, and Terra).</td>
</tr>
<tr>
-<td><a href="https://github.com/SPARC-FAIR-Codeathon/2023-team-3/blob/main/examples/tutorial_6_running_on_oSPARC.ipynb">
+<td><a href="https://github.com/SPARC-FAIR-Codeathon/2023-team-3/blob/master/tutorials/tutorial_6/tutorial_6_osparc.ipynb">
6
</a></td>
<td> Use sparc-flow to run the standardised workflow described in Tutorial 2 on oSPARC.</td>
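The README bullets in the hunk above describe sparc-flow's API only at a high level. As an illustrative sketch (the helper `make_tool_description` below is hypothetical, not sparc-flow's actual implementation), the language-agnostic tool description those bullets refer to is a CWL `CommandLineTool`, which can be generated from a handful of parameters:

```python
# Illustrative sketch (not sparc-flow's actual code): build the kind of
# CWL CommandLineTool description that sparc-flow's Tool class emits,
# from a tool name, command, and typed input/output parameters.

def make_tool_description(tool_name, command, input_name, input_type,
                          output_type, output_path):
    """Return a minimal CWL v1.0 CommandLineTool description as text."""
    base = ", ".join(f"'{part}'" for part in command)
    return f"""#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
label: {tool_name}
baseCommand: [{base}]

inputs:
  {input_name}:
    type: {input_type}
    inputBinding:
      position: 1

outputs:
  output:
    type: {output_type}
    outputBinding:
      glob: {output_path}
"""

description = make_tool_description(
    tool_name="sparc_data_tool",
    command=["python", "-m", "tools.sparc_data_tool"],
    input_name="number", input_type="int",
    output_type="File", output_path="output.txt",
)
print(description)
```

The parameter values mirror those used in Tutorial 2; keeping the description a plain text template is what makes it portable across CWL runners.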
2 changes: 1 addition & 1 deletion examples/sparc_flow_api_generatesds/generate_dataset.py
Expand Up @@ -17,4 +17,4 @@

workflow.run(runner="dockstore")

-workflow.generate_dockstore_github_requirements("/workflow.cwl", res["workflow_path"], ["/inp_job.yml"],"Linkun Gao", "[email protected]")
+workflow.generate_dockstore_github_requirements("/workflow.cwl", res["workflow_path"], ["/inp_job.yml"],"Your name", "[email protected]")
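The `generate_dockstore_github_requirements` call above writes out the files Dockstore's GitHub App needs, including a `.dockstore.yml`. A minimal sketch of what such a generator plausibly produces follows; the function name `render_dockstore_yml` and the author values are illustrative assumptions, not sparc-flow's implementation:

```python
# Illustrative sketch only: sparc-flow's real
# generate_dockstore_github_requirements() may differ. This shows the
# .dockstore.yml structure Dockstore's GitHub App expects; the author
# name and email below are placeholders.

def render_dockstore_yml(descriptor_path, test_parameter_files, author, email):
    """Render a minimal .dockstore.yml for registering a CWL workflow."""
    params = "\n".join(f"      - {p}" for p in test_parameter_files)
    return f"""version: 1.2
workflows:
  - subclass: CWL
    primaryDescriptorPath: {descriptor_path}
    testParameterFiles:
{params}
    authors:
      - name: {author}
        email: {email}
"""

text = render_dockstore_yml("/workflow.cwl", ["/inp_job.yml"],
                            "Your Name", "you@example.org")
print(text)
```

Note that the fields live under a `workflows:` list entry, with `subclass` naming the descriptor language; compare this against the `.dockstore.yml` diff below.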
16 changes: 8 additions & 8 deletions resources/workflow_dataset/primary/workflow/.dockstore.yml
@@ -1,9 +1,9 @@
-version: 1.2
+version: '1.2'
 workflows:
-- subclass: CWL
-  primaryDescriptorPath: /Dockstore.cwl
-  testParameterFiles:
-  - /inp_job.yml
-  authors:
-  - name: Linkun Gao
-    email: [email protected]
+- authors:
+  - email: ''
+    name: [email protected]
+  primaryDescriptorPath: /workflow.cwl
+  subclass: Linkun Gao
+  testParameterFiles:
+  - /inp_job.yml
6 changes: 5 additions & 1 deletion setup.py
@@ -1,8 +1,12 @@
 from setuptools import setup, find_packages
+from pathlib import Path

 setup(
     name="sparc_flow",
-    version="0.1.0",
+    version="1.0.2",
+    description='A Python tool to create tools and workflows for processing SPARC datasets in accordance with FAIR principles.',
+    long_description=(Path(__file__).parent / "README.md").read_text(),
+    long_description_content_type="text/markdown",
     author="Thiranja Prasad Babarenda Gamage, Chinchien Lin, Jiali Xu, Linkun Gao, Matthew French, Michael Hoffman",
     email="[email protected], [email protected]",
     license="Apache-2.0",
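The setup.py hunk above introduces the common pathlib pattern for populating `long_description` from the adjacent README. The same pattern can be exercised standalone (a temporary directory stands in for a real package checkout):

```python
# The pathlib pattern added to setup.py, exercised standalone: read the
# README.md sitting next to a file and use its text as long_description.
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp)  # stand-in for Path(__file__).parent in setup.py
    (pkg_dir / "README.md").write_text("# sparc-flow\nDemo readme.\n")

    # setup.py equivalent: (Path(__file__).parent / "README.md").read_text()
    long_description = (pkg_dir / "README.md").read_text()

print(long_description.splitlines()[0])  # -> # sparc-flow
```

Pairing this with `long_description_content_type="text/markdown"`, as the diff does, tells PyPI to render the text as Markdown on the project page.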
Empty file.
2 changes: 1 addition & 1 deletion tutorials/tutorial_2/tools/sparc_data_tool.cwl
@@ -1,7 +1,7 @@
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
-baseCommand: ['python', '-m', 'examples.sparc_workflow_example.tools.sparc_data_tool']
+baseCommand: ['python', '-m', 'tools.sparc_data_tool']

inputs:
number:
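The `baseCommand` fix above matters because the CWL runner invokes the command from its own working directory, so the module path must resolve from there rather than from the repository root. A simplified sketch of how a runner turns `baseCommand` plus a bound input into the argv it executes (real runners such as cwltool additionally stage files, sort `inputBinding` entries by position, and set up the environment):

```python
# Simplified sketch of how a CWL runner assembles the executed argv:
# the baseCommand list first, then the bound input values as strings.
# This is an illustration, not cwltool's actual logic.

def assemble_argv(base_command, input_values):
    """Combine a CWL baseCommand with stringified input values."""
    return list(base_command) + [str(v) for v in input_values]

argv = assemble_argv(["python", "-m", "tools.sparc_data_tool"], [262])
print(" ".join(argv))  # -> python -m tools.sparc_data_tool 262
```

With the old `examples.sparc_workflow_example.tools.sparc_data_tool` path, this invocation would fail unless that package hierarchy happened to be importable from the runner's working directory.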
4 changes: 3 additions & 1 deletion tutorials/tutorial_2/tools/sparc_data_tool.py
@@ -227,7 +227,9 @@ def get_sparc_dataset_and_process(number):
plt.ylabel('Voltage (uV)', fontsize=20)
plt.xticks(fontsize=20)
plt.yticks(fontsize=20)
-plt.show()
+plt.show()
+
+fig.savefig("output.png")

with open("output.txt", 'w') as f:
f.write(str(time))
16 changes: 0 additions & 16 deletions tutorials/tutorial_2/tools/workflow.cwl

This file was deleted.

9 changes: 0 additions & 9 deletions tutorials/tutorial_2/tools/workflow.py

This file was deleted.

@@ -7,14 +7,7 @@
"metadata": {},
"source": [
"# Tutorial 2: Creating standardised workflow description\n",
-    "This tutorial shows how to create a standardised workflow (CWL) description using the sparc_flow package for the same operations as seen in tutorial 1. At each step of the process the CWL file is saved within a SDS structure in keeping with FAIR principles. \n",
-    "\n",
-    "To ensure our workflow adheres to the FAIR principles:\n",
-    "- We published our workflow, along with the metadata, on Dockstore. This makes it easily findable for those interested in using it.\n",
-    "- We chose Dockerstore for publishing our workflow to promote its accessibility as Dockstore does not require users to sign in to search for published content. Furthermore, we obtained a Digital Object Identifier (DOI) for our workflow via Zenodo through Dockstore, ensuring it can be readily accessed by interested parties.\n",
-    "- We employed CWL to describe our workflow to ensure our workflow is interoperable. CWL is designed to describe workflows in an environment-agnostic and portable way, making them compatible across various platforms including Dockstore. We also provided a parameter file (JSON) contaiining example parameters for lauching our workflow.\n",
-    "- We provided a thorough README in the git repository to ensure the workflow's reusability. Our sparc-flow module is fully open source and distributed under the very permissive Apache License 2.0 as stated in the README. \n",
-    "\n",
+    "This tutorial shows how to create a standardised work flow (CWL) description using the sparc_flow package for the same operations as seen in tutorial 1. At each step of the process the CWL file is saved within a SDS structure in keeping with FAIR principles. \n",
"\n",
"## Requirements\n",
"pip install sparc_flow"
@@ -50,28 +43,32 @@
"metadata": {},
"outputs": [
{
"ename": "FileNotFoundError",
"evalue": "[Errno 2] No such file or directory: 'tutorial_2/tools/sparc_data_tool.cwl'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mFileNotFoundError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[2], line 8\u001b[0m\n\u001b[1;32m 6\u001b[0m tool\u001b[39m.\u001b[39mset_output_type(\u001b[39m\"\u001b[39m\u001b[39mFile\u001b[39m\u001b[39m\"\u001b[39m)\n\u001b[1;32m 7\u001b[0m tool\u001b[39m.\u001b[39mset_output_path(\u001b[39m\"\u001b[39m\u001b[39moutput.txt\u001b[39m\u001b[39m\"\u001b[39m)\n\u001b[0;32m----> 8\u001b[0m tool\u001b[39m.\u001b[39;49mgenerate_description()\n\u001b[1;32m 10\u001b[0m \u001b[39m# tool.generate_sds()\u001b[39;00m\n\u001b[1;32m 11\u001b[0m \u001b[39m## *This should save to new SDS in resources folder. Akk stored in promary\u001b[39;00m\n",
"File \u001b[0;32m~/src/2023-team-3/sparc_flow/core/workflow.py:190\u001b[0m, in \u001b[0;36mTool.generate_description\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 171\u001b[0m \u001b[39mdef\u001b[39;00m \u001b[39mgenerate_description\u001b[39m(\u001b[39mself\u001b[39m):\n\u001b[1;32m 172\u001b[0m description \u001b[39m=\u001b[39m \u001b[39mf\u001b[39m\u001b[39m\"\"\"\u001b[39m\u001b[39m#!/usr/bin/env cwl-runner\u001b[39m\n\u001b[1;32m 173\u001b[0m \u001b[39mcwlVersion: v1.0\u001b[39m\n\u001b[1;32m 174\u001b[0m \u001b[39mclass: CommandLineTool\u001b[39m\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 187\u001b[0m \u001b[39m glob: \u001b[39m\u001b[39m{\u001b[39;00m\u001b[39mself\u001b[39m\u001b[39m.\u001b[39moutput_path\u001b[39m}\u001b[39;00m\n\u001b[1;32m 188\u001b[0m \u001b[39m \u001b[39m\u001b[39m\"\"\"\u001b[39m\n\u001b[0;32m--> 190\u001b[0m \u001b[39mwith\u001b[39;00m \u001b[39mopen\u001b[39;49m(\u001b[39mf\u001b[39;49m\u001b[39m'\u001b[39;49m\u001b[39m{\u001b[39;49;00m\u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49mtool_dir\u001b[39m}\u001b[39;49;00m\u001b[39m/\u001b[39;49m\u001b[39m{\u001b[39;49;00m\u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49mtool_name\u001b[39m}\u001b[39;49;00m\u001b[39m.cwl\u001b[39;49m\u001b[39m'\u001b[39;49m, \u001b[39m'\u001b[39;49m\u001b[39mw\u001b[39;49m\u001b[39m'\u001b[39;49m) \u001b[39mas\u001b[39;00m f:\n\u001b[1;32m 191\u001b[0m f\u001b[39m.\u001b[39mwrite(description)\n",
"\u001b[0;31mFileNotFoundError\u001b[0m: [Errno 2] No such file or directory: 'tutorial_2/tools/sparc_data_tool.cwl'"
"name": "stderr",
"output_type": "stream",
"text": [
"/home/mfre190/anaconda3/envs/sparc/lib/python3.9/site-packages/sparc_me/core/dataset.py:260: FutureWarning:save is not part of the public API, usage can give unexpected results and will be removed in a future version\n",
"/home/mfre190/anaconda3/envs/sparc/lib/python3.9/site-packages/sparc_me/core/dataset.py:260: FutureWarning:save is not part of the public API, usage can give unexpected results and will be removed in a future version\n",
"/home/mfre190/anaconda3/envs/sparc/lib/python3.9/site-packages/sparc_me/core/dataset.py:260: FutureWarning:save is not part of the public API, usage can give unexpected results and will be removed in a future version\n",
"/home/mfre190/anaconda3/envs/sparc/lib/python3.9/site-packages/sparc_me/core/dataset.py:260: FutureWarning:save is not part of the public API, usage can give unexpected results and will be removed in a future version\n",
"/home/mfre190/anaconda3/envs/sparc/lib/python3.9/site-packages/sparc_me/core/dataset.py:260: FutureWarning:save is not part of the public API, usage can give unexpected results and will be removed in a future version\n",
"/home/mfre190/anaconda3/envs/sparc/lib/python3.9/site-packages/sparc_me/core/dataset.py:260: FutureWarning:save is not part of the public API, usage can give unexpected results and will be removed in a future version\n",
"/home/mfre190/anaconda3/envs/sparc/lib/python3.9/site-packages/sparc_me/core/dataset.py:260: FutureWarning:save is not part of the public API, usage can give unexpected results and will be removed in a future version\n",
"/home/mfre190/anaconda3/envs/sparc/lib/python3.9/site-packages/sparc_me/core/dataset.py:260: FutureWarning:save is not part of the public API, usage can give unexpected results and will be removed in a future version\n",
"/home/mfre190/anaconda3/envs/sparc/lib/python3.9/site-packages/sparc_me/core/dataset.py:260: FutureWarning:save is not part of the public API, usage can give unexpected results and will be removed in a future version\n",
"/home/mfre190/anaconda3/envs/sparc/lib/python3.9/site-packages/sparc_me/core/dataset.py:260: FutureWarning:save is not part of the public API, usage can give unexpected results and will be removed in a future version\n"
]
}
],
"source": [
"\n",
"tool = sparc_flow.Tool() \n",
"tool.set_tool_name(\"sparc_data_tool\")\n",
"tool.set_tool_dir(\"../tutorial_2/tools\")\n",
-    "tool.set_command([\"python\", \"-m\", \"examples.sparc_workflow_example.tools.sparc_data_tool\"])\n",
+    "tool.set_command([\"python\", \"-m\", \"tools.sparc_data_tool\"])\n",
"tool.set_input_type(\"int\")\n",
"tool.set_output_type(\"File\")\n",
"tool.set_output_path(\"output.txt\")\n",
-    "tool.generate_description()\n",
+    "tool.generate_description() \n",
+    "tool.create_sds(\"../resources/tutorial_2_resources/tools_dataset\", \"tools/\")\n",
"\n",
"# tool.generate_sds()\n",
"## *This should save to new SDS in resources folder. Akk stored in promary"
@@ -89,18 +86,33 @@
},
{
"cell_type": "code",
-   "execution_count": null,
+   "execution_count": 3,
"id": "799de4f9",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/mfre190/anaconda3/envs/sparc/lib/python3.9/site-packages/sparc_me/core/dataset.py:260: FutureWarning:save is not part of the public API, usage can give unexpected results and will be removed in a future version\n",
"/home/mfre190/anaconda3/envs/sparc/lib/python3.9/site-packages/sparc_me/core/dataset.py:260: FutureWarning:save is not part of the public API, usage can give unexpected results and will be removed in a future version\n",
"/home/mfre190/anaconda3/envs/sparc/lib/python3.9/site-packages/sparc_me/core/dataset.py:260: FutureWarning:save is not part of the public API, usage can give unexpected results and will be removed in a future version\n",
"/home/mfre190/anaconda3/envs/sparc/lib/python3.9/site-packages/sparc_me/core/dataset.py:260: FutureWarning:save is not part of the public API, usage can give unexpected results and will be removed in a future version\n",
"/home/mfre190/anaconda3/envs/sparc/lib/python3.9/site-packages/sparc_me/core/dataset.py:260: FutureWarning:save is not part of the public API, usage can give unexpected results and will be removed in a future version\n",
"/home/mfre190/anaconda3/envs/sparc/lib/python3.9/site-packages/sparc_me/core/dataset.py:260: FutureWarning:save is not part of the public API, usage can give unexpected results and will be removed in a future version\n",
"/home/mfre190/anaconda3/envs/sparc/lib/python3.9/site-packages/sparc_me/core/dataset.py:260: FutureWarning:save is not part of the public API, usage can give unexpected results and will be removed in a future version\n",
"/home/mfre190/anaconda3/envs/sparc/lib/python3.9/site-packages/sparc_me/core/dataset.py:260: FutureWarning:save is not part of the public API, usage can give unexpected results and will be removed in a future version\n",
"/home/mfre190/anaconda3/envs/sparc/lib/python3.9/site-packages/sparc_me/core/dataset.py:260: FutureWarning:save is not part of the public API, usage can give unexpected results and will be removed in a future version\n",
"/home/mfre190/anaconda3/envs/sparc/lib/python3.9/site-packages/sparc_me/core/dataset.py:260: FutureWarning:save is not part of the public API, usage can give unexpected results and will be removed in a future version\n"
]
}
],
"source": [
"workflow = sparc_flow.Workflow()\n",
-    "workflow.set_steps(tool_path=\"../examples/sparc_flow_example/tools\") \n",
+    "workflow.set_steps(tool_path=\"../tutorial_2/tools\") \n",
"workflow.set_input_value(input_value = 262, input_name = \"number\", input_type = \"int\")\n",
"workflow.generate_description() \n",
"\n",
-    "# workflow.generate_sds()\n",
-    "## *This should save to separate SDS in ../../resources/, and grab the required tools from the tools sds. All stored in primary\n"
+    "workflow.create_sds(\"../resources/tutorial_2_resources/workflow_dataset\", \"../tutorial_2\") \n"
]
}
],
Binary file added tutorials/tutorial_3/output.png
