[Feature request]: Incorporate metabolomics and proteomics data #91

nctoussaint · 2024-07-29T11:26:05Z

Contact Details

No response

Description

As discussed with the SMOC team, MODOs should be extended to include metabolomics and proteomics data.

Patrick Pedrioli and Nicola Zamboni have provided example data with metadata via a SWITCH drive that ORDES provided:
https://drive.switch.ch/index.php/s/k1Tng5UjjqV6Eqf

Importance Level

High

Affected Components

No response

Technical Requirements

No response

Acceptance criteria

No response

cmdoret · 2024-07-31T11:10:17Z

Hey @htmonkey @ppedrioli,

We would love to have your input on this:

metadata

- Are there exclusive metadata fields between mzTab and mzTab-M that would warrant a separate class in the schema (example for context: modos:ReferenceGenome)?
- Do you have a list of metadata fields that should be extracted in priority from mzTab files for a first prototype?
- Do you think modos:DataEntity could be extended (i.e. by adding fields) to represent mzTab, or do you recommend creating a completely separate object?
  - Asking mainly because, as far as I can tell, a single mzTab file contains multiple arrays.
  - One solution would be to have a grouping type (DataCollection?) pointing to the individual arrays.
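
To make the grouping idea concrete, here is a minimal sketch of what such a type could look like. Everything in it is an assumption for discussion: `ArrayEntity`, `DataCollection`, and their fields (`section`, `data_path`, `has_part`) are hypothetical names and do not exist in the modos schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ArrayEntity:
    """One array extracted from an mzTab file (hypothetical class)."""
    id: str
    section: str    # mzTab section the array comes from, e.g. "SML"
    data_path: str  # location of the array inside the object


@dataclass
class DataCollection:
    """Hypothetical grouping type pointing to the arrays of one source file."""
    id: str
    source_file: str
    has_part: List[ArrayEntity] = field(default_factory=list)

    def sections(self) -> List[str]:
        """Return the sections covered by this collection, in insertion order."""
        return [part.section for part in self.has_part]


# One collection per mzTab file, one entity per table found in it.
collection = DataCollection(id="exp1", source_file="exp1.mztab")
for code in ("SML", "SMF", "SME"):
    path = f"exp1/{code.lower()}"
    collection.has_part.append(ArrayEntity(id=path, section=code, data_path=path))
```

With this shape, the links between tables would be preserved by the collection rather than by each array, which may or may not fit the existing modos:DataEntity.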

storage

We were thinking of storing the experiment-level metadata as JSON (in zarr), as we do for genomics, and the actual data in zarr arrays (which allow indexes, random access and compression). Does that make sense?

interaction

We were thinking about something like

modos --endpoint http://s3.example.org add mztab s3://my-bucket/test-object ./exp1.mztab

Does that look OK to you?

We would be interested in the kind of "questions" you would then ask of the object. This would help us design the command line interface, API and metadata schema. Suggestions are welcome.

htmonkey · 2024-08-20T09:17:23Z

Metadata

mzTab is designed to accommodate different experimental designs (see Figure 1 in the DOC). We need the sample fields (6.2.19-6.2.25) and the assay fields (6.2.34-6.2.38); they could be condensed into a single class that, for each assay (= a column in the data table), reports all key metadata needed to map to patients and describe the measurement.

Importantly: we need to store all sections (MTD, SML, SMF, SME, COM MGF... as defined by the code at the beginning of each line), even if you don't use classes for all. The current DataEntity doesn't seem to be suited to store these tables and lists and preserve links, but maybe I would need a specific example.
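
One way to keep every section, whether or not it has a class, is to group lines by their leading code before deciding how to store each group. The following is only a sketch under that assumption: `split_sections` is a hypothetical helper, not part of modos, and the sample content is made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def split_sections(text: str) -> Dict[str, List[List[str]]]:
    """Group the rows of a tab-separated mzTab document by line prefix."""
    sections: Dict[str, List[List[str]]] = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines carry no information
        prefix, *fields = line.split("\t")
        sections[prefix].append(fields)
    return dict(sections)


# Tiny made-up example, not real data.
EXAMPLE = "\n".join([
    "COM\tillustrative file",
    "MTD\tmzTab-version\t2.0.0-M",
    "SMH\tSML_ID\tabundance_assay[1]",
    "SML\t1\t1204.5",
    "SML\t2\t87.0",
])

sections = split_sections(EXAMPLE)
```

Because unknown prefixes are kept like any other, nothing is dropped at this stage; the choice between zarr arrays and JSON can then be made per section.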

Storage
Seems OK to use zarr arrays, as long as you can preserve the complexity of the mzTab content. If not, we could just stick to storing everything as JSON. Size and compression are less of an issue compared to genomics or proteomics data.

Interaction
Seems fine. The simpler, the better.

Questions
The priority is to pull all values in the SML table for the samples (= patients) of interest. Second, we would need the putative identity of each feature from the SMF table. The rest is nice to have for verifying feature annotation.
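
As an illustration of the first query, the sketch below pulls the abundance values of selected assays from the SML table. It assumes tab-separated lines, an `SMH` header row naming the columns, `abundance_assay[n]` columns, and `null` for missing values; `sml_abundances` is a hypothetical helper, and mapping assays back to samples through the MTD section is left out.

```python
from typing import Dict, List


def sml_abundances(text: str, assays: List[int]) -> Dict[str, Dict[int, float]]:
    """Return {SML_ID: {assay index: abundance}} for the requested assays."""
    header: List[str] = []
    result: Dict[str, Dict[int, float]] = {}
    for line in text.splitlines():
        row = line.split("\t")
        if row[0] == "SMH":
            header = row  # column names for the SML rows that follow
        elif row[0] == "SML" and header:
            record = dict(zip(header, row))
            values: Dict[int, float] = {}
            for assay in assays:
                raw = record.get(f"abundance_assay[{assay}]", "null")
                if raw != "null":  # skip missing values
                    values[assay] = float(raw)
            result[record["SML_ID"]] = values
    return result


# Made-up rows for illustration only.
EXAMPLE = "\n".join([
    "SMH\tSML_ID\tchemical_name\tabundance_assay[1]\tabundance_assay[2]",
    "SML\t1\tglucose\t10.5\tnull",
    "SML\t2\tlactate\t3.0\t4.0",
])

selected = sml_abundances(EXAMPLE, [1, 2])
```

The same pattern would apply to the SMF table for the putative identity of each feature, using its own header row.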

nctoussaint added the enhancement New feature or request label Jul 29, 2024

nctoussaint assigned almutlue and cmdoret Jul 29, 2024

cmdoret mentioned this issue Oct 7, 2024

feat: mass spectrometry data support sdsc-ordes/modos-schema#18

Merged

cmdoret linked a pull request Oct 15, 2024 that will close this issue

feat: mztab support #107

Merged

cmdoret closed this as completed in #107 Nov 29, 2024
