[Feature request]: Incorporate metabolomics and proteomics data #91

Closed
nctoussaint opened this issue Jul 29, 2024 · 2 comments · Fixed by #107
Assignees
Labels
enhancement New feature or request

Comments

@nctoussaint

Contact Details

No response

Description

As discussed with the SMOC team, MODOs should be extended to include metabolomics and proteomics data.

Patrick Pedrioli and Nicola Zamboni have shared example data with metadata via a SWITCH drive folder provided by ORDES:
https://drive.switch.ch/index.php/s/k1Tng5UjjqV6Eqf

Importance Level

High

Affected Components

No response

Technical Requirements

No response

Acceptance criteria

No response

@nctoussaint nctoussaint added the enhancement New feature or request label Jul 29, 2024
@cmdoret
Member

cmdoret commented Jul 31, 2024

Hey @htmonkey @ppedrioli,

We would love to have your input on this:

metadata

  • Are there metadata fields exclusive to either mzTab or mzTab-M that would warrant a separate class in the schema (example for context: modos:ReferenceGenome)?
  • Do you have a list of metadata fields that should be extracted first from mzTab files for an initial prototype?
  • Do you think modos:DataEntity could be extended (i.e. by adding fields) to represent mzTab, or do you recommend creating a completely separate object?
    • Asking mainly because, as far as I can tell, a single mzTab file contains multiple arrays.
    • One solution would be a grouping type (DataCollection?) pointing to the individual arrays.
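
To make the grouping idea concrete, here is a minimal sketch of what such a type could look like. Everything in it is an assumption for discussion: `ArrayRef`, `DataCollection`, and the field names `source_format` and `has_part` are hypothetical and not part of the current modos schema.

```python
from dataclasses import dataclass, field


@dataclass
class ArrayRef:
    """Hypothetical pointer to one array stored inside the object."""
    name: str  # e.g. the mzTab section code, such as "SML"
    path: str  # location of the array relative to the object root


@dataclass
class DataCollection:
    """Hypothetical grouping type: one mzTab file, several arrays."""
    id: str
    source_format: str
    has_part: list[ArrayRef] = field(default_factory=list)

    def get(self, name: str) -> ArrayRef:
        # Look up a member array by its name.
        for ref in self.has_part:
            if ref.name == name:
                return ref
        raise KeyError(name)


# Illustrative values only; the paths are made up.
collection = DataCollection(
    id="exp1",
    source_format="mzTab",
    has_part=[ArrayRef("SML", "exp1/SML"), ArrayRef("SMF", "exp1/SMF")],
)
```

With this shape, the experiment-level metadata would live on the collection, while each array keeps only what is needed to locate it.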

storage

  • We were thinking of storing the experiment-level metadata as JSON (in zarr), as we do for genomics, and the actual data in zarr arrays (which allow indexing, random access and compression). Does that make sense?

interaction

We were thinking about something like

modos --endpoint http://s3.example.org add mztab s3://my-bucket/test-object ./exp1.mztab

Does that look OK to you?

We would be interested in the kinds of "questions" you would then ask of the object. This would help us design the command line interface, API and metadata schema. Suggestions are welcome.

@htmonkey

Metadata

mzTab is designed to accommodate different experimental designs (see Figure 1 in the DOC). We need the sample metadata (6.2.19-6.2.25) and the assay metadata (6.2.34-6.2.38). These could be condensed into a single class that, for each assay (= a column in the data table), reports all key metadata needed to map to patients and describe the measurement.

Importantly, we need to store all sections (MTD, SML, SMF, SME, COM MGF..., as identified by the code at the beginning of each line), even if you don't define classes for all of them. The current DataEntity doesn't seem suited to storing these tables and lists while preserving the links between them, though I would need a specific example to be sure.
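
As a rough illustration of keeping every section, the sketch below groups the lines of a tab-separated mzTab file by the code at the start of each line and serializes the result as JSON. It assumes tab-separated lines with the section code in the first field; `split_sections` is a hypothetical helper, not an existing modos function, and the example content is a toy, not real data.

```python
import json
from collections import defaultdict


def split_sections(text: str) -> dict[str, list[list[str]]]:
    """Group mzTab lines by their leading code, keeping every section."""
    sections = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines between sections
        code, *fields = line.split("\t")
        sections[code].append(fields)
    return dict(sections)


# Toy example, not real data.
example = "\n".join([
    "COM\ttoy example, not real data",
    "MTD\tmzTab-version\t2.0.0-M",
    "SMH\tSML_ID\tabundance_assay[1]",
    "SML\t1\t10.5",
])

sections = split_sections(example)
as_json = json.dumps(sections)  # lossless fallback: everything stored as JSON
```

Nothing is discarded here: codes that have no dedicated class yet are still retained as raw rows, so they can be mapped to classes later.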

Storage
Seems OK to use zarr arrays, as long as you can preserve the complexity of the mzTab content. If not, we could just store everything as JSON. Size and compression are less of an issue than for genomics or proteomics data.

Interaction
Seems fine. The simpler, the better.

Questions
The priority is to pull all values in the SML table for the samples (= patients) of interest. Second, we would need the putative identity of each feature from the SMF table. The rest is useful for verifying the feature annotation.
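
The first query could be sketched as follows. This assumes the layout I understand from mzTab-M 2.0: `MTD` lines of the form `assay[n]-sample_ref` linking assays to samples, an `SMH` header row, and `abundance_assay[n]` columns in the `SML` rows. `sml_values_for_samples` is a hypothetical helper and the example content is a toy, not real data.

```python
def sml_values_for_samples(text: str, samples: set[str]) -> dict[str, dict[str, str]]:
    """Return {SML_ID: {assay: value}} for assays that belong to the given samples."""
    assay_to_sample = {}
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split("\t")
        code = fields[0]
        if code == "MTD" and len(fields) >= 3 and fields[1].endswith("-sample_ref"):
            # e.g. "assay[1]-sample_ref" -> "assay[1]"
            assay = fields[1][: -len("-sample_ref")]
            assay_to_sample[assay] = fields[2]
        elif code == "SMH":
            header = fields[1:]
        elif code == "SML":
            rows.append(fields[1:])
    if header is None:
        return {}

    # Keep only the abundance columns whose assay maps to a requested sample.
    wanted = {}
    for i, col in enumerate(header):
        if col.startswith("abundance_assay["):
            assay = col[len("abundance_"):]
            if assay_to_sample.get(assay) in samples:
                wanted[i] = assay

    id_idx = header.index("SML_ID")
    # Values are kept as strings so that "null" entries survive unchanged.
    return {row[id_idx]: {wanted[i]: row[i] for i in wanted} for row in rows}


# Toy example, not real data.
example = "\n".join([
    "MTD\tassay[1]-sample_ref\tsample[1]",
    "MTD\tassay[2]-sample_ref\tsample[2]",
    "SMH\tSML_ID\tabundance_assay[1]\tabundance_assay[2]",
    "SML\t1\t10.5\t3.2",
    "SML\t2\t0.7\tnull",
])

result = sml_values_for_samples(example, {"sample[2]"})
```

The same pattern would apply to the SMF table for the second query, with a different header code and identity columns.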
