Identify the input formats for single-cell data we want to support in Gemma #1028

Closed
arteymix opened this issue Feb 12, 2024 · 4 comments · May be fixed by #1020

arteymix commented Feb 12, 2024

It looks like there are four data formats commonly used out there:

  • SeuratDisk, an HDF5-based storage for Seurat
  • AnnData on-disk storage, another HDF5-based storage
  • MEX, a 10x format with the .mtx extension; it does not include samples/factors, so an additional user-supplied mapping would be necessary
  • Loom, which is another HDF5-based format
@arteymix

I'm currently implementing the import for AnnData HDF5. It will be built on top of the HDF5 JNI API and will be reusable for Seurat.

MTX is relatively easy to do since it's just a tabular format that we can parse with Apache Commons CSV.
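
To illustrate how little parsing is involved, here is a minimal sketch of reading the Matrix Market coordinate layout used by `.mtx` files. It uses only the Java standard library instead of Apache Commons CSV, and `MtxSketch`, `SparseMatrix` and `parse` are hypothetical names, not Gemma code. It assumes a plain coordinate file: `%` lines are the banner and comments, the first remaining line gives rows, columns and the non-zero count, and every later line is `row column value` with 1-based indices.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Gemma) that reads a Matrix Market coordinate file. */
final class MtxSketch {

    /** Dimensions plus the non-zero entries, kept in file order. */
    static final class SparseMatrix {
        final int rows;
        final int cols;
        final List<int[]> coords = new ArrayList<>();   // {row, column}, 1-based as in the file
        final List<Double> values = new ArrayList<>();

        SparseMatrix(int rows, int cols) {
            this.rows = rows;
            this.cols = cols;
        }
    }

    /** Parses the already-decompressed lines of a coordinate-format .mtx file. */
    static SparseMatrix parse(List<String> lines) {
        SparseMatrix matrix = null;
        int declaredEntries = 0;
        for (String raw : lines) {
            String line = raw.trim();
            // '%' starts the banner and any comment lines
            if (line.isEmpty() || line.startsWith("%")) {
                continue;
            }
            String[] fields = line.split("\\s+");
            if (fields.length != 3) {
                throw new IllegalArgumentException("Expected 3 fields, got: " + line);
            }
            if (matrix == null) {
                // first data line is the size line: rows, columns, number of non-zero entries
                matrix = new SparseMatrix(Integer.parseInt(fields[0]), Integer.parseInt(fields[1]));
                declaredEntries = Integer.parseInt(fields[2]);
            } else {
                matrix.coords.add(new int[] { Integer.parseInt(fields[0]), Integer.parseInt(fields[1]) });
                matrix.values.add(Double.parseDouble(fields[2]));
            }
        }
        if (matrix == null) {
            throw new IllegalArgumentException("Missing size line");
        }
        if (matrix.values.size() != declaredEntries) {
            throw new IllegalArgumentException("Entry count does not match the size line");
        }
        return matrix;
    }
}
```

For a small file, `MtxSketch.parse(Files.readAllLines(path))` would be enough; a real importer would more likely stream the lines and decompress gzipped input first.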

arteymix linked a pull request Feb 12, 2024 that will close this issue
arteymix added the single cell label Feb 12, 2024
@arteymix

Another format to consider is aggregated MEX, which is basically multiple samples combined in a single matrix.

https://www.10xgenomics.com/support/software/cell-ranger/latest/analysis/running-pipelines/cr-3p-aggr

arteymix commented Feb 14, 2024

Sample associations for MEX formats can generally be determined from the naming scheme of the submitted files, which are prefixed with GSM IDs.
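
A minimal sketch of that lookup, assuming the supplementary file names start with the GSM accession followed by an underscore. The class name, the method name and the underscore separator are assumptions made for illustration; this is not the GEO loader's actual code.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Gemma) that reads the sample accession off a file name. */
final class GsmPrefixSketch {

    // Assumption: the file name starts with "GSM", some digits, then an underscore.
    private static final Pattern GSM_PREFIX = Pattern.compile("^(GSM\\d+)_");

    /** Returns the GSM accession the file name is prefixed with, or empty if there is none. */
    static Optional<String> sampleAccession(String fileName) {
        Matcher matcher = GSM_PREFIX.matcher(fileName);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }
}
```

A file without such a prefix yields an empty result, at which point a user-supplied mapping would be needed.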

I haven't looked yet at aggregated ones, but I suspect we might not have that benefit.

Those nitty-gritty details should be dealt with in the GEO loader.

@arteymix

This is done. We'll support MEX, AnnData and SeuratDisk.
